Convert PDF to Excel

robot
activities
studio
uipath

#1

Hi all

I am working on a secured PDF that consists entirely of diagrams, and I need to convert it to Excel.
Because it is secured, the only thing I can use is converting the PDF to text, and then capturing the content I need from the text into an Excel sheet.

There is one diagram I cannot convert: test.pdf (27.4 KB)

This PDF is only an example of the diagram, because I cannot share the real file for security reasons. I wrote it in Word and generated the PDF from that.

I want it to look like a normally formatted table in Excel.
However, this is how it looks in Notepad.

Does anyone have any idea how to do this?

Thanks so much!


#2

@jingwang0222

Open the PDF with Microsoft Word first, and then copy the content to Excel. It might work.

J,


#3

Hi,
You have the same problem I had. I used Tabula to solve it. Try it and let me know.


#4

Hi Jumbo

I converted it to Word and tried to copy and paste the Word content into Excel, but it didn't work well.

Here are the Word file and the XAML. Can you have a look? Thanks!

test doc.zip (28.1 KB)

530.xaml (30.2 KB)


#5

Hi
Thanks for your advice, but my file is secured and cannot be uploaded.

cheers


#6

@jingwang0222

I checked both and I see what you mean: there are many useless blanks and line breaks in the Word file.
However, I suppose this is due to the PDF format, and the data is the same as in the PDF, right?

As I understand it, this is the only way to read a PDF file with its table format if you cannot use the data scraping method on the PDF (other than OCR).
So if the format is stable, I recommend extracting the PDF data this way and deleting each space you don't need…

Rgds,
J,
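
For anyone following along, the cleanup step suggested above (dropping the blank lines and extra spaces) could be sketched roughly like this in Python. This is only an illustration: the function name `clean_copied_text` is made up, and it assumes that table cells are separated by a tab or by two or more spaces, which may not hold for every PDF.

```python
import re


def clean_copied_text(raw):
    """Turn text copied from Word/PDF into rows of cells.

    Assumption (not from the original file): cells on a line are
    separated by a tab or by a run of two or more whitespace characters.
    """
    rows = []
    for line in raw.splitlines():
        line = line.strip()
        if not line:
            continue  # skip the empty lines left over from the PDF layout
        # Split on wide whitespace gaps first, then on single tabs.
        cells = [cell.strip() for cell in re.split(r"\s{2,}|\t", line)]
        rows.append(cells)
    return rows
```

Each inner list is one table row, so it can be written to Excel or CSV cell by cell. A cell that itself contains a double space would be split in two, so the separator rule needs adjusting to the real file.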


#7

Thanks for your advice. I also tried that method, but it was too slow and didn't work as well as I expected. So I went back to capturing the data from the text file, and I want to use regular expressions to do that, which seems achievable for me. If you are still interested, I have asked a new question.

Thanks
Jing
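
A minimal sketch of the regular-expression approach described above, written in Python rather than as a UiPath workflow. The line layout is an assumption, since the real file cannot be shared: each data line is taken to hold a hypothetical item code, a description, and an amount. `LINE_PATTERN`, `extract_rows`, and `rows_to_csv` are illustrative names only.

```python
import csv
import io
import re

# Hypothetical layout: "AB123   Some description   12.50"
LINE_PATTERN = re.compile(
    r"^(?P<code>[A-Z]{2}\d{3})\s+(?P<desc>.+?)\s+(?P<amount>\d+(?:\.\d+)?)$"
)


def extract_rows(text):
    """Return [code, description, amount] for every line that matches."""
    rows = []
    for line in text.splitlines():
        match = LINE_PATTERN.match(line.strip())
        if match:  # lines that do not fit the layout are ignored
            rows.append([match["code"], match["desc"], match["amount"]])
    return rows


def rows_to_csv(rows, header):
    """Serialize the rows as CSV text, which Excel can open directly."""
    buffer = io.StringIO(newline="")
    writer = csv.writer(buffer)
    writer.writerow(header)
    writer.writerows(rows)
    return buffer.getvalue()
```

The pattern would need to be adapted to whatever the converted text actually looks like. Lines that do not match (page headers, blank lines) are skipped silently, so it is worth comparing the number of extracted rows with the source.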


#8

Hi,

What do you mean by upload? I use tabula-java.
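
tabula-java runs as a local command-line tool, so nothing has to be uploaded. A rough sketch of calling it from Python is below. The jar path is a placeholder, and the flags (`-p` for pages, `-f` for format, `-o` for the output file, `-l` for lattice mode) are as I understand them from the tabula-java command-line help, so please check them against `--help` for your version. Whether it can read a secured PDF depends on how the file is protected.

```python
import subprocess


def build_tabula_command(jar_path, pdf_path, csv_path, lattice=False):
    """Build the tabula-java command line as a list of arguments.

    jar_path is a placeholder for wherever the tabula-java jar is saved.
    """
    cmd = ["java", "-jar", jar_path, "-p", "all", "-f", "CSV", "-o", csv_path]
    if lattice:
        cmd.append("-l")  # lattice mode, for tables drawn with ruling lines
    cmd.append(pdf_path)
    return cmd


def run_tabula(jar_path, pdf_path, csv_path, lattice=False):
    """Run tabula-java locally; raises CalledProcessError if it fails."""
    subprocess.run(
        build_tabula_command(jar_path, pdf_path, csv_path, lattice),
        check=True,
    )
```

For example, `run_tabula("tabula.jar", "test.pdf", "test.csv")` would write the detected tables to `test.csv`, assuming Java is installed and the jar is in the current folder.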