Convert PDF to Excel

Hi all

I am working on a secured PDF that consists entirely of diagrams, and I need to convert it to Excel.
Because it is secured, the only thing I can do is convert the PDF to text and then capture the content I need from the text into Excel.

There is one diagram I cannot convert: test.pdf (27.4 KB)

This PDF is only an example of the diagram, because I cannot share the real file for security reasons. I wrote it in Word and generated the PDF from that.

I want it to look like a normal table in Excel.
However, it looks like this in Notepad:

Does anyone have any idea how to do this?

Thanks so much!

@jingwang0222

Open the PDF in Microsoft Word first, then copy the content into Excel. It might work.

J,

Hi,
You have the same problem I had. I used Tabula to solve it. Try it and let me know.

Hi Jumbo

I converted it to Word and tried to copy and paste the Word content into Excel, but it didn't work well.

Here are the Word file and the XAML. Could you have a look? Thanks!

test doc.zip (28.1 KB)

530.xaml (30.2 KB)

Hi
Thanks for your advice, but my file is secured and cannot be uploaded.

cheers

@jingwang0222

I checked both and I see what you mean: there are many useless blanks and line breaks in the Word file.
However, I suppose this is due to the PDF format, and the data is the same as in the PDF, right?

As far as I understand, this is the only way to read a PDF file with its table format if you cannot use the data scraping method on the PDF (apart from OCR).
So if the format is stable, I recommend extracting the PDF data this way and deleting the spaces you don't need…
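
The cleanup step suggested here could be sketched roughly as follows, assuming the content has already been saved as plain text. The function name `clean_extracted_text`, the sample text, and the rule that columns are separated by a tab or by two or more spaces are all assumptions for illustration, not part of any UiPath activity.

```python
import re

def clean_extracted_text(text: str) -> list[list[str]]:
    """Split extracted text into rows of cells, dropping blank lines and padding."""
    rows = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip empty lines left over from the PDF layout
        # Assumption: columns are separated by a tab or by two or more spaces.
        cells = [cell.strip() for cell in re.split(r"\t+|\s{2,}", line)]
        rows.append(cells)
    return rows

# Made-up sample that imitates text copied out of a converted PDF.
sample = "Name   Qty\n\n\n  Bolt     12  \nNut\t7\n"
print(clean_extracted_text(sample))
```

Cells that contain a single space (for example "Steel bolt") stay intact under this rule, but the separator would need adjusting if the real file pads columns differently.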

Rgds,
J,

Thanks for your advice. I also tried that method, but it was too slow and didn't work as well as I had hoped, so I went back to capturing the data from the text file. I want to use regular expressions for that, which seems achievable for me. If you are still interested, I have asked a new question.
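
For anyone reading later, the regular expression idea could look something like the sketch below. It assumes each useful line in the text file has the shape "item name, quantity, price"; that layout, the pattern `ROW_PATTERN`, and the function names are hypothetical, since the real file cannot be shared. The result is CSV text, which Excel can open directly.

```python
import csv
import io
import re

# Hypothetical line layout: "<item name> <quantity> <unit price>"
ROW_PATTERN = re.compile(r"^(?P<item>.+?)\s+(?P<qty>\d+)\s+(?P<price>\d+\.\d{2})$")

def capture_rows(text):
    """Return [item, qty, price] for every line that matches the pattern."""
    rows = []
    for line in text.splitlines():
        match = ROW_PATTERN.match(line.strip())
        if match:  # lines such as page headers simply do not match
            rows.append([match["item"], match["qty"], match["price"]])
    return rows

def rows_to_csv(rows):
    """Render the captured rows as CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["item", "qty", "price"])
    writer.writerows(rows)
    return buffer.getvalue()

# Made-up sample text standing in for the converted PDF.
sample = "Report page 1\nSteel bolt   12   3.50\nNut 7 0.25\n"
print(rows_to_csv(capture_rows(sample)))
```

Inside a UiPath workflow the same pattern could be tried in a regex matching activity instead of Python; the pattern itself is the part that would carry over.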

Thanks
Jing

Hi,

What do you mean by upload? I use tabula-java.

Hi Jumbo,

For a project that I have just been assigned, I have a similar issue ==> capture text from images of checks (which will be in “.pdf” format) and write an Excel table via UiPath.

Do you happen to know of a way that I can do the above?

Also, based on your suggestion here, how does UiPath come into play? I am not seeing that. (I’m curious, that’s all.)

Any tips you can provide would be highly appreciated! :smile:

Thank you very much!

Very best,

@Dave_Chandra_US_Tax

Hi,

If you are using Adobe Acrobat DC, this custom package can help you: https://connect.uipath.com/community/project/pua-virtual-acrobat-dc-pdf-activities

You can first convert the scanned PDF into an editable PDF (activity “Correct rotation& convert scanned PDF to editable text & images”). Then export the editable PDF to Excel (activity “Export PDF files to other format”).

@Dave_Chandra_US_Tax

There are many extra packages in the package manager, and this is one option.

See also,

Rgds,

Thanks, the problem has been solved. We are using ABBYY FineReader to convert the PDF to Excel, then capturing data from that Excel file to generate our reports.
