Extract table from PDF to Excel

Hi everyone

I’m trying to extract a table from a PDF into Excel. The format is like this:

1. I tried the PDF to Excel activity, but it did not work.

2. I also tried converting the PDF to HTML, but that did not work either.

Is there any way to extract the table?
I think the table is not in a proper format.
@Palaniyappan

Hi @sudhasagar

Use the Read PDF With OCR activity, then use the Generate Data Table activity on its output.

Thanks
Ashwin S

Fine.
Did you try copying the whole PDF and pasting it into an Excel file?
Kindly try that once manually. If the format looks fine in Excel after pasting, we can do the same with UiPath as well.

Cheers @sudhasagar

I copied the whole PDF's data and pasted it into Excel.
All of the data ends up in one single column.

I also tried the idea @AshwinS2 gave above. It behaves the same way: the whole data comes out in a single column, and I would have to find where each key ends.

Then we can try string manipulation,
but that is still a complex approach.
Or
we can try a Python script to read that PDF file
and write the output as a JSON file, which we can later convert to Excel.
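
For the last step (JSON to something Excel can open), here is a minimal sketch. It assumes the JSON is a flat list of objects, one per table row, which may not match what a real PDF script produces. The function name `json_records_to_csv` and the sample data are made up for illustration, and it writes CSV text rather than a real .xlsx file.

```python
import csv
import io
import json


def json_records_to_csv(json_text):
    """Convert a JSON list of flat objects into CSV text (hypothetical helper)."""
    records = json.loads(json_text)

    # Collect column names in first-seen order so the header is stable.
    columns = []
    for record in records:
        for key in record:
            if key not in columns:
                columns.append(key)

    buffer = io.StringIO()
    # restval="" leaves a cell empty when a record lacks that column.
    writer = csv.DictWriter(
        buffer, fieldnames=columns, restval="", lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


# Made-up sample data, only to show the shape of the input.
sample_json = '[{"item": "pen", "qty": 2}, {"item": "book", "price": 3.0}]'
csv_text = json_records_to_csv(sample_json)
print(csv_text)
```

Saving `csv_text` to a file with a .csv extension gives something Excel can open directly.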

Cheers @sudhasagar

Can you please provide that Python script?

Well!
As you can see, it's basically tricky.
First, split the text on newlines and send the lines into a For Each loop. Split each line on spaces: wherever there are more than two spaces in a row, split there. Add the pieces to a collection using Add To Collection, loop through the split words, and then add them to Excel as a row, for example by passing Item.ToArray to Add Data Row.
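
The same splitting idea can be sketched in Python, in case that is easier to test than the workflow. This is only a rough sketch: it assumes the columns in the extracted text are separated by runs of more than two spaces (the `min_spaces` threshold is adjustable), and the names `split_rows` and `write_csv` and the sample text are made up. It writes CSV, which Excel can open, instead of using Add Data Row.

```python
import csv
import re


def split_rows(text, min_spaces=3):
    """Split raw text into rows, then into cells at wide space gaps."""
    # A run of at least `min_spaces` spaces is treated as a column gap.
    gap = re.compile(r" {%d,}" % min_spaces)
    rows = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        rows.append(gap.split(line))
    return rows


def write_csv(rows, path):
    """Write the rows to a CSV file that Excel can open."""
    with open(path, "w", newline="", encoding="utf-8") as handle:
        csv.writer(handle).writerows(rows)


# Made-up sample text standing in for what was read from the PDF.
sample = (
    "Item      Qty      Price\n"
    "Blue pen      2      1.50\n"
    "\n"
    "Note book      1      3.00"
)
rows = split_rows(sample)
for row in rows:
    print(row)
```

Single spaces inside a cell (like "Blue pen") are kept together. This breaks down if a cell is empty or if the PDF text does not keep wide gaps between columns, so check the raw text first.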

Cheers @sudhasagar

I have a script from my project, buddy, but I didn’t build it myself.
The thing is, the structure would differ for each process.
If you are good at Python, or if we can get help from someone who knows Python, then this can surely be done much more easily.

Cheers @sudhasagar

Thanks for the reply.
I’m getting help from a Python developer, and he says the data is not in a structured table,
so he is not able to pull that table data from the PDF.

Any suggestions from your side?
How can I go forward?