Data Extraction from PDFs

Hi

I have a bunch of PDFs and want to extract a table from each one, but data scraping is not working.
When I write the data to a text file, the headers appear 3 times because the PDF has 3 pages and each page repeats the header row. How can I remove these headers?
I also have to create a separate Excel file for every PDF.
Is there any approach you can suggest?

Thanks in advance

What’s the issue when you say that data scraping isn’t working?

Do you mean that your datatable (where you store the extracted data) contains the headers as rows because every page starts with them?

You can build the whole datatable from the extracted data first, then run Lookup Data Table on it: pass the header value as the lookup value and you’ll get a row index returned. Then simply remove that row.

To catch every repeated header, put this lookup-and-remove sequence in a loop driven by a counter equal to the number of pages in the PDF.
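Outside of UiPath, the same idea can be sketched in a few lines of Python. This is only an illustration under assumptions: the extracted table is a list of rows, the first row is the header you want to keep, and the function name `drop_repeated_headers` is made up for this example.

```python
def _normalize(row):
    # Treat missing cells as empty strings and ignore surrounding whitespace,
    # so a header with a trailing space still counts as a repeat.
    return [(cell or "").strip() for cell in row]


def drop_repeated_headers(rows):
    """Keep the first row as the header and drop later rows identical to it."""
    if not rows:
        return []
    header = rows[0]
    header_key = _normalize(header)
    body = [row for row in rows[1:] if _normalize(row) != header_key]
    return [header] + body
```

Comparing whole rows avoids needing the page count at all; the trade-off is that a genuine data row identical to the header would also be dropped.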

Hope this helps!

Hey,
You can always try using UiPath.Python.Activities
and use Python methods to get the table from the PDF files and convert it to CSV.
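A rough sketch of what such a Python script might look like, assuming the third-party pdfplumber package is installed and that each page holds one table that `extract_table()` can detect. The helper names (`extract_rows`, `write_csv`, `convert_folder`) are hypothetical, and real PDFs may need table settings tuned.

```python
import csv
from pathlib import Path


def extract_rows(pdf_path):
    """Collect table rows from every page of one PDF."""
    import pdfplumber  # third-party; imported here so the CSV helper works without it

    rows = []
    with pdfplumber.open(pdf_path) as pdf:
        for page in pdf.pages:
            table = page.extract_table()  # returns None when no table is found
            if table:
                rows.extend(table)
    return rows


def write_csv(rows, csv_path):
    """Write rows to a CSV file; None cells become empty fields."""
    with open(csv_path, "w", newline="", encoding="utf-8") as f:
        csv.writer(f).writerows(rows)


def convert_folder(folder):
    """Write one CSV next to each PDF in the folder (one output file per PDF)."""
    for pdf_path in sorted(Path(folder).glob("*.pdf")):
        write_csv(extract_rows(pdf_path), pdf_path.with_suffix(".csv"))
```

Since each page contributes its own header row, the combined rows will still contain repeated headers until they are filtered out. The resulting CSV files open directly in Excel.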
Try this:

You can also try this:

Or here is a great tutorial on how to use Document Understanding:
