Data Scraping from PDF with multiple pages and tables into excel

19028426d · February 7, 2022, 9:52am

Hi,

I am trying to extract data from a table that spans multiple pages of a PDF file. The problem is that I only need the rows that meet a specific criterion (i.e. an amount >= a specific value). Once the criterion is met, the amount should be linked with the company name.

Any pointers?

For example

Thanks in advance !!!

Krutika_Kotkar · February 7, 2022, 11:10am

Scrape the entire table, then use the Filter Data Table activity to filter the rows based on your condition.

19028426d · February 7, 2022, 6:07pm

How can I use the column name “maximum funds approved…” in the filter row? I get the error “the value for argument ‘column name’ is not set or is invalid”.

gabrielribas4 · February 7, 2022, 8:58pm

Hey, @19028426d !! Can you share a sample PDF with us so we can find the best solution?
Only if it doesn’t contain sensitive data.

From what I understand of your question, one possible solution would be to read the text with a Read PDF or OCR activity and use regex to extract the information of interest. If a value satisfies the condition, I would keep it.
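To illustrate the regex idea outside of UiPath, here is a minimal Python sketch. It assumes the PDF text has already been read into a string and that each table row comes out as one line of the form "company name, then amount". That layout, the sample text, and the names `LINE_PATTERN` and `extract_matches` are assumptions for illustration, not taken from the actual file.

```python
import re

# Assumed row layout: "<company name> <amount>", e.g. "Acme Ltd 1,250,000.00".
# Adjust the pattern to whatever the extracted text really looks like.
LINE_PATTERN = re.compile(
    r"^(?P<company>.+?)\s+\$?(?P<amount>\d[\d,]*(?:\.\d+)?)\s*$"
)

def extract_matches(text, minimum):
    """Return (company, amount) pairs whose amount is >= minimum."""
    results = []
    for line in text.splitlines():
        match = LINE_PATTERN.match(line.strip())
        if match is None:
            continue  # header lines and other non-row text are skipped
        amount = float(match.group("amount").replace(",", ""))
        if amount >= minimum:
            results.append((match.group("company"), amount))
    return results

# Hypothetical extracted text, standing in for the output of a PDF read step.
sample_text = """Company Name Maximum Funds Approved
Acme Ltd 1,250,000.00
Beta Trading Co 80,000
Gamma Holdings 500,000
"""

for company, amount in extract_matches(sample_text, 500000):
    print(company, amount)
```

Because the company name and the amount are captured from the same line, each amount that passes the threshold stays linked to its company.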

I hope it helps!!!

gabrielribas4 · February 8, 2022, 3:53pm

Hey, @19028426d ! I looked for a more refined solution that should give you much more reliable results. So let’s go!

Step 1 - Split your PDF into single pages so you can iterate over them one by one.

Step 2 - Use Document Understanding to extract the table from each single-page PDF. It’s very simple; watch this 20-minute video: UiPath Document Understanding: Extract Tables Out of PDFs - YouTube.

Step 3 - Merge all extracted tables.

Step 4 - Filter the final table with your required condition.
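Steps 3 and 4 can be sketched in plain Python as follows. This is only an illustration of the merge-then-filter logic, not the UiPath workflow itself: it assumes each page's table has already been extracted as a list of row dictionaries, and the column names `Company` and `Amount`, the sample rows, and the helpers `parse_amount` and `merge_and_filter` are hypothetical.

```python
def parse_amount(raw):
    """Convert a string such as '$1,250,000.00' to a float, or None if not numeric."""
    cleaned = raw.replace(",", "").replace("$", "").strip()
    try:
        return float(cleaned)
    except ValueError:
        return None

def merge_and_filter(page_tables, amount_column, minimum):
    """Merge per-page tables, then keep rows whose amount is >= minimum."""
    merged = []
    for table in page_tables:  # Step 3: append every page's rows in order
        merged.extend(table)

    kept = []
    for row in merged:  # Step 4: apply the threshold
        amount = parse_amount(row.get(amount_column, ""))
        if amount is not None and amount >= minimum:
            kept.append(row)  # the whole row is kept, so the company stays attached
    return kept

# Hypothetical per-page extraction results; page 2 repeats the header as a row.
page_tables = [
    [{"Company": "Acme Ltd", "Amount": "1,250,000.00"},
     {"Company": "Beta Trading Co", "Amount": "80,000"}],
    [{"Company": "Company", "Amount": "Amount"},
     {"Company": "Gamma Holdings", "Amount": "500,000"}],
]

for row in merge_and_filter(page_tables, "Amount", 500000):
    print(row["Company"], row["Amount"])
```

Parsing the amount before comparing matters here: values extracted from a PDF are usually text, and a repeated header row or a thousands separator would otherwise break a numeric comparison.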

Take a look at the consistent result of the extraction I performed as a test:

Hope this helps!!!

19028426d · February 8, 2022, 5:20pm

Can you share your test file?
I am still following your steps.

gabrielribas4 · February 8, 2022, 5:26pm

Sure!!
The .xaml:
Main.xaml (37.4 KB)

The Taxonomy:
taxonomy.json (4.3 KB)

The Sample Data:
sample data.pdf (1.1 MB)
