Which PDF program is best for scraping data?

KCO_KJackson · March 10, 2020, 1:41am

When I try to scrape data from a PDF I cannot select individual elements. The screen just treats the whole page like one element. I am running the latest beta of Studio and Adobe Acrobat XI v 11.0.20. Let me know what other information would be helpful.

Nandhuba · March 10, 2020, 1:55am

Hi @KCO_KJackson, check the link below:

https://www.uipath.com/kb-articles/pdf-data-extraction-scrape-pdf-text

KCO_KJackson · March 10, 2020, 11:57am

I forgot to mention that I am scraping a form, specifically SF 1449 contract forms. I need multiple data fields, such as the solicitation number, addresses, etc. Regex also won’t work because the order of the extracted text stream doesn’t match the positions of the fields on the form. I’ll put together a sample in a bit and upload it to provide clarity.

isssoftwarehive · March 12, 2020, 1:44pm

Adobe Acrobat Reader DC is the best program; that way all elements in the PDF are identified, and it’s free to download.
Find the download link below.

Also, please note that you need to enable user elements in the properties of the PDF.

liu_shubin · March 17, 2020, 2:47am

Not every PDF file containing a form can be used for data extraction. Open the file in Acrobat Reader DC, go to Edit->Preferences, and click “OK”. Then try it with UiExplorer. You don’t actually need to change any setting in Acrobat Reader, but it will work after that. You can try it using the sample file I attached.

Invoice_No_20180718001.pdf (34.9 KB)

Ioana_Gligan · November 18, 2020, 11:44am

How about trying Document Understanding? “Scraping” PDFs amounts to identifying specific pieces of information in documents, and especially if this is a non-varying form, you might want to have a look at the Form Extractor or the Regex Based Extractor.

There’s an Academy course on Document Understanding that is pretty comprehensive; maybe it would be useful for your use case.
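
To illustrate why a position-based approach suits a fixed-layout form better than regex over the text stream, here is a rough Python sketch. It is not how the Form Extractor is implemented; it only assumes that some earlier step (OCR or a PDF library) has already produced words with page coordinates. The `Word` and `Zone` types, the field names, and every coordinate are hypothetical and not taken from a real SF 1449.

```python
from dataclasses import dataclass


@dataclass
class Word:
    text: str
    x: float  # left edge of the word on the page (hypothetical units)
    y: float  # top edge of the word on the page


@dataclass
class Zone:
    name: str  # field name, e.g. "solicitation_number"
    x0: float
    y0: float
    x1: float
    y1: float


def extract_fields(words, zones):
    """Collect the words that fall inside each zone, in reading order."""
    fields = {}
    for zone in zones:
        inside = [
            w for w in words
            if zone.x0 <= w.x <= zone.x1 and zone.y0 <= w.y <= zone.y1
        ]
        # Sort top-to-bottom, then left-to-right. This assumes words on
        # the same line share the same y value; real data would need a
        # tolerance when grouping words into lines.
        inside.sort(key=lambda w: (w.y, w.x))
        fields[zone.name] = " ".join(w.text for w in inside)
    return fields


# Made-up words, deliberately out of reading order, as a text stream
# extracted from a form often is.
words = [
    Word("Acme", 50, 200),
    Word("SOL-0001", 300, 100),
    Word("Corp", 90, 200),
    Word("12", 50, 215),
    Word("Main", 70, 215),
    Word("St", 105, 215),
]

# Made-up template: one rectangle per field on the fixed form layout.
zones = [
    Zone("solicitation_number", 280, 90, 400, 110),
    Zone("address", 40, 190, 200, 230),
]

fields = extract_fields(words, zones)
print(fields)
```

Because each field is found by where it sits on the page rather than by where it appears in the text stream, the order in which the text is extracted no longer matters, as long as the form layout does not change.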
