Which PDF program is best for scraping data?

When I try to scrape data from a PDF I cannot select individual elements. The screen just treats the whole page like one element. I am running the latest beta of Studio and Adobe Acrobat XI v 11.0.20. Let me know what other information would be helpful.

Hi @KCO_KJackson, check the link below.



I forgot to include that I am scraping a form, specifically SF 1449 contract forms. I need multiple data fields, such as the solicitation number and addresses. Regex also won't work, because the order of the extracted text stream doesn't match where the fields sit on the form. I'll put together a sample in a bit and upload it to provide clarity.
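To illustrate why position matters more than stream order, here is a minimal Python sketch. Everything in it is an assumption made for illustration: the word list, the coordinates, and the box values are invented, and `text_in_box` is a hypothetical helper, not part of Studio or of any PDF library. A real extractor would have to supply the word coordinates.

```python
from typing import NamedTuple


class Word(NamedTuple):
    text: str
    x: float  # left edge, in points from the left of the page
    y: float  # top edge, in points from the top of the page


def text_in_box(words, left, top, right, bottom):
    """Return the words whose origin lies inside the box, in reading order."""
    inside = [w for w in words if left <= w.x <= right and top <= w.y <= bottom]
    inside.sort(key=lambda w: (w.y, w.x))  # top-to-bottom, then left-to-right
    return " ".join(w.text for w in inside)


# Hypothetical words, listed in the order a text stream might emit them.
# Note that the fields are interleaved, which is what defeats a regex.
STREAM = [
    Word("SOLICITATION", 300, 50),
    Word("123", 40, 120),
    Word("ABC-0001", 300, 62),
    Word("Main", 70, 120),
    Word("NUMBER", 380, 50),
    Word("St", 105, 120),
]

# Hypothetical field regions as (left, top, right, bottom).
SOLICITATION_BOX = (290, 58, 450, 70)
ADDRESS_BOX = (30, 115, 200, 130)

stream_text = " ".join(w.text for w in STREAM)
solicitation = text_in_box(STREAM, *SOLICITATION_BOX)
address = text_in_box(STREAM, *ADDRESS_BOX)

print(stream_text)
print(solicitation)
print(address)
```

Reading the words by region recovers each field cleanly, while the joined stream mixes the address into the solicitation header.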

Adobe Acrobat Reader DC is the best program for this. With it, all elements in the PDF are identified, and it's free.
Find the download link below :)


Also, please note that you need to enable user elements in the properties of the PDF.

Not every PDF form can be used to extract data. Open the file in Acrobat Reader DC, go to Edit -> Preferences, and click "OK". You don't need to change any setting in Acrobat Reader for this to work. Then try selecting the elements with UiExplorer. You can test it with the sample file I attached.

Invoice_No_20180718001.pdf (34.9 KB)