When I try to scrape data from a PDF, I cannot select individual elements; the whole page is treated as a single element. I am running the latest beta of Studio and Adobe Acrobat XI v11.0.20. Let me know what other information would be helpful.
Hi @KCO_KJackson, check the link below:
https://www.uipath.com/kb-articles/pdf-data-extraction-scrape-pdf-text
I forgot to mention that I am scraping a form, specifically SF 1449 contract forms. I need multiple data fields such as the solicitation number, addresses, etc. Regex also won’t work, because the order of the extracted text stream doesn’t match the positions of the fields on the form. I’ll put together a sample shortly and upload it for clarity.
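To illustrate the point about positions: when a form's text is extracted as one stream, values from different boxes can come out interleaved, so a pattern over that stream may pick up the wrong text. One alternative is to keep each word's coordinates and collect the words that fall inside a known rectangle for each field. Below is a minimal Python sketch under stated assumptions: the words and their coordinates are assumed to have been extracted already by some PDF library (pdfplumber's `extract_words()`, for example, returns `x0`, `x1`, `top` and `bottom` for each word), and the field names, box coordinates and sample values are hypothetical, not real SF 1449 positions.

```python
from dataclasses import dataclass


@dataclass
class Word:
    """One word and its bounding box; top/bottom are measured from the top of the page."""
    text: str
    x0: float
    top: float
    x1: float
    bottom: float


def words_in_box(words, box):
    """Join the words that lie fully inside box = (x0, top, x1, bottom)."""
    x0, top, x1, bottom = box
    inside = [
        w for w in words
        if w.x0 >= x0 and w.x1 <= x1 and w.top >= top and w.bottom <= bottom
    ]
    # Rebuild reading order: top-to-bottom, then left-to-right.
    # Assumes words on one line share the same `top`; real PDFs often need a tolerance.
    inside.sort(key=lambda w: (w.top, w.x0))
    return " ".join(w.text for w in inside)


def extract_fields(words, field_boxes):
    """Map each field name to the text found inside its rectangle."""
    return {name: words_in_box(words, box) for name, box in field_boxes.items()}


# Hypothetical field rectangles; real SF 1449 coordinates would have to be measured.
FIELD_BOXES = {
    "solicitation_number": (290, 98, 500, 115),
    "offeror_address": (40, 195, 250, 240),
}

# Hypothetical words, deliberately listed in a scrambled "stream" order.
words = [
    Word("123", 50, 215, 70, 225),
    Word("SOLICITATION", 300, 85, 360, 95),  # a printed label, outside both boxes
    Word("Main", 75, 215, 100, 225),
    Word("ABC-18-R-0001", 300, 100, 400, 110),
    Word("ACME", 50, 200, 90, 210),
    Word("St", 105, 215, 115, 225),
    Word("Corp", 95, 200, 120, 210),
]

print(extract_fields(words, FIELD_BOXES))
# {'solicitation_number': 'ABC-18-R-0001', 'offeror_address': 'ACME Corp 123 Main St'}
```

This only works when every copy of the form puts each field in the same place, so the rectangles can be measured once and reused.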
Adobe Acrobat Reader DC is the best program for this: it identifies all the elements in the PDF, and it’s free.
You can find the download link below.
Also note that you need to enable user elements in the PDF’s properties.
Not every PDF file with a form can be used for data extraction. Open the file in Acrobat Reader DC, go to Edit > Preferences, and click “OK”. Then try UiExplorer. You don’t need to change any settings in Acrobat Reader; it will still work. You can test this with the sample file I attached.
Invoice_No_20180718001.pdf (34.9 KB)
How about trying Document Understanding? “Scraping” PDFs really means identifying specific pieces of information in documents, and especially if this is a fixed-layout form, you might want to look into the Form Extractor or the Regex Based Extractor.
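For comparison, the general idea behind a regex-based approach can be sketched in a few lines of Python: each field gets a pattern anchored on its printed label, and the value is whatever follows that label in the extracted text. This is only an illustration of the concept, not UiPath's Regex Based Extractor itself; the labels, number format and sample text are hypothetical. It works only when a value appears right after its label in the text stream, which is exactly the limitation raised earlier in this thread for SF 1449 forms.

```python
import re

# Hypothetical label-anchored patterns; real SF 1449 labels and formats may differ.
FIELD_PATTERNS = {
    "solicitation_number": re.compile(r"SOLICITATION NUMBER\s*[:\-]?\s*([A-Z0-9\-]+)"),
    "issue_date": re.compile(r"ISSUE DATE\s*[:\-]?\s*(\d{2}/\d{2}/\d{4})"),
}


def regex_extract(text, patterns=FIELD_PATTERNS):
    """Return the first captured value for each field, or None if its label is missing."""
    results = {}
    for name, pattern in patterns.items():
        match = pattern.search(text)
        results[name] = match.group(1) if match else None
    return results


sample = "5. SOLICITATION NUMBER ABC-18-R-0001 6. SOLICITATION ISSUE DATE 07/18/2018"
print(regex_extract(sample))
# {'solicitation_number': 'ABC-18-R-0001', 'issue_date': '07/18/2018'}
```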
There’s a pretty comprehensive UiPath Academy course on Document Understanding; it may be useful for your use case.