PDF extraction

activities

#1

I am trying to extract data from a PDF: Sample Invoice B.pdf (365.4 KB)

The same file is used in the UiPath Academy videos. In the videos, the PDF data is extracted with Get Text and selectors are generated. But when I try it, no selectors are generated, and OCR is the only option I have.


#2

Hi @Shashi123! You bring up a good point: the tutorial's approach will not work out of the box on PDFs that need OCR. You first need to run such PDFs through an OCR tool that adds a text layer, such as Tesseract or ABBYY. Even then, reliable selectors will still be difficult to create.
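
For the text-layer step, here is a minimal Python sketch. It assumes the OCRmyPDF command-line tool (a separate open-source program that uses Tesseract under the hood) is installed. That tool is my own choice for illustration and is not part of the UiPath tutorial. The function names and the file names in the usage note are hypothetical.

```python
import shutil
import subprocess
from pathlib import Path


def build_ocr_command(src: Path, dst: Path) -> list[str]:
    """Build an OCRmyPDF command line that writes a searchable copy of src."""
    # Assumption: the `ocrmypdf` CLI is on PATH.
    # --skip-text leaves pages that already contain text untouched.
    return ["ocrmypdf", "--skip-text", str(src), str(dst)]


def add_text_layer(src: Path, dst: Path) -> bool:
    """Run OCRmyPDF if it is installed; return True only if it exits cleanly."""
    if shutil.which("ocrmypdf") is None:
        return False  # tool not installed, nothing was done
    result = subprocess.run(build_ocr_command(src, dst), check=False)
    return result.returncode == 0
```

You would call it as `add_text_layer(Path("scanned.pdf"), Path("searchable.pdf"))` and then point the workflow at the output file. Whether selectors can be generated on the result still depends on the PDF viewer.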

A good alternative to OCR is a cognitive data capture service. It requires no explicit rule setup and instead uses AI to identify the information, which suits this case much better. There are a couple of them; here is one tutorial for UiPath to take a look at: https://rossum.ai/blog/2018/07/30/automating-data-extraction-from-invoices-using-rossum-api-and-uipath/


#3

Hi Shashi,

Try here: Selecting PDF Elements

Some settings may need to be changed within Adobe Reader. It’ll work once you follow the solution there :slight_smile:.