Problem with data scraping in PDF

Hi team,

I have an issue with scraping data from a PDF. Sometimes it is not read correctly by UiPath, and that messes up the selectors. Here is an example:


This is a good selection - I am able to select only the table

This is a bad selection - I cannot isolate the table, and the whole document gets selected

Also, when using OCR on a PDF there is a pop-up from Adobe: “please wait, doc is prepared to read”. When I do this manually it is fine, but I think the robot sometimes skips it and is then not able to OCR the document correctly. How can I avoid this? When reading around 50 PDFs, it sometimes fails on the third and sometimes on the 40th.

Hi

Did you try adding a delay between files, before each one is read and processed?

@AndRewDev

Hello Palaniyappan,

Yes, there is a long delay between closing one PDF and opening the next, because I process the data collected from the first one in between. I think the problem is in Adobe: sometimes it is not ready for OCR, and UiPath cannot pick up these selectors correctly. The only workaround I have for now is to retry and re-process the document.
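
For illustration, the retry workaround described above could look roughly like this. This is a minimal sketch in Python, not UiPath code; `retry`, `attempts` and `delay_seconds` are hypothetical names, and `action` stands for whatever step opens and scrapes one PDF.

```python
import time


def retry(action, attempts=3, delay_seconds=2.0):
    """Call action() until it succeeds or the attempts run out.

    `action` is a hypothetical callable that processes one PDF and
    raises an exception when the document could not be read.
    """
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            return action()
        except Exception as error:
            last_error = error
            # Wait before the next try, but not after the final one.
            if attempt < attempts:
                time.sleep(delay_seconds)
    # Every attempt failed: re-raise the last error for the caller.
    raise last_error
```

The pause between attempts gives the viewer time to finish preparing the document. The right number of attempts and the delay are guesses that would need tuning on the real files.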

That would work too.
I would suggest using the OmniPage OCR engine.

For that, we need to install the UiPath.OmniPage.Activities package from Manage Packages in the Design tab.

Cheers @AndRewDev

Does the bot break after this pop-up appears?
If that pop-up can be detected and handled, that would solve this issue. Once you see the pop-up, it can be handled by clicking OK and then retrying to open the file.

Also, I am wondering why you need to open the PDF on screen to use OCR scraping. You could consider reading the PDF with OCR directly, as suggested above.

Reading the whole PDF with OmniPage OCR works fine for me; however, I am not using it because I cannot extract the substrings I need. For example, it reads a line as:

  1. XXX Invoice number XXX/XXX XX XXX 1 050,70 23 240,50 1 292,42

I can extract the invoice number and its name, but I have a problem with the net/gross amounts when they are above a thousand. I split the whole string by spaces and rely on knowing which index holds which value, but an amount such as 1 050,70 is split into 1 and 050,70, which breaks the whole idea, so I gave up on it. If you have any idea whether this can be done with string manipulation, I would be grateful.
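
One possible approach with string manipulation is to match each amount as a whole with a regular expression instead of splitting on spaces. The sketch below is in Python, not UiPath code, and `AMOUNT_RE` and `extract_amounts` are illustrative names. It assumes every amount ends with a comma and exactly two decimals, and that thousands are separated by a plain or non-breaking space.

```python
import re
from decimal import Decimal

# One amount: 1-3 digits, then zero or more " ddd" thousands groups,
# then a comma and exactly two decimals (assumed format, e.g. "1 050,70").
AMOUNT_RE = re.compile(r"(?<![\d,])\d{1,3}(?:[ \u00a0]\d{3})*,\d{2}(?!\d)")


def extract_amounts(line):
    """Return every amount found in the line as a Decimal, left to right."""
    amounts = []
    for match in AMOUNT_RE.finditer(line):
        # Drop the thousands separators, then switch to a decimal point.
        digits = re.sub(r"[ \u00a0]", "", match.group(0))
        amounts.append(Decimal(digits.replace(",", ".")))
    return amounts


line = "1. XXX Invoice number XXX/XXX XX XXX 1 050,70 23 240,50 1 292,42"
print(extract_amounts(line))
```

On the sample line this yields three amounts: 1050.70, 23240.50 and 1292.42. One caveat: if the "23" is really a separate column (for example a rate) followed by the amount 240,50, the text alone cannot tell the two readings apart. In that case the column layout or a check between the amounts would be needed to disambiguate.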