I am new to UiPath and I am unable to extract specific data from a scanned PDF invoice. When I try to get the purchase order number using “Get OCR Text”, it returns a different value. Could anyone help?
Which OCR engine are you using to extract the text?
After extracting the text, you have to use regex to pull out the specific data.
Is it a digitally native PDF or a scanned one?
If it is digitally native, you can use the “Read PDF Text” activity and then use regex or split to extract the specific data.
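To illustrate the regex and split approaches mentioned above, here is a minimal sketch in Python. The sample text, the "Purchase Order Number:" label and the function names are all assumptions made for illustration, since the real invoice layout is not shown in this thread.

```python
import re

# Hypothetical text standing in for what a PDF text extraction step might return.
sample_text = """ACME Supplies
Invoice No: INV-2041
Purchase Order Number: 4500012345
Total: 1,250.00
"""

def po_by_regex(text):
    """Return the digits that follow a purchase order label, or None if absent."""
    match = re.search(
        r"Purchase\s+Order\s+(?:Number|No\.?)\s*[:#]?\s*(\d+)",
        text,
        re.IGNORECASE,
    )
    return match.group(1) if match else None

def po_by_split(text):
    """Return whatever follows the first colon on the purchase order line, or None."""
    for line in text.splitlines():
        if line.lower().startswith("purchase order") and ":" in line:
            return line.split(":", 1)[1].strip()
    return None

print(po_by_regex(sample_text))
print(po_by_split(sample_text))
```

Regex is usually the more tolerant option when the spacing or label wording varies slightly between invoices, while split only works if the label and separator are always the same.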
Otherwise, if it is a scanned PDF, you should use a good OCR engine (e.g. UiPath Computer Vision OCR activities, ABBYY, Google, etc.) to extract the most accurate text, and then use regex or split. If you use ABBYY or the UiPath Invoice Extraction activities, you will be able to train the algorithm to extract specific data.
I am using both Tesseract and Microsoft OCR. How do I use the regex activity if the invoice contains an “Item Id” whose length differs from one item to another? For example, item ID 1 has 15 digits and item ID 2 has 16 digits.
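One way to handle an ID whose length varies is a bounded quantifier such as `\d{15,16}`, which matches either 15 or 16 digits. Below is a minimal Python sketch; the "Item Id:" label and the sample lines are assumptions, since the actual invoice text is not shown here. The same `{15,16}` quantifier syntax is supported by .NET regular expressions, so the pattern should carry over to a UiPath regex activity with little or no change.

```python
import re

# Matches an "Item Id" label followed by a run of exactly 15 or 16 digits.
# The trailing \b rejects longer digit runs instead of truncating them.
ITEM_ID_PATTERN = re.compile(r"item\s*id\s*[:#]?\s*(\d{15,16})\b", re.IGNORECASE)

def extract_item_ids(text):
    """Return every 15- or 16-digit item ID found after an 'Item Id' label."""
    return ITEM_ID_PATTERN.findall(text)

# Hypothetical OCR output used only for illustration.
sample_text = (
    "Item Id: 123456789012345 Widget A\n"
    "Item Id: 1234567890123456 Widget B\n"
    "Item Id: 12345 Too short\n"
)

print(extract_item_ids(sample_text))
```

If the OCR engine inserts stray spaces inside the number, the text would need to be cleaned up before matching; this sketch assumes the digits come through as one unbroken run.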
Can we see a sample of the data you are trying to extract?
Please find attached below the data I am trying to extract.
I have attached a screenshot of the data I am trying to extract. Sorry, I am unable to attach the PDF because I am new to the UiPath forum.