Getting specific text from an invoice using OCR

Hi, I need to get a specific piece of text from an invoice PDF using an OCR activity.
The OCR method scrapes the entire page, but I only need one specific text.
Here is the invoice:
Centrix_try1.pdf (303.6 KB)

Is there a method to get the specific text (similar to Anchor Base)?
Pressing F3 and selecting the area works only when the text is in the same place in all PDFs. The other option I have is regex (I don’t know regex yet), so is there any other method? Thank you.

Hello @manojj.yadav ,

This can be done in different ways.

  1. Using regex: Use the Read PDF Text activity to get the data into a string, then use regex to extract the value. This is helpful if you need to extract from a set of PDFs where the position of the text changes.

  2. Open the PDF and use the Get Text activity: Here you need to open the PDF in a PDF reader and extract the values from the UI. You can also use CV (Computer Vision) here.

  3. Document Understanding: If your document is not structured, prefer this method. There are predefined ML models available in UiPath to extract the values, and you can also train the model.

Please confirm which fields you want to extract and whether the position of the label remains static.
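To make the regex option a bit more concrete, here is a minimal sketch in Python. The sample text, the labels ("Invoice No", "Account Number", "Total Due") and the helper `extract_field` are all made up for illustration, since the real invoice layout is not shown here. It assumes each value sits on the same line as its label, after a colon.

```python
import re

# Hypothetical text, standing in for the string you would get from the PDF.
# Labels and values are invented for this example.
sample_text = """ACME Supplies
Invoice No: INV-10423
Account Number: 556677
Total Due: 1,250.00
"""


def extract_field(text, label):
    """Return the value that follows 'label:' on the same line, or None."""
    # re.escape keeps characters in the label from being read as regex syntax.
    # '.' does not match a newline, so '.+' stops at the end of the line.
    pattern = re.escape(label) + r"[ \t]*:[ \t]*(.+)"
    match = re.search(pattern, text)
    return match.group(1).strip() if match else None


print(extract_field(sample_text, "Invoice No"))
print(extract_field(sample_text, "Account Number"))
```

Because the pattern looks for the label rather than a fixed position, it should keep working when the field moves around the page, as long as the label text itself stays the same. The same idea carries over to a .NET regex inside UiPath, but the exact expression would need testing against the real extracted text.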

So if the positions are not static, are the only two other options an ML model or regex?

@manojj.yadav Otherwise, with Get Text you have to provide a proper anchor.

For example, if there is a label called Name and you want to extract its value, then you need to add the Name label as an anchor in the selector of Get Text.
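The anchor idea can be sketched outside UiPath as well. The Python below is only an illustration of the concept, not how Get Text or Anchor Base is implemented: it assumes you already have OCR words with coordinates (the `Word` class, the `value_right_of` helper and the sample pages are hypothetical), finds the label, and returns the nearest word to its right on the same line.

```python
from dataclasses import dataclass


@dataclass
class Word:
    text: str
    x: float  # left edge of the word
    y: float  # top edge of the word


def value_right_of(words, label, y_tolerance=5.0):
    """Return the nearest word to the right of `label` on the same line, or None."""
    # Match the label case-insensitively, ignoring a trailing colon.
    anchors = [w for w in words if w.text.rstrip(":").lower() == label.lower()]
    if not anchors:
        return None
    anchor = anchors[0]
    # Keep words on roughly the same line that start to the right of the anchor.
    candidates = [
        w for w in words
        if w is not anchor
        and abs(w.y - anchor.y) <= y_tolerance
        and w.x > anchor.x
    ]
    if not candidates:
        return None
    return min(candidates, key=lambda w: w.x - anchor.x).text


# Two made-up layouts where the Name label sits in different places.
page_a = [Word("Name:", 10, 100), Word("Alice", 60, 101),
          Word("Total:", 10, 140), Word("500", 60, 140)]
page_b = [Word("Total:", 200, 30), Word("750", 260, 32),
          Word("Name:", 200, 300), Word("Bob", 255, 299)]

print(value_right_of(page_a, "Name"))
print(value_right_of(page_b, "Name"))
```

The value is found relative to the label, so it does not matter where on the page the label appears. That is the reason an anchor helps when the position is not static.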

Hi @manojj.yadav

Use the Get Text activity and then use Split to extract the account.
The other fields can be extracted the same way.
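As a rough sketch of the split step, here it is in Python. The sample line and the helper `value_after_colon` are hypothetical, and it assumes the extracted text looks like "label: value".

```python
def value_after_colon(line):
    """Split once on the first colon and return the trimmed value, or None."""
    parts = line.split(":", 1)
    return parts[1].strip() if len(parts) == 2 else None


# Hypothetical line, standing in for what Get Text might return.
line = "Account Number: 556677"
print(value_after_colon(line))
```

Splitting only on the first colon keeps values that contain a colon themselves (for example a time) in one piece. In a UiPath VB.NET expression, something along the lines of `text.Split(":"c)(1).Trim` would be the rough equivalent, assuming the text contains exactly one colon.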

Thanks,
Shyam
