Read Full PDF Text Using OCR vs. Anchor Base Approach



Hi Team,

I have multiple scanned PDFs (containing invoice numbers and other information) that need to be processed automatically. The objective is to fetch multiple values from each PDF and process them. I have tried two different approaches:

  1. Read Full PDF Text Using Google OCR
    The text is not 100% correct. Also, after reading the scanned PDF text, it is difficult to come up with a generic method to pull values from the output string, because the position of the values changes from one PDF to another. The output also contains many stray, unrecognizable characters.

  2. Anchor Base
    It requires the PDF to be open on the system. Anchor Base relies on the position and dimensions of the Get Text target, which depend on the screen resolution and size, and on the version or type of PDF reader used.
    So it is very unlikely that a XAML file that works on my system will also work on other systems.
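
To illustrate the kind of generic extraction I am after for approach 1, here is a minimal Python sketch. It is not my actual workflow: the label patterns, the sample text, and the function name `extract_fields` are all made up for illustration. The idea is to search for a value next to its label rather than at a fixed position in the OCR string.

```python
import re

# Hypothetical label patterns; real invoices would need their own list.
FIELD_PATTERNS = {
    # Label "Invoice", optional "No"/"Number"/"#", then a token containing a digit.
    "invoice_number": r"\binvoice\s*(?:no|number|#)?\.?\s*[:\-]?\s*([A-Z0-9\-/]*\d[A-Z0-9\-/]*)",
    # Label "Total", optional "Amount", optional currency sign, then a number.
    "total": r"\btotal\s*(?:amount)?\s*[:\-]?\s*\$?\s*([0-9][0-9,]*\.?[0-9]*)",
}


def extract_fields(ocr_text):
    """Return the first match for each field, or None when the label is not found."""
    # Collapse runs of whitespace so OCR line breaks do not affect matching.
    normalized = re.sub(r"\s+", " ", ocr_text)
    result = {}
    for name, pattern in FIELD_PATTERNS.items():
        match = re.search(pattern, normalized, flags=re.IGNORECASE)
        result[name] = match.group(1) if match else None
    return result


# Made-up OCR output for two layouts with the fields in different positions.
sample_a = "ACME Corp\nInvoice No: INV-2041\nTotal Amount: 1,250.00\n"
sample_b = "Total: $99.50\nThank you\nInvoice # A123"

print(extract_fields(sample_a))
print(extract_fields(sample_b))
```

This only helps when the label itself is read correctly by the OCR engine; a misread label still returns `None` for that field.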

Please let me know in which scenarios I should use the full-text OCR approach and in which the Anchor Base approach. Also, are there any limitations in UiPath for extracting data from scanned PDFs?

Cheers!