Unreliable OCR data extraction from PDF forms

I’ve been using the anchor element in UiPath Studio to extract form field values into an Excel sheet. However, the results I’m getting are not reliable at all. I’ve checked several times that nothing is wrong on my end in how the program is built. By unreliable, I mean that the results change randomly depending on whether the PDF tabs are kept open or closed. I also once got different results when I ran the program versus when I restarted and ran the same program again!

Has anyone else faced this issue? Any suggestions or solutions?

Thanks.

@sring1

Welcome to the community

UI automation is not recommended for extracting data from PDFs.

Either read the data using Read PDF Text and then use regex or string manipulation

Or use Document Understanding
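A minimal sketch of the regex route, written in Python for illustration only (in UiPath the same pattern could be used with the Matches activity or `System.Text.RegularExpressions.Regex` in an Assign). It assumes the Read PDF Text output is already in a string and that each field appears on its own line as `Label: value`; the sample text, the labels, and the `extract_field` helper are hypothetical.

```python
import re
from typing import Optional

# Hypothetical text standing in for the output of Read PDF Text.
pdf_text = """Application Form
Name: Jane Doe
Date: 2023-04-01
Policy Number: AB-12345
"""

def extract_field(text: str, label: str) -> Optional[str]:
    """Return the value after 'label:' on the same line, or None if the label is absent."""
    # re.escape keeps the label literal; [ \t]* skips spaces/tabs after the colon
    # without crossing into the next line; (.*) captures the rest of that line.
    match = re.search(rf"^{re.escape(label)}:[ \t]*(.*)$", text, re.MULTILINE)
    return match.group(1).strip() if match else None

labels = ["Name", "Date", "Policy Number", "Email"]
fields = {label: extract_field(pdf_text, label) for label in labels}
print(fields)
```

Anchoring each label at the start of a line keeps one field's value from bleeding into another; if your extracted text puts labels and values on separate lines, the pattern would need adjusting.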

If the PDF has form-enabled fields, you can also use pdfsharp or itext7 to read each field separately

Cheers

Hi @sring1

  1. Install the UiPath.PDF.Activities package from Manage Packages.
  2. Use Read PDF Text (for digital PDFs) or Read PDF with OCR (for scanned PDFs).
    For Read PDF with OCR, you can use whichever OCR engine works best for your scanned PDF.

After reading the PDF, assign the output to a String variable and then do string manipulation on it.
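The string manipulation step might look like the following sketch, written in Python for illustration only (in UiPath the equivalent would be `Split`/`Substring`-style calls in Assign activities). It assumes each field sits on its own line as `Label: value`; the sample text and the `parse_key_values` helper are hypothetical.

```python
from typing import Dict

# Hypothetical Read PDF Text output; the real layout depends on your form.
pdf_text = """Name: Jane Doe
Date: 2023-04-01
Comments: Call after 5: not before
"""

def parse_key_values(text: str) -> Dict[str, str]:
    """Build a label -> value dict from 'Label: value' lines using plain string methods."""
    result: Dict[str, str] = {}
    for line in text.splitlines():
        # partition splits on the first ':' only, so colons inside the value are kept.
        label, sep, value = line.partition(":")
        if sep:  # skip lines with no ':' at all
            result[label.strip()] = value.strip()
    return result

fields = parse_key_values(pdf_text)
print(fields["Comments"])
```

Splitting on the first colon only is the main design choice here: it keeps values such as times or URLs intact, at the cost of breaking if a label itself contains a colon.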

Hope it helps :slight_smile:
Cheers!!