Image PDF scraping

pdf
ocr
studio

#1

Hi

I need to scrape some details from a “scanned - image” PDF. I used the Anchor Base activity with Find Image (pointing to the label) as the anchor and Get OCR Text as the activity to extract the value.

However, when I execute this flow, I get an error saying "Value does not fall within the expected range."
This error is coming from the Get OCR Text activity.

Is it even possible to use Anchor Base with a scanned-image PDF?

Please suggest.


#2

Hi!

Could you share a sample of your scanned PDF? I think this error might occur because it doesn’t recognize what you are indicating.


#3

Try using the “Read PDF With OCR” activity to get the full text, then do some string manipulation using Substring and other methods to get the needed details.
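
To give a rough idea of the string manipulation step, here is a minimal sketch in Python (not UiPath code; the same idea should carry over to the string and regex methods you can call in a workflow). It assumes the OCR output puts each value on the same line as its label, separated by a colon or dash. The sample text, the label names, and the helper names `extract_field` and `extract_fields` are all made up for illustration; real OCR output will be noisier and may need a looser pattern.

```python
import re

# Hypothetical OCR output; the text returned for a real scanned PDF will differ.
SAMPLE_TEXT = """Application Form
Name : John Smith
Age: 34
Father Name : Robert Smith
"""


def extract_field(text, label):
    """Return the value that follows 'label:' on the same line, or None."""
    # Anchor the label at the start of a line so that "Name" does not
    # accidentally match inside "Father Name".
    pattern = rf"^[ \t]*{re.escape(label)}[ \t]*[:\-][ \t]*(.+?)[ \t]*$"
    match = re.search(pattern, text, flags=re.IGNORECASE | re.MULTILINE)
    return match.group(1).strip() if match else None


def extract_fields(text, labels):
    """Look up several labels at once and return a label -> value mapping."""
    return {label: extract_field(text, label) for label in labels}


if __name__ == "__main__":
    fields = extract_fields(SAMPLE_TEXT, ["Name", "Age", "Father Name"])
    for label, value in fields.items():
        print(f"{label}: {value}")
```

A label that is not found comes back as `None`, which makes it easy to spot pages where the OCR misread the label itself.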


#4

Do you have any examples of string manipulation? If so, please attach them here.


#5

Hi, I am new to UiPath and need some help extracting text from a scanned-image PDF stored in a particular location. The PDF document contains images with a similar structure, and I need to extract specific text (i.e. Name, Age, Father Name…) from each image. The extracted information should be stored in a .txt file or an Excel file.


#6

Hi,

I am trying to scrape data from a digital PDF. I have tried different OCR methods and data scraping, but I am unable to identify which check boxes are checked. Any ideas?