Read PDF with OCR is not extracting all the pages using the Tesseract OCR engine

Hi,

I’m trying to read all the PDF files in a folder and extract data from them. Each PDF has at most 3 pages. I’m using the “Read PDF with OCR” activity with the “Range” property left empty, and the Tesseract OCR engine because it extracts the data in my PDFs correctly. The issue is that the OCR engine extracts only the first page.

@yaamini

Could you please take a screenshot of the Read PDF with OCR activity properties and share it?

Hi Lakshman, thanks for the reply. I cannot take a screenshot since it is restricted, so I am sharing the details below:

Read PDF with OCR
Input
DegreeOfParallelism: -1
ImageDPI: 300
Range: Empty(No Value)
Misc
Private: Unchecked
Output
Text: Empty

@yaamini

OK, it looks good.

Just set Range to “1-3” and try again.

Yes, I did try that, but another PDF file in the same folder has just 1 page, so it throws an exception for that file.
I also tried setting the range only for this file, with no luck!

Hello, write “All” in the Range property. Watch this video for more information.
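
If a fixed range is ever needed instead of “All”, the exception on the 1-page file mentioned earlier in the thread could probably be avoided by capping the range at each file's actual page count. Below is a minimal Python sketch of that logic only, not a UiPath workflow. The helper name `page_range`, the file names, and the page counts are hypothetical, and it assumes the page count of each PDF is obtained separately before the OCR step.

```python
def page_range(page_count: int, max_pages: int = 3) -> str:
    """Build a Range string that never exceeds the file's real page count."""
    if page_count < 1:
        raise ValueError("page_count must be at least 1")
    # Cap the last page at whichever is smaller: the real count or the limit.
    last = min(page_count, max_pages)
    # A single page is written as "1"; several pages as "1-N".
    return "1" if last == 1 else f"1-{last}"


# Hypothetical page counts for the PDFs in the folder.
for name, count in {"a.pdf": 1, "b.pdf": 3, "c.pdf": 7}.items():
    print(name, page_range(count))
```

The resulting string would then be passed to the Range property for each file, so a 1-page PDF gets “1” rather than “1-3”.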