Read PDF with OCR is not extracting all the pages using Tesseract OCR Engine

yaamini · June 28, 2019, 12:58pm

Hi,

I’m trying to read all the PDF files in a folder and extract data from them. Each PDF contains at most 3 pages. I have used the “Read PDF with OCR” activity with the “Range” property left empty. I’m using the Tesseract OCR Engine because it extracts the data in my PDFs correctly. The issue is that the OCR engine extracts only the first page.

lakshman · June 28, 2019, 12:59pm

@yaamini

Could you please take a screenshot of the Read PDF with OCR activity’s properties and share it?

yaamini · June 28, 2019, 3:40pm

Hi Lakshman, thanks for the reply. I cannot take a screenshot since it is restricted, so I’m sharing the details below.

Read PDF with OCR
Input
DegreeOfParallelism: -1
ImageDPI: 300
Range: Empty (No Value)
Misc
Private: Unchecked
Output
Text: Empty

lakshman · June 28, 2019, 4:06pm

@yaamini

OK, it looks good.

Just specify the range as “1-3” and try again.

yaamini · June 29, 2019, 3:30am

Yes, I did try that, but another PDF file in the same folder has just 1 page, so it throws an exception for that file.
I also tried setting the range only for this file, but no luck!

Manish_Pandey · June 29, 2019, 4:23am

Hello, write “All” in the Range property. Watch this video for more information.
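
If “All” does not help and a fixed range such as “1-3” fails on shorter files, one option is to build the range per file from its page count. Below is a minimal sketch in Python of that idea only; it is not UiPath code. The helper name `build_range` and its parameters are hypothetical, it assumes the page count is obtained separately for each file, and it assumes the Range property accepts a single page such as “1” as well as a span such as “1-3”.

```python
def build_range(page_count: int, max_pages: int = 3) -> str:
    """Return a page-range string that never goes past the last page.

    Hypothetical helper: page_count is assumed to come from a separate
    page-count step, and max_pages mirrors the 3-page limit in this thread.
    """
    if page_count < 1:
        raise ValueError("page_count must be at least 1")
    if max_pages < 1:
        raise ValueError("max_pages must be at least 1")

    # Clamp the upper bound so a 1-page PDF never gets a range like "1-3".
    last_page = min(page_count, max_pages)

    # Assumption: a single page is written as "1", a span as "1-N".
    return "1" if last_page == 1 else f"1-{last_page}"


if __name__ == "__main__":
    # Example page counts for files in the folder (made-up values).
    for pages in (1, 2, 3):
        print(pages, "->", build_range(pages))
```

The resulting string would then be passed to the Range property for each file, so the 1-page PDF gets “1” instead of a range that runs past its last page.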
