Read PDF text only reading last page from multiple page PDF

michaelamay0 · August 5, 2019, 6:35pm

I have a multi-page PDF file that I am reading with Tesseract OCR and writing to an output file. However, for my 3-page PDF, the text of the earlier pages gets overwritten and only the last page's text ends up in the output file.

Details:
Input range is set to all. Structure

Palaniyappan · August 5, 2019, 6:39pm

Kindly try once with another OCR engine, such as Google OCR or Microsoft OCR.
Cheers @michaelamay0

michaelamay0 · August 5, 2019, 7:03pm

I don’t have the licensing for other engines, but Tesseract OCR works perfectly fine for data extraction from the PDF files I’m scraping.

SenzoD · February 6, 2020, 9:57am

Hey @michaelamay0, I am trying to do the same thing here. Did you perhaps find a solution for this?

So far I am thinking of a way to put this in a loop and read each page of the document one by one.
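
The loop idea could look roughly like the sketch below. It is written in Python rather than as a UiPath workflow, and it is only an illustration: `read_page` and `fake_ocr` are hypothetical stand-ins for whatever single-page OCR call is available, and the assumption that the output is being overwritten once per page is not confirmed anywhere in this thread. The point is to collect each page's text in the loop and write the file once at the end.

```python
from pathlib import Path
from typing import Callable


def read_all_pages(page_count: int, read_page: Callable[[int], str]) -> str:
    """Read pages 1..page_count one at a time and join their text."""
    texts = []
    for page_number in range(1, page_count + 1):
        # read_page is a hypothetical stand-in for a single-page OCR call.
        texts.append(read_page(page_number))
    return "\n".join(texts)


def write_output(output_path: str, text: str) -> None:
    # Write once, after the loop, so no page replaces an earlier one.
    Path(output_path).write_text(text, encoding="utf-8")


def fake_ocr(page_number: int) -> str:
    # Placeholder reader used only for this demonstration.
    return f"text of page {page_number}"


# Collect the text of a 3-page document into one string.
combined = read_all_pages(3, fake_ocr)
```

Calling `write_output("output.txt", combined)` afterwards would store all three pages in one file. If the file has to be written inside the loop instead, it would need to be opened in append mode rather than write mode.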

SabrinaG · July 30, 2020, 10:14pm

Same here. I tried Google OCR and MS OCR too, and only the last page's data is extracted. Any news on this?
