Read PDF text only reads the last page of a multi-page PDF

I have a multi-page PDF file that I am reading with Tesseract OCR and writing to an output file. However, for my 3-page PDF, the text from the earlier pages gets overwritten and only the last page's text ends up in the output file.

The input range is set to All.

Could you try another OCR engine, such as Google OCR or Microsoft OCR, and see whether the result changes?
Cheers @michaelamay0

I don’t have licenses for the other engines, but Tesseract OCR works perfectly fine for extracting data from the PDF files I’m scraping.


Hey @michaelamay0, I am trying to do the same thing here. Did you perhaps find a solution for this…

So far I am thinking of putting this in a loop and reading each page of the document one by one.
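
A rough sketch of that loop idea, written in Python rather than as a workflow, just to show the shape of it. It assumes the cause is that each page's result replaces the previous one, which nobody in this thread has confirmed. The names `ocr_pages_to_file`, `pages` and `ocr_page` are made up for illustration; `ocr_page` stands in for whatever OCR call you actually use.

```python
from pathlib import Path
from typing import Callable, Iterable


def ocr_pages_to_file(pages: Iterable[object],
                      ocr_page: Callable[[object], str],
                      output_path: str) -> int:
    """OCR each page in turn and write all results to one file.

    `pages` and `ocr_page` are hypothetical stand-ins: any iterable of
    page objects and any function that turns one page into text.
    Returns the number of pages processed.
    """
    collected = []
    for number, page in enumerate(pages, start=1):
        text = ocr_page(page)  # one OCR call per page
        # Keep every page's text instead of reassigning a single variable.
        collected.append(f"--- Page {number} ---\n{text.strip()}\n")

    # Write once, after the loop, so earlier pages are not overwritten.
    Path(output_path).write_text("".join(collected), encoding="utf-8")
    return len(collected)
```

If you were doing this in Python with Tesseract, `pages` could come from `pdf2image.convert_from_path("input.pdf")` and `ocr_page` could be `pytesseract.image_to_string`, assuming both packages and the Tesseract binary are installed. The point is the same either way: collect or append the text per page, and do not write to the output inside the loop in a way that replaces what is already there.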

Same here. I tried Google OCR and Microsoft OCR too, and only the last page's text is extracted. Any news on this?