Read PDF text: only the last page is read from a multi-page PDF

I have a multi-page PDF file that I read with Tesseract OCR and write to an output file. However, for my 3-page PDF, the text from the first 2 pages gets overwritten and only the text of the last page ends up in the output file.

Details:
Input range is set to all.

Kindly try another OCR engine, such as Google or Microsoft, and see whether that helps.
Cheers @michaelamay0

I don’t have the licensing for other engines, but Tesseract OCR works perfectly fine for extracting data from the PDF files I’m scraping.


Hey @michaelamay0, I am trying to do the same thing here. Did you perhaps find a solution for this…

So far I am thinking of putting this in a loop and reading each page of the document one by one.
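
Roughly what I have in mind, written as a small Python sketch rather than as the actual workflow. Everything named here is my own assumption: `ocr_page` is a hypothetical stand-in for whatever call reads a single page with your OCR engine, and `read_all_pages` and `write_output` are just illustrative helpers. The point is only that each page's text is appended to a list, and the file is written once at the end, so earlier pages are not replaced by later ones.

```python
from typing import Callable, Iterable, List


def read_all_pages(pages: Iterable[object], ocr_page: Callable[[object], str]) -> str:
    """Run OCR on each page in turn and return the combined text.

    `ocr_page` is a hypothetical callable that returns the text of one page.
    """
    page_texts: List[str] = []
    for number, page in enumerate(pages, start=1):
        text = ocr_page(page)
        # Append instead of assigning to a single variable,
        # so the text of earlier pages is not overwritten.
        page_texts.append(f"--- Page {number} ---\n{text.strip()}")
    return "\n\n".join(page_texts)


def write_output(path: str, text: str) -> None:
    # Write once, after the loop, rather than rewriting the file for every page.
    with open(path, "w", encoding="utf-8") as handle:
        handle.write(text)
```

In a workflow the same idea would presumably be: read one page per iteration, add the result to a running text variable, and write the file only after the last page.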

Same here. I tried Google OCR and MS OCR too, and data is extracted from the last page only. Any news on this?