I have a multi-page PDF file that I am reading with Tesseract OCR and writing to an output file. However, for my 3-page PDF, the text from the earlier pages gets overwritten and only the last page's text ends up in the output file.
Details: Input range is set to all.
Kindly try once with another OCR engine such as Google or Microsoft. Cheers @michaelamay0
I don’t have the licensing for other engines, but Tesseract OCR works perfectly fine for extracting data from the PDF files I’m scraping.
Hey @michaelamay0, I am trying to do the same thing here. Did you perhaps find a solution for this…
So far I am thinking of putting this in a loop and reading each page of the document one by one.
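The page-by-page loop idea could look roughly like the Python sketch below. This is not the tool or workflow used in this thread, just an illustration of the approach. `ocr_pages_to_file` and `ocr_pdf_with_tesseract` are hypothetical helper names, and the second one assumes the `pdf2image` and `pytesseract` packages are installed along with Poppler and the Tesseract binary. The key point is that the output file is opened once and each page's text is written in sequence, so a later page cannot replace an earlier one.

```python
from typing import Callable, Iterable


def ocr_pages_to_file(pages: Iterable, ocr: Callable[[object], str], out_path: str) -> int:
    """OCR each page in turn and write all results to one file.

    `pages` is any iterable of page images and `ocr` is any function that
    turns one page into text. Returns the number of pages written.
    """
    count = 0
    # Open the file once, outside the loop. Reopening it in "w" mode for
    # every page would truncate it each time and keep only the last page.
    with open(out_path, "w", encoding="utf-8") as out:
        for number, page in enumerate(pages, start=1):
            text = ocr(page)
            out.write(f"--- Page {number} ---\n{text.strip()}\n")
            count += 1
    return count


def ocr_pdf_with_tesseract(pdf_path: str, out_path: str) -> int:
    """Hypothetical wrapper: render a PDF to images and OCR every page."""
    # Assumption: pdf2image (needs Poppler) and pytesseract (needs the
    # Tesseract binary) are available. Imported here so the helper above
    # stays usable without them.
    from pdf2image import convert_from_path
    import pytesseract

    pages = convert_from_path(pdf_path, dpi=300)  # one image per PDF page
    return ocr_pages_to_file(pages, pytesseract.image_to_string, out_path)
```

Calling `ocr_pdf_with_tesseract("input.pdf", "output.txt")` (both file names are placeholders) should leave one labelled block of text per page in the output file. If the file has to be written inside the loop instead, opening it in append mode (`"a"`) rather than write mode avoids the same overwrite.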
Same here. I tried Google OCR and Microsoft OCR too, and only the last page's data gets extracted. Any news on this?