I am currently working on a project that involves extracting a medication list from a PDF file using OCR technology.
The challenge is that the PDFs vary in length, from 6 to 14 or more pages. The medication list may start on any page and often continues onto one or two subsequent pages. It is typically laid out as a table, which makes extraction harder.
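To make the problem concrete, here is a rough sketch of the page-selection step as I currently picture it. It assumes OCR has already produced one text string per page. The heading keywords, the dose-based row heuristic, and all function names are placeholders I made up for illustration, not part of any library, and the real documents may need different patterns.

```python
import re

# Hypothetical heading wording; the real documents may phrase it differently.
HEADING_PATTERN = re.compile(r"\bmedications?\b", re.IGNORECASE)

# Assumed row shape: a line containing a dose such as "10 mg" or "5 ml".
DOSE_PATTERN = re.compile(r"\b\d+(\.\d+)?\s*(mg|mcg|g|ml|units?)\b", re.IGNORECASE)


def looks_like_table_row(line):
    """Heuristic: treat a line as a medication row if it contains a dose."""
    return bool(DOSE_PATTERN.search(line))


def count_table_rows(page_text):
    """Count the lines on a page that look like medication rows."""
    return sum(1 for line in page_text.splitlines() if looks_like_table_row(line))


def find_medication_pages(page_texts, max_continuation=2, min_rows=2):
    """Return 0-based indices of the pages that appear to hold the list.

    The first page with a medication heading and at least one row is taken
    as the start. Up to `max_continuation` following pages are added while
    each one still has at least `min_rows` row-like lines.
    """
    for start, text in enumerate(page_texts):
        if HEADING_PATTERN.search(text) and count_table_rows(text) >= 1:
            pages = [start]
            last = min(start + 1 + max_continuation, len(page_texts))
            for nxt in range(start + 1, last):
                if count_table_rows(page_texts[nxt]) >= min_rows:
                    pages.append(nxt)
                else:
                    break  # the table has ended
            return pages
    return []


if __name__ == "__main__":
    # Made-up sample pages standing in for OCR output.
    sample = [
        "Patient info\nName: X",
        "Current Medications\nLisinopril 10 mg daily\nMetformin 500 mg twice daily",
        "Atorvastatin 20 mg nightly\nAspirin 81 mg daily",
        "Follow-up notes\nReturn in 2 weeks",
    ]
    print(find_medication_pages(sample))
```

This only decides which pages to look at. Splitting each row into columns (drug name, dose, frequency) is a separate step, and OCR errors in the heading or the dose units would break these simple patterns.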
I would appreciate any guidance on how to handle these challenges. If you have dealt with a similar scenario or can recommend best practices or tools, please share your insights.
Thank you!