Need Help with OCR-Based Medication List Extraction from Variable PDF Formats

jai_kumar2 · December 13, 2024, 11:14am

I am currently working on a project that involves extracting a medication list from a PDF file using OCR technology.

The challenge lies in the varying number of pages in the PDF, which can range from 6 to 14 or more. Additionally, the medication list may appear on any page and often spans across one or two subsequent pages. The list is typically presented in a table format, adding to the complexity of the extraction process.

I would greatly appreciate any guidance or suggestions from the forum on how to effectively handle these challenges. If anyone has experience dealing with similar scenarios or can recommend best practices or tools, please share your insights.

Thank you!

Anil_G · December 14, 2024, 1:34pm

@jai_kumar2

If you are on the latest version, try using the DocPath model. It's a generative AI model that can work on dynamic PDFs.

Alternatively, identify the table's column names as keywords. First loop through each page to find which pages contain those keywords, then separate those pages.
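Here is a minimal Python sketch of that keyword filtering. It assumes the OCR text of each page is already available as a list of strings, one per page. The function name `find_table_pages` and its `min_hits` and `continuation_pages` parameters are hypothetical. Because the medication list can run onto one or two following pages that may not repeat the column headers, the sketch also keeps a configurable number of pages after each header page.

```python
from typing import List, Sequence


def find_table_pages(
    page_texts: Sequence[str],
    header_keywords: Sequence[str],
    min_hits: int = 2,
    continuation_pages: int = 2,
) -> List[int]:
    """Return 0-based indices of pages likely to hold the medication table.

    A page counts as a header page when at least `min_hits` of the header
    keywords appear in its OCR text (case-insensitive). The next
    `continuation_pages` pages after each header page are also kept,
    because the table may continue there without repeating its header.
    """
    keywords = [k.lower() for k in header_keywords]
    selected = set()
    for i, text in enumerate(page_texts):
        lowered = text.lower()
        hits = sum(1 for k in keywords if k in lowered)
        if hits >= min_hits:
            # Keep the header page plus possible continuation pages,
            # clamped to the last page of the document.
            last = min(i + continuation_pages, len(page_texts) - 1)
            selected.update(range(i, last + 1))
    return sorted(selected)


# Hypothetical OCR output for a 6-page document.
pages = [
    "Patient info",
    "Discharge summary",
    "Medication Dose Frequency\nAspirin 81 mg daily",
    "Metformin 500 mg BID",
    "Follow-up notes",
    "Signatures",
]
print(find_table_pages(pages, ["Medication", "Dose", "Frequency"], continuation_pages=1))
```

Including continuation pages blindly can pull in a page or two that holds no table rows. The extractor should simply return nothing for those pages, which is usually cheaper than missing the tail of the list.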

Then you can leverage the Form Extractor if the table structure is the same across documents.

Otherwise, train a model to extract those tables, feed it only the pages with those keywords, and get the data.
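Because the table often spans consecutive pages, it can help to group the selected pages into contiguous ranges before sending them to the extractor, and to merge the rows afterwards. The following is a minimal Python sketch under these assumptions: the extractor returns one list of rows per page, each row being a tuple of cell strings, and a continuation page may repeat the header row. The names `to_page_ranges` and `merge_table_rows` are hypothetical helpers, not part of any UiPath activity.

```python
from typing import List, Sequence, Tuple

Row = Tuple[str, ...]


def to_page_ranges(pages: Sequence[int]) -> List[Tuple[int, int]]:
    """Collapse page indices into inclusive (start, end) ranges of consecutive pages."""
    ranges: List[Tuple[int, int]] = []
    for p in sorted(set(pages)):  # dedupe so repeated indices don't split a range
        if ranges and p == ranges[-1][1] + 1:
            ranges[-1] = (ranges[-1][0], p)  # extend the current range
        else:
            ranges.append((p, p))  # start a new range
    return ranges


def merge_table_rows(per_page_rows: Sequence[Sequence[Row]], header: Row) -> List[Row]:
    """Concatenate rows extracted page by page, dropping repeated header rows."""
    norm_header = tuple(c.strip().lower() for c in header)
    merged: List[Row] = []
    for rows in per_page_rows:
        for row in rows:
            if tuple(c.strip().lower() for c in row) == norm_header:
                continue  # header repeated at the top of a continuation page
            merged.append(tuple(row))
    return merged


print(to_page_ranges([9, 2, 3, 4]))
print(merge_table_rows(
    [[("Medication", "Dose"), ("Aspirin", "81 mg")],
     [("medication ", "Dose"), ("Metformin", "500 mg")]],
    header=("Medication", "Dose"),
))
```

Header matching here is a simple case- and whitespace-insensitive comparison. If OCR noise garbles the repeated header, a fuzzier check may be needed.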

Cheers
