Need Help with OCR-Based Medication List Extraction from Variable PDF Formats

@jai_kumar2

If you are on the latest version, try the DocPath model. It is a generative AI model that can work on dynamic PDFs.

Alternatively, identify the table column names as keywords. First loop through each page to find which pages contain those keywords, and separate those pages.

Then you can leverage Form Extractor if the table structure is the same across documents.

Otherwise, train a model to extract those tables, feed it only the pages with those keywords, and get the data.
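
The keyword-based page filtering can be sketched roughly as below. This is only an illustration of the logic in Python, not a UiPath workflow: it assumes you already have the OCR text of each page as a string, and the function name, the `min_hits` threshold, and the sample column names are all made up for the example.

```python
def pages_with_keywords(page_texts, keywords, min_hits=2):
    """Return 1-based page numbers whose text contains at least
    min_hits of the given keywords (case-insensitive)."""
    wanted = [k.lower() for k in keywords]
    matches = []
    for number, text in enumerate(page_texts, start=1):
        lowered = text.lower()
        # Count how many distinct keywords appear on this page
        hits = sum(1 for k in wanted if k in lowered)
        if hits >= min_hits:
            matches.append(number)
    return matches


# Hypothetical OCR output, one string per page
pages = [
    "Patient intake form. Name, date of birth, address.",
    "Medication  Dose  Frequency\nAspirin 81 mg daily",
    "Discharge summary and follow-up notes.",
]

# Hypothetical column names used as keywords
print(pages_with_keywords(pages, ["Medication", "Dose", "Frequency"]))
```

Requiring more than one keyword per page helps avoid picking up pages that mention a single word like "dose" in running text. Only the matching pages would then go to the extractor or the trained model.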

Cheers