Generic PDF Extraction


Are there any generic solutions for extracting unstructured data from PDFs that use different templates?


Yes, as a first step we can use the simple Read PDF Text or Read PDF with OCR activity, then do string manipulation, such as regex or the Split method, to get the value we need.
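For illustration, here is a minimal Python sketch of the regex and split idea, applied to text that has already been read from the PDF. The sample text, the field labels, and the helper names `extract_with_regex` and `extract_with_split` are all made up for this example; a real document will need its own labels and patterns.

```python
import re

# Hypothetical text, standing in for what a Read PDF step would return.
sample_text = """Invoice Number: INV-1042
Invoice Date: 2020-03-15
Total Amount: 1,250.00 USD"""


def extract_with_regex(text, label):
    """Return the value that follows 'label:' on the same line, or None."""
    pattern = re.escape(label) + r"[ \t]*:[ \t]*(.+)"
    match = re.search(pattern, text)
    return match.group(1).strip() if match else None


def extract_with_split(text, label):
    """Same lookup using only split: find the line, then cut at the first colon."""
    for line in text.splitlines():
        if line.strip().startswith(label):
            parts = line.split(":", 1)
            if len(parts) == 2:
                return parts[1].strip()
    return None


print(extract_with_regex(sample_text, "Invoice Number"))
print(extract_with_split(sample_text, "Total Amount"))
```

Both helpers depend on the label text and the "label: value" layout, which is why this approach stays tied to one template.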

Cheers @enoondusandeepa

Yes, I have achieved this. The problem now is manipulating the data to bring it into a structure. With string operations or regex, we have to write a different workflow for each template. Is there any universal / generic tool, or any possible code, that extracts the required data from any PDF of any template?


No buddy, I don't think we have any such tool. This is more a matter of process than of template:
each PDF has its own format, so each process has to be defined accordingly.
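As a rough sketch of what "defined accordingly" can look like in code, the per-template rules can at least be kept as data in one place instead of in separate workflows. Everything here is an assumption made for illustration: the template names, the marker strings, the regex patterns, and the helpers `detect_template` and `extract_fields`. It does not remove the need to define each template; it only keeps the definitions together.

```python
import re

# Hypothetical templates: a marker string that identifies each one,
# plus one regex per field to extract.
TEMPLATES = {
    "vendor_a": {
        "marker": "ACME Supplies",
        "fields": {
            "invoice_number": r"Invoice No\.?[ \t]*:[ \t]*(\S+)",
            "total": r"Total Due[ \t]*:[ \t]*([\d.,]+)",
        },
    },
    "vendor_b": {
        "marker": "Globex Corporation",
        "fields": {
            "invoice_number": r"Ref[ \t]*#[ \t]*(\S+)",
            "total": r"Amount Payable[ \t]+([\d.,]+)",
        },
    },
}


def detect_template(text):
    """Return the name of the first template whose marker appears in the text."""
    for name, spec in TEMPLATES.items():
        if spec["marker"] in text:
            return name
    return None


def extract_fields(text):
    """Return (template_name, {field: value}); unknown templates give (None, {})."""
    name = detect_template(text)
    if name is None:
        return None, {}
    result = {}
    for field, pattern in TEMPLATES[name]["fields"].items():
        match = re.search(pattern, text)
        result[field] = match.group(1) if match else None
    return name, result


# Hypothetical text for one of the templates.
text_b = "Globex Corporation\nRef # GX-77\nAmount Payable 980.50\n"
print(extract_fields(text_b))
```

A PDF from a template that is not in the table comes back as `(None, {})`, so a new template still means adding a new entry.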

Cheers @enoondusandeepa

To perform truly universal extraction, you would probably need to use Intelligent OCR.