Hi,
I have been working on a solution to extract data from multiple PDFs. These are invoice PDFs with tabular data of items; please see the attached screenshot for reference.
There are various ways to extract specific fields such as name or invoice number from a PDF. However, getting the data out of the table of items and saving it to Excel is proving challenging.
Has anyone solved this kind of problem?
I would appreciate a quick response.
Try to use the Document Understanding ML model to extract the table from the PDF.
@hasib08 I really don't want to use DU at this point in time. Is there any other way with the PDF activities?
Have you tried data scraping?
Yes, I did try data scraping; however, the data is not consistent across PDFs.
Here I tried extracting the tabular data of a PDF using string manipulation and regular expressions. Take a look; it might be helpful.
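As a rough illustration of the string-manipulation and regex idea (this is not the attached workflow), here is a minimal sketch in Python. It assumes the PDF text has already been read into a string and that each item row looks like "description, quantity, unit price, amount" separated by spaces. The names ROW_PATTERN, parse_item_rows, rows_to_csv and the sample text are all made up for the example, and the pattern would need adjusting to the real invoice layout.

```python
import csv
import io
import re

# Hypothetical row layout: description, quantity, unit price, line amount.
# Real invoices will differ, so this pattern is an assumption to adapt.
ROW_PATTERN = re.compile(
    r"^(?P<description>.+?)\s+"
    r"(?P<quantity>\d+)\s+"
    r"(?P<unit_price>\d[\d,]*\.\d{2})\s+"
    r"(?P<amount>\d[\d,]*\.\d{2})$"
)

FIELDS = ["description", "quantity", "unit_price", "amount"]


def parse_item_rows(text):
    """Return one dict per line that matches the item-row pattern."""
    rows = []
    for line in text.splitlines():
        match = ROW_PATTERN.match(line.strip())
        if match is None:
            continue  # header, total, and blank lines do not match
        rows.append({
            "description": match.group("description"),
            "quantity": int(match.group("quantity")),
            "unit_price": float(match.group("unit_price").replace(",", "")),
            "amount": float(match.group("amount").replace(",", "")),
        })
    return rows


def rows_to_csv(rows):
    """Serialize the rows as CSV text, which Excel can open."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Made-up sample standing in for text read from an invoice PDF.
SAMPLE_TEXT = """Invoice Number: INV-001
Description Qty Unit Price Amount
Widget A 2 10.00 20.00
Large Gadget, blue 10 1,250.50 12,505.00
Total 12,525.00
"""

if __name__ == "__main__":
    print(rows_to_csv(parse_item_rows(SAMPLE_TEXT)))
```

Because the function only looks at lines that match the row pattern, the same call should work on text from single-page or multi-page PDFs, as long as the rows keep the assumed layout. PDFs with a different column order would need their own pattern.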
Happy Automation!
Regards,
Aditya
Hi @desineediaditya, thank you for sharing this.
Yes, string manipulation is always an option. Before going down that route, I wanted to know whether an out-of-the-box technique is available.
Thank you
Have you tried screen scraping?
If not, try that.
Let me know if it works.
Regards
Nived N
Happy Automation
Hi @ramvashista85
Can you share a sample PDF, as @balupad14 asked? In this scenario, ABBYY FlexiCapture and Document Understanding are good tools.
I used OmniPage OCR here; the result for the PDF files is below.
Regards
Anand
Main.xaml (5.8 KB) dataPDF.pdf (132.7 KB)
Hi @balupad14, sorry for the delayed response. Please find the sample PDF attached.
SamplePDF_280920_281092.pdf (21.3 KB)
The PDF can have either a single page or multiple pages. Please let me know how it goes.
Thank you.