Extract tabular data from PDF and save to Excel

Hi,
I have been working on a solution to extract data from multiple PDFs. These are invoice PDFs with tabular data of items; please see the attached screenshot for reference.
There are various ways to extract specific fields such as name or invoice number from a PDF. However, getting the data out of the table of items and saving it to Excel is proving challenging.
Has anyone solved this kind of problem?
I would appreciate a quick response.

Try using the Document Understanding ML model to extract the table from the PDF.

@hasib08 I really don't want to use DU at this point. Is there any other way using the PDF activities?

Have you tried data scraping?

Yes, I did try data scraping, but the data is not consistent across PDFs.

Hi @ramvashista85

Here I tried extracting the tabular data from the PDF using string manipulation and regular expressions. Take a look; it might be helpful.
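For anyone who wants to see the general idea outside of a workflow, here is a minimal Python sketch of the regex approach. It is not the attached workflow. The sample text, the column layout (line number, description, quantity, unit price, amount), and the names `ROW_PATTERN`, `parse_item_rows`, and `rows_to_csv` are all assumptions made up for illustration, so the pattern would need adjusting to the real invoices.

```python
import csv
import io
import re

# Hypothetical text, as it might look after reading the PDF text into a string.
SAMPLE_TEXT = """\
Invoice Number: INV-001
Item Description Qty Unit Price Amount
1 Widget A 2 10.00 20.00
2 Large Gadget B 1 5.50 5.50
Total 25.50
"""

# Assumed row layout: line number, free-text description, quantity,
# unit price, amount. Lines that do not fit this shape are ignored.
ROW_PATTERN = re.compile(
    r"^(?P<line>\d+)\s+(?P<description>.+?)\s+(?P<qty>\d+)\s+"
    r"(?P<unit_price>\d+\.\d{2})\s+(?P<amount>\d+\.\d{2})$"
)

FIELDS = ["line", "description", "qty", "unit_price", "amount"]


def parse_item_rows(text):
    """Return one dict per text line that matches the assumed item-row layout."""
    rows = []
    for raw_line in text.splitlines():
        match = ROW_PATTERN.match(raw_line.strip())
        if match:
            rows.append(match.groupdict())
    return rows


def rows_to_csv(rows):
    """Serialize the rows as CSV text, a format Excel can open directly."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


rows = parse_item_rows(SAMPLE_TEXT)
csv_text = rows_to_csv(rows)
print(csv_text)
```

The sketch writes CSV rather than a native workbook only to stay within the standard library. Header, total, and invoice-number lines are skipped simply because they do not match the row pattern, which is also why this approach breaks down when the row layout varies between PDFs.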

Happy Automation!

Regards,
Aditya

Hi @desineediaditya, thank you for sharing this.
Yes, string manipulation is always an option. Before going down that route, I wanted to know whether an out-of-the-box technique is available.
Thank you

Hi @ramvashista85

Have you tried screen scraping?

If not, give that a try.

Let me know if it works.

Regards

Nived N :robot:

Happy Automation :relaxed::relaxed:

Hi @ramvashista85,

Can you share the PDF?

Regards
Balamurugan.S

Hi @ramvashista85
Can you share a sample PDF, as @balupad14 asked? For this scenario, ABBYY FlexiCapture and Document Understanding are good tools.

I used OmniPage OCR here, and below is the result for the PDF files.


Regards
Anand
Main.xaml (5.8 KB) dataPDF.pdf (132.7 KB)

Hi @balupad14, sorry for the delayed response. Please find the sample PDF attached.
SamplePDF_280920_281092.pdf (21.3 KB)

A PDF can have either a single page or multiple pages. Please let me know how it goes.
Thank you.

Hello Ram,
In this video, I cover 17 use cases for extracting tables from PDFs and writing the data to Excel:

Your PDF is covered at this timestamp:
1:17:10 File 19 PDF with multiple pages and columns with multiple lines

Code:

Thanks,
Cristian Negulescu