Extract PDF tabular data and save to Excel

ramvashista85 · September 11, 2020, 4:54am

Hi,
I have been working on a solution to extract data from multiple PDFs. These are invoice PDFs with tabular item data; please see the attached screenshot for reference.
There are various ways to extract specific fields such as name or invoice number from a PDF, but getting the data out of the item table and saving it to Excel is proving challenging.
Has anyone come across a solution to this kind of problem?
I would appreciate a quick response.

hasib08 · September 11, 2020, 4:59am

Try using the Document Understanding ML model to extract the table from the PDF.

ramvashista85 · September 11, 2020, 5:04am

@hasib08 I really don't want to use DU at this point. Is there any other way using the PDF activities?

hasib08 · September 11, 2020, 5:07am

Have you tried data scraping?

ramvashista85 · September 12, 2020, 6:06am

Yes, I did try data scraping, but the data is not consistent across PDFs.

desineediaditya · September 12, 2020, 7:06am

Hi @ramvashista85

Here I tried extracting the tabular data from a PDF using string manipulation and regular expressions. Take a look; it might be helpful.
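
A minimal sketch of what the regex-based approach mentioned here can look like. It is not the workflow shared in this post. The row layout (item number, description, quantity, unit price, amount), the names `RowPattern` and `ParseRows`, and the sample text are all assumptions made for illustration; the input is assumed to be plain text already extracted from the PDF, for example with the Read PDF Text activity.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Text.RegularExpressions

Module InvoiceRowRegexSketch
    ' Hypothetical row layout: item number, description, quantity, unit price, amount.
    ' Adjust the pattern to the text your own PDFs actually produce.
    Private ReadOnly RowPattern As New Regex(
        "^\s*(?<no>\d+)\s+(?<desc>.+?)\s+(?<qty>\d+)\s+(?<price>\d+\.\d{2})\s+(?<amount>\d+\.\d{2})\s*$",
        RegexOptions.Multiline)

    ' Returns one String array (five fields) per line that looks like an item row.
    ' Lines that do not match the pattern (headers, totals) are ignored.
    Function ParseRows(pdfText As String) As List(Of String())
        Dim rows As New List(Of String())
        For Each m As Match In RowPattern.Matches(pdfText)
            rows.Add(New String() {
                m.Groups("no").Value,
                m.Groups("desc").Value,
                m.Groups("qty").Value,
                m.Groups("price").Value,
                m.Groups("amount").Value})
        Next
        Return rows
    End Function

    Sub Main()
        ' Made-up sample text standing in for the output of a PDF text extraction step.
        Dim nl As String = Environment.NewLine
        Dim sample As String =
            "Invoice No: 1001" & nl &
            "No Description Qty Price Amount" & nl &
            "1 Blue widget 2 10.00 20.00" & nl &
            "2 Large red gadget 1 5.50 5.50" & nl &
            "Subtotal 25.50"

        ' Print each parsed row as pipe-delimited fields.
        For Each row As String() In ParseRows(sample)
            Console.WriteLine(String.Join("|", row))
        Next
    End Sub
End Module
```

Inside a workflow, the same pattern could be used from an Invoke Code activity, with each parsed row added to a DataTable that is then written to Excel with Write Range. Because the data is not consistent across PDFs, expect to need a different pattern for each invoice layout.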

Happy Automation!

Regards,
Aditya

ramvashista85 · September 12, 2020, 7:10am

Hi @desineediaditya, thank you for sharing this.
Yes, string manipulation is always an option, but before going down that route I wanted to know whether an out-of-the-box technique is available.
Thank you.

NIVED_NAMBIAR · September 13, 2020, 5:47am

Hi @ramvashista85

Have you tried screen scraping?

If not, try that.

Let me know if it works.

Regards

Nived N

Happy Automation

balupad14 · September 14, 2020, 7:06am

Hi @ramvashista85,

Can you share the PDF?

Regards
Balamurugan.S

anandji05 · September 14, 2020, 7:25am

Hi @ramvashista85
Can you share a sample PDF, as @balupad14 asked? In this scenario, ABBYY FlexiCapture and Document Understanding are good tools.

I used OmniPage OCR here, and the result for the PDF file is below.

Regards,
Anand

Main.xaml (5.8 KB) dataPDF.pdf (132.7 KB)

ramvashista85 · September 28, 2020, 12:31am

Hi @balupad14, sorry for the delayed response. Please find the sample PDF attached.
SamplePDF_280920_281092.pdf (21.3 KB)

The PDF can have either a single page or multiple pages. Please let me know how it goes.
Thank you.

Cristian_Negulescu · February 28, 2021, 8:16pm

Hello Ram,
In this video, I cover 17 use cases for extracting tables from PDFs and writing the data to Excel:

Your PDF is covered at this timestamp:
1:17:10 File 19 PDF with multiple pages and columns with multiple lines

Code (on github.com):

cristinegulescu/startUiPathFromSalesforce/blob/master/PDFdecode.txt

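        'FILE1
        Dim strtmp As String
        ' Table body: everything from the "Number" header up to (not including) "Subtotal"
        strtmp = strin.Substring(strin.IndexOf("Number"), strin.IndexOf("Subtotal") - strin.IndexOf("Number")).Trim
        ' Replace every space with a "|" column separator
        strout = strtmp.Replace(" ", "|")

        ' Subtotal value: the rest of the line after the word "Subtotal" (8 characters)
        strtmp = strin.Substring(strin.IndexOf("Subtotal") + 8)
        strpar = strtmp.Substring(0, strtmp.IndexOf(Environment.NewLine)).Trim


        'FILE2
        Dim strtmp As String
        Dim strout As String
        ' Header row for the pipe-delimited output
        strout = "Col1|Col2|Col3|Col4"
        ' Keep only the text starting 11 characters past the beginning of "Vacancies"
        strtmp = strin.Substring(strin.IndexOf("Vacancies") + 11).Trim
        ' Walk the remaining text line by line, skipping empty lines
        For Each line As String In strtmp.Split(New String() {Environment.NewLine}, StringSplitOptions.RemoveEmptyEntries)
            If (line.Length > 3) Then
                ' A line starting with a digit followed by two spaces begins a new output row
                If (IsNumeric(line(0))) And (line(1) = " ") And (line(2) = " ") Then
                    strout = strout + Environment.NewLine + line.Replace("  ", "").Replace("  ", "|").Trim
                ' Lines matching this test are appended to the current row without a line break
                ElseIf (line(0) = "") And (line(1) = " ") And (line(2) = " ") Then
                    strout = strout + line.Replace("  ", "$").Trim()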

This file has been truncated.
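
Since the excerpt stops partway through, here is a separate, self-contained sketch of the same general idea for tables whose cells wrap onto multiple lines: cut the table body out of the extracted text, then fold wrapped lines back into the row they belong to. It is not a reconstruction of the truncated file. The helper names `SliceBetween` and `MergeWrappedRows`, the markers "Amount" and "Subtotal", the rule that a new row starts with a digit, and the sample text are all assumptions for illustration.

```vbnet
Imports System
Imports System.Collections.Generic

Module ContinuationLineSketch
    ' Returns the text between the end of startMarker and the start of endMarker.
    ' If startMarker is missing, returns an empty string; if endMarker is missing,
    ' returns everything after startMarker.
    Function SliceBetween(source As String, startMarker As String, endMarker As String) As String
        Dim startIdx As Integer = source.IndexOf(startMarker)
        If startIdx < 0 Then Return String.Empty
        startIdx += startMarker.Length
        Dim endIdx As Integer = source.IndexOf(endMarker, startIdx)
        If endIdx < 0 Then endIdx = source.Length
        Return source.Substring(startIdx, endIdx - startIdx)
    End Function

    ' Assumption for this sketch: a new row starts with a digit, and any other
    ' non-empty line is the wrapped remainder of the previous row.
    Function MergeWrappedRows(tableText As String) As List(Of String)
        Dim rows As New List(Of String)
        Dim lines As String() = tableText.Split(
            New String() {Environment.NewLine}, StringSplitOptions.RemoveEmptyEntries)
        For Each rawLine As String In lines
            Dim line As String = rawLine.Trim()
            If line.Length = 0 Then Continue For
            If Char.IsDigit(line(0)) Then
                ' Start of a new row
                rows.Add(line)
            ElseIf rows.Count > 0 Then
                ' Wrapped text: append it to the end of the previous row
                rows(rows.Count - 1) &= " " & line
            End If
        Next
        Return rows
    End Function

    Sub Main()
        ' Made-up sample text standing in for text extracted from a PDF.
        Dim nl As String = Environment.NewLine
        Dim sample As String =
            "No Description Qty Price Amount" & nl &
            "1 Blue widget with a 2 10.00 20.00" & nl &
            "very long name" & nl &
            "2 Red gadget 1 5.50 5.50" & nl &
            "Subtotal 25.50"

        ' Keep only the table body, then merge wrapped lines into their rows.
        Dim body As String = SliceBetween(sample, "Amount", "Subtotal")
        For Each row As String In MergeWrappedRows(body)
            Console.WriteLine(row)
        Next
    End Sub
End Module
```

Note that the wrapped text ends up at the end of the merged row rather than back inside the description column, so a further step is needed to split each row into columns before writing it to Excel.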

Thanks,
Cristian Negulescu
