Brainstorming Solutions for Editing Data in PDFs

ashwin.ashok · November 12, 2020, 12:08pm

I have a set of PDF files that contain two or three tables each, and I want to extract the tables and store them in Excel. Unfortunately, the available activities, such as OCR and Read PDF Text, don’t produce the desired output.

Is there any way to capture tables inside PDFs (structured or unstructured) and store them in Excel?
Help would be appreciated, and thanks in advance!
P.S. I’ve tried using EpsilonAI.Activities, but that didn’t work either. If there are any other activities that would help with this, please mention them.

prasath17 · November 12, 2020, 1:43pm

@ashwin.ashok - Can you try this?

https://epsilonai.com/how-to-extract-table-from-pdf-in-uipath

If it doesn’t work out, I would suggest trying Document Understanding (DU). It will work. If you have a sample PDF, can you please share it (after redacting)? I have a DU workflow, so I can try it here in parallel.

ashwin.ashok · November 13, 2020, 4:47am

Hi @prasath17, I have tried the EpsilonAI package, but the tables aren’t being captured. I’ll include the same PDF in this comment.

Sample.pdf (65.3 KB)

jeevith · November 30, 2020, 9:25am

Hi @ashwin.ashok,

If we assume the tables in your PDF follow a standard pattern once the text is extracted, there are two possible approaches (CSV format is the savior in both):

Approach 1: Using only PDF activities

Suggested workflow: Main.xaml (12.4 KB)
Results first saved to temp.csv

Approach 2: Open the PDF in Word and extract the specific tables from Word
Yes, you can open PDF files in Word. Some PDFs won’t convert well and will lose formatting in Word, but most structured ones will work.

Read the PDF in Word (word.exe)
Manipulate / convert the read text to CSV format (Hurdle! Multi-level headers and multiple values in single rows will lose formatting)
Handle the formatting
Write the resulting CSV text string to temp.csv
Read the CSV and
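
For anyone who wants to see the text-to-CSV step outside a workflow, here is a minimal Python sketch of the idea. It is not taken from the attached workflows. It assumes that columns in the extracted text are separated by runs of two or more spaces, which will not hold for every PDF. The function name `text_to_csv` and the sample text are hypothetical.

```python
import csv
import io
import re


def text_to_csv(extracted_text: str) -> str:
    """Convert whitespace-aligned table text to CSV text.

    Assumption: columns are separated by two or more whitespace characters.
    """
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    for raw_line in extracted_text.splitlines():
        line = raw_line.strip()
        if not line:
            continue  # skip blank lines between rows
        # Split on runs of 2+ whitespace so single spaces stay inside a cell.
        writer.writerow(re.split(r"\s{2,}", line))
    return buffer.getvalue()


# Hypothetical sample standing in for text read from a PDF or Word document.
sample = """Item    Qty    Unit price
Blue widget    2    3.50
Bolt, small    10    0.25
"""
print(text_to_csv(sample))
```

Using the `csv` module rather than joining with commas by hand means cells that already contain a comma are quoted automatically. Multi-level headers and multi-line cells would still need extra handling, as noted in the steps above.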

A major part of this solution is from: How to read table in a Word document - #16 by Puransse, thanks to @vvaidya

Workflow from the above link (slight modifications): wordTables.xaml (9.6 KB)
Results first saved to wordCsv.csv

Hope this helps!

Cristian_Negulescu · February 28, 2021, 8:14pm

Hello Ashwin,
In this video, I cover 17 use cases for extracting tables from PDFs and writing the data to Excel:

Your PDF is here at this point:
1:22:10 File 20 PDF with multiple columns that have multiple lines

Code:

github.com/cristinegulescu/startUiPathFromSalesforce/blob/master/PDFdecode.txt

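        'FILE1
        Dim strtmp As String
        ' Take the text from "Number" up to (not including) "Subtotal".
        strtmp = strin.Substring(strin.IndexOf("Number"), strin.IndexOf("Subtotal") - strin.IndexOf("Number")).Trim
        ' Use "|" as the column delimiter in place of every space.
        strout = strtmp.Replace(" ", "|")

        ' Skip past "Subtotal" (8 characters) and keep the rest of that line.
        strtmp = strin.Substring(strin.IndexOf("Subtotal") + 8)
        strpar = strtmp.Substring(0, strtmp.IndexOf(Environment.NewLine)).Trim


        'FILE2
        Dim strtmp As String
        Dim strout As String
        ' Start the output with a fixed header row.
        strout = "Col1|Col2|Col3|Col4"
        ' Keep only the text that follows the "Vacancies" heading.
        strtmp = strin.Substring(strin.IndexOf("Vacancies") + 11).Trim
        ' Process the remaining text line by line, ignoring empty lines.
        For Each line As String In strtmp.Split(New String() {Environment.NewLine}, StringSplitOptions.RemoveEmptyEntries)
            If (line.Length > 3) Then
                ' A line that starts with a digit followed by two spaces begins a new output row.
                If (IsNumeric(line(0))) And (line(1) = " ") And (line(2) = " ") Then
                    strout = strout + Environment.NewLine + line.Replace("  ", "").Replace("  ", "|").Trim
                ' Otherwise the line is appended to the current row without a line break.
                ElseIf (line(0) = "") And (line(1) = " ") And (line(2) = " ") Then
                    strout = strout + line.Replace("  ", "$").Trim()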
This file has been truncated.

Thanks,
Cristian Negulescu
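
The FILE1 snippet above slices the text between two marker words and then replaces spaces with a pipe delimiter. A minimal Python sketch of the same idea follows. It is not part of the linked file. It assumes both markers occur in the text and that no cell contains a space. The function name `slice_between` and the sample text are hypothetical.

```python
def slice_between(text: str, start_marker: str, end_marker: str) -> str:
    """Return the text from start_marker (inclusive) up to end_marker (exclusive)."""
    start = text.index(start_marker)     # raises ValueError if the marker is missing
    end = text.index(end_marker, start)  # search for the end marker after the start
    return text[start:end].strip()


# Hypothetical sample standing in for text read from a PDF.
sample = "Invoice 42\nNumber Item Price\n1 Pen 2.00\nSubtotal 2.00\nThank you"

table_text = slice_between(sample, "Number", "Subtotal")
# As in the VB snippet, every single space becomes a pipe delimiter.
rows = [line.replace(" ", "|") for line in table_text.splitlines()]

# The subtotal value is whatever follows "Subtotal" on the same line.
after = sample[sample.index("Subtotal") + len("Subtotal"):]
subtotal = after.splitlines()[0].strip()

print(rows, subtotal)
```

Replacing every space only works when cell values never contain spaces. For tables with free-text columns, a wider separator or fixed column positions would be a safer choice.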
