How to extract table data from a PDF

Hi,

How can I extract the table data in a PDF file and write it to an Excel file?

Thanks,
Lakshmi


Hey @lakshmi.mp

  1. If you are going with UI Automation, use the Data Scraping technique.

  2. You can also read the PDF with the preserve format option enabled and use string manipulation to split the table out of the raw content. Then pass the table string to the Generate Data Table activity, choosing an appropriate delimiter; its output is a data table.

I prefer the second method because it runs entirely in the background.
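To illustrate the second method, here is a minimal Python sketch of the same logic (not UiPath activities): cut the table out of layout-preserved text, then split each line on a delimiter. It assumes, hypothetically, that the table starts at a known header word, ends at the first blank line, and that columns are separated by two or more spaces. The function name `extract_table` and the sample text are made up for illustration.

```python
import re


def extract_table(raw_text: str, header_start: str) -> list[list[str]]:
    """Cut the table out of layout-preserved PDF text and split it into cells.

    Hypothetical assumptions (adjust to your PDF):
    - the table starts at the first line beginning with `header_start`,
    - it ends at the first blank line after that,
    - columns are separated by two or more spaces (the "delimiter").
    """
    lines = raw_text.splitlines()
    start = next(
        (i for i, line in enumerate(lines) if line.strip().startswith(header_start)),
        None,
    )
    if start is None:
        raise ValueError(f"No line starts with {header_start!r}")

    rows: list[list[str]] = []
    for line in lines[start:]:
        if not line.strip():  # a blank line marks the end of the table
            break
        # Two or more consecutive spaces act as the column delimiter.
        rows.append(re.split(r"\s{2,}", line.strip()))
    return rows


sample = """Invoice Summary

Item        Qty    Price
Pen         2      10.00
Notebook    1      45.50

Total: 55.50"""

for row in extract_table(sample, "Item"):
    print(row)
```

Splitting on two or more spaces rather than a single space keeps multi-word cells together, which is usually why a custom delimiter is needed in the Generate Data Table step.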

Hope that helps.

Thanks
#nK

Hi,

I will try it, thanks.

Regards,
Lakshmi

Hi @lakshmi.mp ,

If the PDF is a digital (not scanned) PDF, we should be able to use the Read PDF Text activity to get the text and then apply Regex to extract the data as required.

From the format shown, this looks feasible, as there are no adjacent columns containing free text.
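As a rough Python sketch of the Regex approach: when only one column holds free text, the pattern can anchor on the fixed-format columns around it. The row layout below (an ID like `INV-1001`, a dd/mm/yyyy date, a description, an amount) is a hypothetical example, not the layout of the actual PDF, and `rows_from_text` is an illustrative name.

```python
import re

# Hypothetical row layout: an ID such as "INV-1001", a dd/mm/yyyy date,
# one free-text description, and an amount such as "1,200.00".
# Only the description may contain spaces, so the regex anchors on the
# fixed-format columns on either side of it.
ROW_PATTERN = re.compile(
    r"^(?P<id>[A-Z]+-\d+)[ \t]+"
    r"(?P<date>\d{2}/\d{2}/\d{4})[ \t]+"
    r"(?P<desc>.+?)[ \t]+"
    r"(?P<amount>[\d,]+\.\d{2})[ \t]*$",
    re.MULTILINE,
)


def rows_from_text(pdf_text: str) -> list[dict[str, str]]:
    """Return one dict per line of `pdf_text` that looks like a table row."""
    return [m.groupdict() for m in ROW_PATTERN.finditer(pdf_text)]


text = """Statement
INV-1001  01/02/2022  Office chairs  1,200.00
INV-1002  15/02/2022  Printer ink  85.50
Page 1 of 1"""

for row in rows_from_text(text):
    print(row)
```

Lines that do not match the row shape (titles, page footers) are simply skipped, which is why this works well only when the non-text columns have a predictable format.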

If you can share a sample PDF, we should be able to provide the Regex or a solution XAML for you to work from.

Hi,

The sample file is attached; please take a look.

pdf task.pdf (118.0 KB)

Thanks,
Lakshmi

Hi @lakshmi.mp ,

Are there cases in the real project where the data in a cell does not fit on a single line, as in the PDF you provided?

In your earlier screenshot, we don't see any cell data wrapping onto the next line. If you can confirm this, it will be easier for us to decide which extraction method to use.
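Wrapped cells matter because a wrapped line no longer looks like a full row. One possible way to handle it, sketched in Python under hypothetical assumptions: every real row starts with an ID such as `INV-1001`, columns are separated by two or more spaces, and any line not starting with an ID is a continuation of the free-text column (index `text_col`) of the previous row. If several columns can wrap at once, this simple merge is not enough.

```python
import re

# Hypothetical: every real table row starts with an ID such as "INV-1001".
ROW_START = re.compile(r"^[A-Z]+-\d+\b")


def merge_wrapped_rows(table_text: str, text_col: int) -> list[list[str]]:
    """Split rows on 2+ spaces and fold wrapped lines into column `text_col`."""
    rows: list[list[str]] = []
    for line in table_text.splitlines():
        stripped = line.strip()
        if not stripped:
            continue  # ignore blank lines
        if ROW_START.match(stripped) or not rows:
            # A new row (or the very first line, e.g. a header).
            rows.append(re.split(r"\s{2,}", stripped))
        else:
            # Continuation of a wrapped cell: append to the previous row's text column.
            rows[-1][text_col] += " " + stripped
    return rows


table = """INV-1001  01/02/2022  Office chairs with  1,200.00
                      armrests
INV-1002  15/02/2022  Printer ink  85.50"""

for row in merge_wrapped_rows(table, text_col=2):
    print(row)
```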

Hi,

I created that PDF file based on that screenshot.

Thanks,
Lakshmi

@lakshmi.mp
Go to Manage Packages > All Packages and install the FestSystems.PDFtoExcel.Activities package.

Split the PDF into single pages using a PDF splitter activity, then use the PDF to Excel activity.


Regards
Muhamed fasil

Hi @muhamed.fasil ,

I will try it, thank you.

Regards,
Lakshmi M P

@lakshmi.mp Please mark this as the solution if it works for you, so that it helps others.
Cheers
muhamed fasil
