Extract table data from PDF to csv

Sailaja_Chikkam · July 13, 2018, 7:23am

Hi All, I am new to UiPath. I want to extract table data from a PDF into a CSV file.

I have attached the PDF file for your reference. Could someone please help me out with this?

Appreciate your kind help and support.

TURBO COOLING SYSTEM WE MEAN COOLING_1.pdf (132.0 KB)

akila93 · July 13, 2018, 7:29am

Hi @Sailaja_Chikkam,

Use screen scraping to get the data.

Then build it into a DataTable and use the Write Range activity to write the data into the CSV.

Refer to this link

Sailaja_Chikkam · July 13, 2018, 7:33am

With screen scraping, I am not getting the required output. All the column values are extracted into one single column.

akila93 · July 13, 2018, 7:37am

Yes @Sailaja_Chikkam, based on your requirement you need to split the data and create the table yourself.

Split the string on the tab character.
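To make the split step concrete, here is a rough sketch in Python rather than in a UiPath workflow. The helper name `rows_from_scraped_text` and the sample text are made up for illustration, and the fallback to runs of two or more spaces is only an assumption about what the scrape returns when no tabs survive.

```python
import re


def rows_from_scraped_text(text):
    """Split scraped text into rows of cells (hypothetical helper)."""
    rows = []
    for line in text.splitlines():
        line = line.strip()  # trim surrounding whitespace
        if not line:
            continue  # skip blank lines
        if "\t" in line:
            cells = line.split("\t")
        else:
            # Assumption: with no tabs, columns are separated by 2+ spaces
            cells = re.split(r" {2,}", line)
        rows.append([cell.strip() for cell in cells])
    return rows


# Made-up sample standing in for the scraped table text
sample = "S.N\tItem\tQty\n1\tFan belt\t2\n2   Radiator cap   10\n"
for row in rows_from_scraped_text(sample):
    print(row)
```

If the rows come back with different numbers of cells, the separator guess is probably wrong for that PDF, and the scraped text is worth inspecting character by character.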

Sailaja_Chikkam · July 13, 2018, 7:38am

I did but it is not splitting properly.

Dev · July 13, 2018, 8:32am

I just gave it a try and screen scraped the table inside the PDF file you provided, and I see what you mean.

First of all, it's not clear whether the table content is going to change.
By that I mean, are you going to screen scrape multiple PDF files with different tables?
(If you are, you will need to modify my solution to be more dynamic; this is only to show the possibilities in your specific case.)

For this specific PDF, I would do something like this (it could be a little tricky on the first try):

Open PDF file in Adobe Reader
Use UiPath Explorer to target the table elements one by one

// This targets the Adobe Reader window. The “*” wildcard in the title attribute matches any document title.
<wnd app='acrord32.exe' cls='AcrobatSDIWindow' title='* - Adobe Reader' />

// Targeting the row we want to get data from (in this case row 1)
<ctrl idx='1' role='row' />
// Then the column header 1
<ctrl role='column header' idx='1' />
All of this information comes from UiPath Explorer and can be reused for every “row - column” pair you want to target, so you can “convert” the results into a CSV-formatted string.

Use the “Get OCR Text” activity and paste the selector from UiPath Explorer into its Selector property.
Now, for every “row - column” pair you want to read, change the idx values:
// Targeting the row we want to get data from (in this case row 1)
<ctrl idx='1' role='row' />
// Then the column header 2
<ctrl role='column header' idx='2' />

And when it's no longer the header you are reading, change the role to cell instead of column header:
<ctrl role='column header' idx='2' />

to

<ctrl role='cell' idx='5' />
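Since only the idx and role values change from one read to the next, the selector strings could be generated instead of edited by hand. Below is a minimal Python sketch of that idea; `build_selector` is a hypothetical helper, and it assumes the cell idx simply counts cells within a row, which should be checked in UiPath Explorer because the real numbering may differ.

```python
# Window part copied from the selector above
WINDOW = "<wnd app='acrord32.exe' cls='AcrobatSDIWindow' title='* - Adobe Reader' />"


def build_selector(row_idx, col_idx, header=False):
    """Return a selector string for one table cell (hypothetical helper)."""
    # Header cells use role 'column header', all other cells use role 'cell'
    role = "column header" if header else "cell"
    return (
        f"{WINDOW}"
        f"<ctrl idx='{row_idx}' role='row' />"
        f"<ctrl role='{role}' idx='{col_idx}' />"
    )


# Example: selectors for three header cells in row 1
# (the column count of 3 is made up for illustration)
for col in range(1, 4):
    print(build_selector(1, col, header=True))
```

Each generated string would then be used as the selector for one read of the table.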

I know there is a bit of work in this solution, but you will have full control over the result/output you get from the table.
And I think it teaches a lot if you have just started your UiPath journey.

The above will result in:
S.N

Let me know if anything is unclear.

Sailaja_Chikkam · July 13, 2018, 9:18am

Hi @Dev ,

I am able to retrieve single values with the selector. However, I want to extract the entire column's data as a string. With this approach I am only able to extract the value of the specified cell.

Thanks for your Support!
