Extract Table from pdf using Data Scraping

husain.shah · February 26, 2020, 5:32am

Hi there,

I am trying to extract a table from a PDF (screenshot below):

I used the Data Scraping extraction wizard, but I am unable to extract the column names correctly. This is the output:


[Column-0,Participants,Ballots Completed,Ballots Incomplete/ Terminated,Results,Column-5
Blind
,5
,1
,4
,"34.5%, n=1
","1199 sec, n=1
"
Low Vision
,5
,2
,3
,"98.3% n=2
(97.7%, n=3)
","1716 sec, n=3
(1934 sec, n=2)
"
Dexterity
,5
,4
,1
,"98.3%, n=4
","1672.1 sec, n=4
"
Mobility
,3
,0
,"95.4%, n=3
","1416 sec, n=3
"
]


Question: How can I improve the table extraction to get correct column names?

Cheers
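Judging from the output above, the wizard picked up most of the header text, but it gave placeholder names (`Column-0`, `Column-5`) to the row-label column and to the second column under "Results", and the cells kept stray line breaks. One pragmatic fix is to post-process the scraped table: rename the placeholder columns by position and trim each cell. The Python sketch below only illustrates that logic on two rows copied from the output; it is not UiPath code. The replacement names ("Row Label", "Results: Accuracy", "Results: Time") are hypothetical, inferred from the cell contents, and the assumption that each cell holds the line breaks shown in the output is likewise mine. Inside the workflow, the equivalent step would be setting `DataColumn.ColumnName` on the DataTable the wizard returns.

```python
from typing import Dict, List, Tuple

# Header and two rows as the wizard returned them (copied from the output
# above). ASSUMPTION: each cell carries the stray line breaks seen there.
SCRAPED_HEADER = ["Column-0", "Participants", "Ballots Completed",
                  "Ballots Incomplete/ Terminated", "Results", "Column-5"]
SCRAPED_ROWS = [
    ["Blind\n", "5\n", "1\n", "4\n", "34.5%, n=1\n", "1199 sec, n=1\n"],
    ["Low Vision\n", "5\n", "2\n", "3\n",
     "98.3% n=2\n(97.7%, n=3)\n", "1716 sec, n=3\n(1934 sec, n=2)\n"],
]

# Hypothetical replacement names, inferred from the cell contents
# (row labels, percentages, seconds); adjust them to the real PDF headers.
RENAME = {
    "Column-0": "Row Label",
    "Ballots Incomplete/ Terminated": "Ballots Incomplete/Terminated",
    "Results": "Results: Accuracy",
    "Column-5": "Results: Time",
}


def clean_cell(value: str) -> str:
    """Drop blank lines and join the remaining lines of a cell with one space."""
    return " ".join(line.strip() for line in value.splitlines() if line.strip())


def fix_table(header: List[str], rows: List[List[str]],
              rename: Dict[str, str]) -> Tuple[List[str], List[List[str]]]:
    """Return (new_header, cleaned_rows) with placeholder column names replaced."""
    new_header = [rename.get(name, name) for name in header]
    cleaned = [[clean_cell(cell) for cell in row] for row in rows]
    return new_header, cleaned


header, rows = fix_table(SCRAPED_HEADER, SCRAPED_ROWS, RENAME)
print(header)
for row in rows:
    print(dict(zip(header, row)))
```

Renaming by position only works when every row has all six cells. The Mobility row in the output above has just five values, so a cell was dropped during scraping and that row would still need separate handling.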

husain.shah · February 26, 2020, 5:33am

Below is a screenshot of the extraction wizard:

husain.shah · February 26, 2020, 5:34am

Here’s what my code looks like:

husain.shah · February 26, 2020, 5:35am

The PDF file used for this exercise can be downloaded from https://www.w3.org/WAI/WCAG20/Techniques/working-examples/PDF20/table.pdf

supermanPunch · February 26, 2020, 5:38am

@husain.shah Are you using the Silverlight extension?

husain.shah · February 26, 2020, 5:51am

No, I am not using the Silverlight extension.

supermanPunch · February 26, 2020, 5:52am

@husain.shah Is the PDF file stored on your system?

husain.shah · February 26, 2020, 5:53am

Yes. I downloaded the PDF from the source above and am working on a local copy.

supermanPunch · February 26, 2020, 5:55am

@husain.shah Then have you tried the PDFtoExcel activity?

husain.shah · February 26, 2020, 6:02am

The PDFtoExcel activity uses the SautinSoft API, whose trial version converts only 3 pages of a PDF and is meant for evaluation purposes only. I am interested in a free solution.

Shah_Hussain · May 4, 2020, 8:47am

Hi Hussain,
Were you able to find a solution?

shero · June 17, 2020, 5:51am

Hey, did you find any free and viable solution for extracting a data table from a PDF?

Shah_Hussain · June 17, 2020, 7:57am

Hello shero,
Yes, I tried the Epsilon package for this. It is not the best solution, but it is definitely worth a try, and it is free of cost.

Link below:
https://epsilonai.com/how-to-extract-table-from-pdf-in-uipath

shero · June 17, 2020, 11:13am

It did not work accurately for me.
I want to extract the tabular data row by row, based on some regex.
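Row-wise extraction with a regex usually means matching one pattern per line of the PDF's plain text. The Python sketch below (an illustration, not UiPath code) assumes, hypothetically, that each table row comes out of the text extraction on a single line in the order label, three counts, accuracy and time; the sample lines are built from values in the scraped output earlier in the thread, not from real extracted text. Named groups give each field a column name, and lines that do not fit the pattern are skipped.

```python
import re
from typing import Dict, List

# Hypothetical row pattern. ASSUMPTION: each table row appears on one line of
# the extracted text as: label, three counts, "<pct>%[,] n=<k>", "<secs> sec[,] n=<k>".
ROW_RE = re.compile(
    r"^(?P<label>[A-Za-z][A-Za-z ]*?)\s+"
    r"(?P<participants>\d+)\s+(?P<completed>\d+)\s+(?P<incomplete>\d+)\s+"
    r"(?P<accuracy>\d+(?:\.\d+)?)%,?\s*n=(?P<accuracy_n>\d+)\s+"
    r"(?P<seconds>\d+(?:\.\d+)?) sec,?\s*n=(?P<seconds_n>\d+)$"
)


def extract_rows(text: str) -> List[Dict[str, str]]:
    """Return one dict of named groups per matching line; other lines are skipped."""
    rows = []
    for line in text.splitlines():
        match = ROW_RE.match(line.strip())
        if match:
            rows.append(match.groupdict())
    return rows


# Hypothetical extracted text built from values in the scraped output above.
SAMPLE_TEXT = """\
Participants Ballots Completed Ballots Incomplete/Terminated
Blind 5 1 4 34.5%, n=1 1199 sec, n=1
Dexterity 5 4 1 98.3%, n=4 1672.1 sec, n=4
"""

for row in extract_rows(SAMPLE_TEXT):
    print(row)
```

Because non-matching lines are dropped silently, comparing the number of matched rows with the expected row count is a cheap way to catch rows whose layout differs from the assumed pattern.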

Sakshi_Jain · May 12, 2021, 12:11pm

@supermanPunch @Shah_Hussain @husain.shah

@shero I am looking for the same.
Even just extracting the table from the PDF into a DataTable would work for me, but without Data Scraping, the EpsilonAI activity, or any third-party package (because of security reasons).
Please guide.
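Without Data Scraping or third-party packages, one option is to take the PDF's plain text and split it into cells yourself. The Python sketch below is for illustration only and assumes that columns in the extracted text are separated by two or more spaces while single spaces occur only inside a cell; that layout, the sample text, and the column names `Type`, `Accuracy` and `Time` are all hypothetical. Lines with the wrong number of cells are returned separately instead of being forced into the table.

```python
import re
from typing import Dict, List, Tuple

# Hypothetical column names: "Type", "Accuracy" and "Time" are assumptions;
# the others are adapted from the header row in the scraped output above.
HEADER = ["Type", "Participants", "Ballots Completed",
          "Ballots Incomplete/Terminated", "Accuracy", "Time"]

# ASSUMPTION: cells are separated by two or more whitespace characters,
# while single spaces only occur inside a cell (e.g. "Low Vision").
COLUMN_GAP = re.compile(r"\s{2,}")


def text_to_table(text: str, header: List[str]) -> Tuple[List[Dict[str, str]], List[str]]:
    """Split each non-empty line into cells; return (records, rejected_lines)."""
    records: List[Dict[str, str]] = []
    rejected: List[str] = []
    for line in text.splitlines():
        if not line.strip():
            continue  # ignore blank lines
        cells = COLUMN_GAP.split(line.strip())
        if len(cells) == len(header):
            records.append(dict(zip(header, cells)))
        else:
            rejected.append(line)  # wrong cell count: header text, wrapped or missing cells
    return records, rejected


# Hypothetical extracted text; the last line is deliberately missing a cell.
SAMPLE_TEXT = """\
Blind        5   1   4   34.5%, n=1   1199 sec, n=1
Low Vision   5   2   3   98.3% n=2    1716 sec, n=3
Mobility     3   0   95.4%, n=3   1416 sec, n=3
"""

records, rejected = text_to_table(SAMPLE_TEXT, HEADER)
for record in records:
    print(record)
print("Rejected:", rejected)
```

Whether the two-space assumption holds depends entirely on how the text extraction lays out the page, so it is worth inspecting the raw text first; if it does not hold, a pattern-based (regex) match per row is the usual alternative.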

siva_sankar · December 9, 2021, 5:20am

It’s asking for a license key. Can you please provide that license key file?

Shah_Hussain · January 3, 2022, 9:46am

@shero @Sakshi_Jain @siva_sankar

Hello guys,
Yes, Epsilon has started asking for subscription keys now. However, I am working on this and will get back to you guys very soon.

siva_sankar · January 3, 2022, 10:05am

Thank you.
Please post an update, because a lot of PDF work is in that format.
