Extract Table from PDF using Data Scraping

Hi there,

I am trying to extract a table from a PDF (screenshot below):

I used the Data Scraping extraction wizard. However, I am unable to extract the column names correctly. The output is as follows:


[Column-0,Participants,Ballots Completed,Ballots Incomplete/ Terminated,Results,Column-5

,"34.5%, n=1
","1199 sec, n=1
Low Vision

,"98.3% n=2
(97.7%, n=3)
","1716 sec, n=3
(1934 sec, n=2)

,"98.3%, n=4
","1672.1 sec, n=4

,"95.4%, n=3
","1416 sec, n=3



Question: How can I improve the table extraction to get correct column names?

Cheers 🙂

Below is a screenshot of the extraction wizard:

Here’s what my code looks like:

The PDF file used for this exercise can be downloaded from https://www.w3.org/WAI/WCAG20/Techniques/working-examples/PDF20/table.pdf
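
As a side note on the output above: it looks like CSV in which some quoted cells contain line breaks. If that reading is right, the cells can at least be tidied up after scraping. Below is a minimal sketch in Python (not a UiPath activity). The `RAW_OUTPUT` sample and the helper names `clean_cell` and `parse_scraped_csv` are hypothetical and only illustrate the idea. It does not recover the missing column names.

```python
import csv
import io

# Hypothetical sample: one header row and one data row, shaped like the
# scraped output quoted above (quoted cells that contain line breaks).
RAW_OUTPUT = (
    'Column-0,Results,Column-5\n'
    ',"98.3% n=2\n\n(97.7%, n=3)\n\n","1716 sec, n=3\n\n(1934 sec, n=2)\n"\n'
)


def clean_cell(cell):
    """Collapse line breaks and repeated spaces inside a cell to single spaces."""
    return " ".join(cell.split())


def parse_scraped_csv(raw):
    """Parse CSV text (quoted cells may span several lines) and tidy every cell."""
    reader = csv.reader(io.StringIO(raw, newline=""))
    return [[clean_cell(cell) for cell in row] for row in reader]


rows = parse_scraped_csv(RAW_OUTPUT)
for row in rows:
    print(row)
```

The `csv` module keeps a quoted cell together even when it spans several lines, so each table row comes back as one list and only the whitespace inside the cells needs cleaning.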

@husain.shah Are you using the Silverlight extension?

No, I am not using the Silverlight extension.

@husain.shah Is the PDF file stored on your system?

Yes. I downloaded the PDF from the source above and am working on a local copy.

@husain.shah Then have you tried the PDFtoExcel activity?

The PDFtoExcel activity uses the SautinSoft API, whose trial version converts only 3 pages of a PDF and is for evaluation purposes only. I am interested in a free solution.

Hi Hussain,
Were you able to find a solution?

Hey, did you find any free and viable solution for extracting a data table from a PDF?

Hello shero,
Yes, I tried the Epsilon package for this. It is not the best solution, but it is definitely worth a try and it is free of cost.

Link below -

It did not work accurately for me.
I want to extract the tabular data row-wise, based on a regex.
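
As a rough illustration of the row-wise regex idea, here is a minimal sketch in Python (not a UiPath activity). It assumes the PDF text has already been read into a string with one table row per line. The sample text, the pattern `ROW_RE`, and the helper `extract_rows` are all hypothetical: the values are taken from the output quoted earlier in the thread, but the line layout is an assumption and the pattern would need adjusting to the real text.

```python
import re

# Hypothetical sample text: values come from the output quoted earlier in the
# thread, but the one-row-per-line layout is an assumption.
SAMPLE_TEXT = """\
34.5%, n=1 1199 sec, n=1
98.3% n=2 1716 sec, n=3
98.3%, n=4 1672.1 sec, n=4
95.4%, n=3 1416 sec, n=3
"""

# One row = a percentage with its sample size, then a duration with its sample size.
ROW_RE = re.compile(
    r"(?P<accuracy>\d+(?:\.\d+)?)%,?\s*n=(?P<accuracy_n>\d+)\s+"
    r"(?P<seconds>\d+(?:\.\d+)?)\s*sec,\s*n=(?P<seconds_n>\d+)"
)


def extract_rows(text):
    """Return one dict per line that matches ROW_RE; other lines are skipped."""
    rows = []
    for line in text.splitlines():
        match = ROW_RE.search(line)
        if match is None:
            continue
        rows.append({
            "accuracy": float(match["accuracy"]),
            "accuracy_n": int(match["accuracy_n"]),
            "seconds": float(match["seconds"]),
            "seconds_n": int(match["seconds_n"]),
        })
    return rows


rows = extract_rows(SAMPLE_TEXT)
for row in rows:
    print(row)
```

Lines that do not match are skipped silently, so it is worth comparing the number of extracted rows with the number of rows in the PDF table.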

@supermanPunch @Shah_Hussain @husain.shah

@shero I am looking for the same thing.
Even just extracting the table from the PDF into a DataTable would work for me, but without Data Scraping, the epsilonAI activity, or any third-party package (for security reasons).
Please guide.

It is asking for a license key. Can you provide the license key file?

@shero @Sakshi_Jain @siva_sankar

Hello guys,
Yes, Epsilon has started asking for subscription keys now. However, I am working on this and will get back to you guys very soon.

Thank you.
Please do post an update, because a lot of PDF work is in that format.