Extract table data from multiple pages of pdf

AmanGarg · January 25, 2017, 9:35am

Hi,

I am trying to extract data from a table spanning multiple pages of a PDF file. I can’t use Read PDF because the table has empty cells, and Read PDF would misalign the columns.

Any pointers?

Cheers,
Aman

Lavinia · January 26, 2017, 6:51pm

In 2016.2 you can use Extract Structured Data from recording to extract data from PDF tables. Have you tried that?

AmanGarg · January 27, 2017, 10:12am

Yes, I have tried that, and it can extract the data from one page, but not from the second page onwards. On the web there is an option to select the next page, but that option is missing when reading a PDF.

Vinay · January 27, 2017, 10:51am

The tables will have different idx values. Increment the idx in a loop for as long as the selector exists, and extract each table to a data table.

AmanGarg · January 27, 2017, 1:04pm

This works. Thanks for the help.

joo77uip · March 31, 2017, 6:53pm

I don’t understand how to do that…can you please explain?

Amarjeet · April 10, 2017, 12:57pm

Can you please explain with a workflow?

sunilkushwaha · August 18, 2017, 7:16am

Could you please explain with an example?

Vinay · August 23, 2017, 11:10am

Get the selector for one of the tables and check its idx value, then get the selector of the next table and check its idx value. Comparing the two will help you work out a selector with a variable idx value, which you can then use to fetch each value.
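As a rough illustration of that comparison (plain Python, not UiPath code), the sketch below parses two selectors and reports which attribute differs. The helper names parse_selector and differing_attributes are hypothetical, and the second selector's idx of 56 is an assumed value for illustration only.

```python
import re
from typing import Dict, List, Tuple


def parse_selector(selector: str) -> List[Dict[str, str]]:
    """Return one attribute dict per selector tag, with the tag name stored under 'tag'."""
    nodes = []
    for tag, attrs in re.findall(r"<(\w+)\s+([^>]*?)\s*/>", selector):
        node = {"tag": tag}
        # Each attribute looks like name='value'.
        node.update(re.findall(r"([\w:-]+)='([^']*)'", attrs))
        nodes.append(node)
    return nodes


def differing_attributes(first: str, second: str) -> List[Tuple[int, str, str, str]]:
    """Return (tag position, attribute, value in first, value in second) for each mismatch."""
    diffs = []
    for i, (a, b) in enumerate(zip(parse_selector(first), parse_selector(second))):
        for key in sorted(set(a) | set(b)):
            if a.get(key) != b.get(key):
                diffs.append((i, key, a.get(key, ""), b.get(key, "")))
    return diffs


# Two selectors captured from consecutive elements (the second idx is assumed).
table_one = (
    "<wnd app='acrord32.exe' cls='AcrobatSDIWindow' title='Acrobat Document.pdf - Adobe Reader' />"
    "<wnd cls='AVL_AVView' title='AVPageView' />"
    "<ctrl idx='55' role='row' />"
)
table_two = table_one.replace("idx='55'", "idx='56'")

print(differing_attributes(table_one, table_two))
# [(2, 'idx', '55', '56')]
```

If idx is the only attribute that changes, it is the one to replace with a counter variable.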

sunilkushwaha · August 25, 2017, 9:18am

Thanks a lot

sourav · August 30, 2017, 11:43am

Hi,
I am trying to read tabular data from a native PDF file, where the table spans multiple pages.
I tried Read PDF Text, but the resulting string is long and very difficult to parse. I am able to use data scraping for each page by changing the index in the selector, but the table structure is not preserved. Can you please take a look and suggest a solution?

Acrobat Document.pdf (521.0 KB)

Vinay · September 5, 2017, 11:24am

Below is the selector whose ctrl idx value needs to be incremented for as long as the selector exists. In place of 55, you can use a variable that is incremented until the selector no longer exists.

<wnd app='acrord32.exe' cls='AcrobatSDIWindow' title='Acrobat Document.pdf - Adobe Reader' />
<wnd cls='AVL_AVView' title='AVPageView' />
<ctrl idx='55' role='row' />

"<wnd app='acrord32.exe' cls='AcrobatSDIWindow' title='Acrobat Document.pdf - Adobe Reader' />
    <wnd cls='AVL_AVView' title='AVPageView' />
    <ctrl idx='" + CounterValue.ToString + "' role='row' />"

sourav · September 5, 2017, 11:53am

Hi Vinay,
I am trying, but it is not working. Can you please share the .xaml?
Which field are you getting with the selector above?

Vinay · September 6, 2017, 6:55am

For example, declare an Integer variable for the counter (counter) and a second variable of type String (rowSelector) to hold the selector as above, then use rowSelector as the value of the Selector property.

Initialize counter
Start loop
Assign rowSelector (this updates rowSelector on each iteration, giving a new selector for each new row found in the PDF):

"<wnd app='acrord32.exe' cls='AcrobatSDIWindow' title='*.pdf - Adobe Reader' />
<wnd cls='AVL_AVView' title='AVPageView' />
<ctrl idx='" + counter.ToString + "' role='row' />"

Check whether the selector exists
If it exists, fetch the value using rowSelector
Increment counter
End loop
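The control flow of the steps above can be sketched in plain Python. This is only a model of the loop, not UiPath code: selector_exists and read_value are hypothetical callables standing in for whatever existence check and value-reading activities the workflow uses, and the max_rows guard is an added assumption to keep the loop from running forever.

```python
from typing import Callable, List

# Same selector as in the post, with a placeholder where the counter goes.
SELECTOR_TEMPLATE = (
    "<wnd app='acrord32.exe' cls='AcrobatSDIWindow' title='*.pdf - Adobe Reader' />"
    "<wnd cls='AVL_AVView' title='AVPageView' />"
    "<ctrl idx='{idx}' role='row' />"
)


def collect_rows(
    selector_exists: Callable[[str], bool],
    read_value: Callable[[str], str],
    start: int,
    max_rows: int = 10000,
) -> List[str]:
    """Increment the idx until the selector stops matching, collecting each row's value."""
    rows = []
    counter = start
    while counter < start + max_rows:
        row_selector = SELECTOR_TEMPLATE.format(idx=counter)
        if not selector_exists(row_selector):
            break  # first missing idx: assume the table has ended
        rows.append(read_value(row_selector))
        counter += 1
    return rows


# Stand-in for the PDF viewer: three rows at idx 55, 56 and 57.
fake_pdf = {SELECTOR_TEMPLATE.format(idx=i): "row %d" % i for i in range(55, 58)}

rows = collect_rows(fake_pdf.__contains__, fake_pdf.__getitem__, start=55)
print(rows)
# ['row 55', 'row 56', 'row 57']
```

Stopping at the first idx that does not exist matches the "increment until it no longer exists" description. If the idx values in a real document have gaps, the loop would end early, so that assumption is worth checking against the actual selectors.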

11113 · October 3, 2017, 9:01am

I have the same question. I tried the advice above, but it did not work.
What was the result? Can you share it?
Thanks very much.

Regards

aamir · January 16, 2018, 6:56am

Hi Aman,

I am trying to extract tables from a PDF, but I am not able to do so. Could you please help me with this?
I have tried both the screen scraping and data scraping methods.

patak001 · March 15, 2018, 12:34pm

Hi Vinay,

I tried your approach. The code runs, but no data is written to the CSV file. For a single page, it does load the data.

I am a new user, so I am not allowed to upload the .xaml file.

Kindly suggest.

Regards,
Akhilesh
