Extract tabular data from PDF

dorai123 · December 3, 2019, 1:19pm

I need to extract structured data from a PDF and store it in Excel. Normal data scraping and screen scraping don’t work.

When I use the Read PDF Text activity, I get unsorted data, but I don’t know how to manipulate it.

I tried the Anchor Base activity, but I get only one record. I need to extract the following columns and their data:

Name

License #

Action

Violation

Effective Date

MaxyArthes · December 3, 2019, 4:42pm

Can you share the PDF file, if your company allows it? That would let me test against it and help you.

Mahesh_Gunda · December 3, 2019, 5:16pm

Hi @dorai123

Can you retry the ‘Read PDF Text’ activity with its ‘Preserve Formatting’ property set to ‘True’?

The resulting string looks structured and can be manipulated using the ‘Text to Columns’ feature in Excel.
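As a rough illustration of what manipulating such a layout-preserved string could look like outside Excel, here is a minimal Python sketch. It assumes that columns are separated by runs of two or more spaces. The sample text and the `split_rows` helper are made up for illustration; they are not taken from the actual PDF or from any UiPath activity.

```python
import re

# Hypothetical sample of layout-preserved text; the real PDF content is not available.
SAMPLE_TEXT = """\
Name            License #    Action        Violation         Effective Date
Jane Doe        A-1001       Suspension    Late renewal      01/15/2019
John Roe        B-2002       Fine          Missing records   02/20/2019
"""


def split_rows(text):
    """Split each non-empty line into cells on runs of two or more spaces."""
    rows = []
    for line in text.splitlines():
        if line.strip():
            rows.append(re.split(r"\s{2,}", line.strip()))
    return rows


rows = split_rows(SAMPLE_TEXT)
header, records = rows[0], rows[1:]
```

This approach breaks if a cell is empty or itself contains two consecutive spaces, so the output should be checked against the source before relying on it.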

dorai123 · December 4, 2019, 8:19am

Hi,

Thanks, this preserves the table structure and returns the data in tabular format. But how do I extract only the specific columns I mentioned in my post and store them in Excel?

dorai123 · December 4, 2019, 8:22am

Hi,

Sorry, unfortunately I cannot share the PDF, which is why I provided a snapshot of its content. The PDF contains the data in the format shown in the screenshot.

Mahesh_Gunda · December 4, 2019, 1:18pm

The Read PDF Text activity returns a string, which needs additional processing to fetch the desired data. In contrast, the data scraping activity produces a DataTable, which can be manipulated easily.
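To show why tabular data is easier to work with, here is a small sketch of keeping only the required columns once the rows are already in a table. It is plain Python rather than a UiPath workflow. The sample row, the extra `Address` column, and the helper names `select_columns` and `to_csv` are hypothetical. The output is CSV text, which Excel can open.

```python
import csv
import io


def select_columns(header, records, wanted):
    """Return each row reduced to the wanted columns, in the requested order."""
    indexes = [header.index(name) for name in wanted]
    return [[row[i] for i in indexes] for row in records]


def to_csv(header, rows):
    """Serialize a header and rows to CSV text."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(header)
    writer.writerows(rows)
    return buffer.getvalue()


# Hypothetical table with one column ("Address") that is not needed.
header = ["Name", "Address", "License #", "Action", "Violation", "Effective Date"]
records = [
    ["Jane Doe", "1 Main St", "A-1001", "Suspension", "Late renewal", "01/15/2019"],
]
wanted = ["Name", "License #", "Action", "Violation", "Effective Date"]

csv_text = to_csv(wanted, select_columns(header, records, wanted))
```

`header.index` raises `ValueError` if a requested column name is missing, which makes a mismatch between the expected and the extracted headers visible early.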

So can you share the error you encountered while using the ‘data scraping’ activity?

When I use the ‘data scraping’ activity on sample data, it works fine.

If you can copy only the table data into a separate PDF file and share it, that would help a lot in understanding the problem.

Thanks

dorai123 · December 11, 2019, 12:31pm

Hi,

Sorry for the late reply. After changing options in Acrobat, I am able to retrieve the data using data scraping, as you suggested.

Thanks

system · December 14, 2019, 12:32pm

This topic was automatically closed 3 days after the last reply. New replies are no longer allowed.
