Extract tabular data from PDF

I need to extract structured data from a PDF and store it in Excel. Normal data scraping and screen scraping don’t work.

When I use the Read PDF activity, I get unsorted data, and I don’t know how to manipulate it.

I tried the Anchor Base activity, but I only get one record. I need to extract the following columns and their data:

Name
License #
Action
Violation
Effective Date

Can you share the PDF file? That would let us test a few approaches and help you, if your company allows it, of course.

Hi @dorai123,

Can you retry the ‘Read PDF Text’ activity with its ‘Preserve Formatting’ property set to ‘True’?

The resulting string looks structured and can be split into columns with the ‘Text to Columns’ feature in Excel.
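A minimal Python sketch of the same idea outside Excel, assuming the ‘Preserve Formatting’ output separates columns with runs of two or more spaces and that no single cell contains two consecutive spaces. The function name `split_preserved_text` and the sample text are hypothetical. Real PDFs may have wrapped or empty cells, which this simple split would merge or drop.

```python
import re


def split_preserved_text(text: str) -> list[list[str]]:
    """Split layout-preserved PDF text into rows of cells.

    Assumption (hypothetical layout): columns are separated by two or
    more whitespace characters, and no cell contains such a run itself.
    """
    rows = []
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines between table rows
        # Treat any run of 2+ whitespace characters as a column boundary.
        rows.append(re.split(r"\s{2,}", line.strip()))
    return rows


if __name__ == "__main__":
    # Hypothetical sample; not taken from the poster's PDF.
    sample = "Name      License #   Action\n\nJohn Doe  12345       Suspended\n"
    for row in split_preserved_text(sample):
        print(row)
```

Splitting on two or more spaces keeps single-spaced values such as "License #" or "John Doe" in one cell, which is roughly what Excel's ‘Text to Columns’ does when you pick fixed-width breaks by hand.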

Hi,

Thanks, this preserves the table structure and returns the data in tabular format, but how do I extract only the specific columns I mentioned in my post and store them in Excel?
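A minimal Python sketch of keeping only the required columns, assuming the extracted data is already a list of rows (for example, cells obtained by splitting the ‘Preserve Formatting’ output) whose first row holds the column headers spelled exactly as below. The helper names and sample rows are hypothetical. It writes CSV, which Excel opens directly, to avoid depending on a third-party Excel library. Inside a UiPath workflow, a comparable step on a DataTable could be .NET's `DataView.ToTable(false, columnNames)`; that is a suggested alternative, not something confirmed in this thread.

```python
import csv
import io

# Columns named in the original question; adjust to the PDF's actual headers.
WANTED = ["Name", "License #", "Action", "Violation", "Effective Date"]


def select_columns(rows: list[list[str]], wanted: list[str]) -> list[list[str]]:
    """Keep only the `wanted` columns, looked up by name in the header row."""
    header = rows[0]
    # list.index raises ValueError if a wanted column is missing from the header.
    indexes = [header.index(name) for name in wanted]
    # Short rows (missing trailing cells) get empty strings instead of an IndexError.
    return [[row[i] if i < len(row) else "" for i in indexes] for row in rows]


def to_csv(rows: list[list[str]]) -> str:
    """Serialize rows as CSV text (csv.writer uses \\r\\n line endings by default)."""
    buffer = io.StringIO()
    csv.writer(buffer).writerows(rows)
    return buffer.getvalue()


if __name__ == "__main__":
    # Hypothetical rows; not taken from the poster's PDF.
    rows = [
        ["Name", "License #", "Action", "Violation", "Effective Date", "Notes"],
        ["Jane Roe", "A-100", "Fine", "Late renewal", "01/02/2020", "n/a"],
    ]
    print(to_csv(select_columns(rows, WANTED)))
```

Looking columns up by header name rather than by position means the selection still works if the PDF's column order changes, as long as the header text stays the same.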

Hi,

Sorry, unfortunately I cannot share the PDF; that is why I provided a snapshot of its content. The file contains the data in the format shown in the screenshot.

The Read PDF activity returns a string that needs additional processing to extract the desired data. By contrast, the Data Scraping activity produces a DataTable, which is much easier to manipulate.

So, can you share the error you encountered while using the ‘Data Scraping’ activity?

I ask because when I use the ‘Data Scraping’ activity on sample data, it works fine.

If you can copy only the table data into a separate PDF file and share it, that would be a great help in understanding the issue.

Thanks

Hi,

Sorry for the late reply. After changing some options in Acrobat, I am able to retrieve the data using data scraping, as you suggested.

Thanks
