Best way to read a complex organized pdf file

Zahid_Rahim · July 18, 2023, 6:46pm

Hi experts,

I am a newbie here.

I have a pdf file with more than 50 attributes and a dynamic table with variable rows.

I want to read all attribute values and table values from rows and columns.

I have seen many YouTube videos and other information, but what is the best way or resource for reading a long, well-organized but complex PDF?

Regards,

Zahid Rahim

Vikas_M · July 18, 2023, 6:50pm

hey @Zahid_Rahim

Use the “Extract Structured Data” Activity:

If the table in the PDF has a consistent structure (fixed column headers and rows), you can use the “Extract Structured Data” activity from the UiPath.UIAutomation.Activities package to extract the table data into a DataTable.
This activity locates the table cells as UI elements, so a structured and consistent layout is essential.

Handle Dynamic Tables:

If the table has a variable number of rows and columns, you may need to loop through the table elements using activities like “For Each” to extract the values dynamically.
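
The looping idea for a variable number of rows can also be sketched outside UiPath. Below is a minimal Python sketch, assuming the PDF text has already been extracted into lines, that the header line reads exactly `Name Description Size`, and that every table row ends with an integer Size. The function name `read_rows` and the sample text are hypothetical.

```python
import re

# Hypothetical header; assumed to appear verbatim on its own line.
HEADER = "Name Description Size"
# Assumption: a table row is some text followed by a trailing integer.
ROW = re.compile(r"^(?P<text>.+?)\s+(?P<size>\d+)$")


def read_rows(lines):
    """Collect rows after the header until the first line that is not a row."""
    rows = []
    in_table = False
    for line in lines:
        line = line.strip()
        if not in_table:
            # Skip everything until the header line is found.
            in_table = line == HEADER
            continue
        match = ROW.match(line)
        if match is None:
            break  # the first non-row line ends the table
        rows.append((match["text"], int(match["size"])))
    return rows


sample = """Report header
Name Description Size
Zahid Rahim Zahid Rahim 19
First Last First Last 20

Footer text"""

rows = read_rows(sample.splitlines())
print(rows)
```

This only separates the trailing Size value. The remaining text still holds Name and Description together, because plain text alone does not say where one column ends and the next begins.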

Zahid_Rahim · July 19, 2023, 7:24am

Hey Vikas,
Thank you for the response.

Is there any URL or YouTube video that explains this step by step?

I have fixed columns and a variable number of rows.
The data is very unpredictable, contains duplicates, and the columns are separated by only a single space.

The data table looks something like this:
Name Description Size
Zahid Rahim Zahid Rahim 19
First Last First Last 20

How can I distinguish the columns? The Name is "Zahid Rahim", the Description is also "Zahid Rahim", and there is only a single space between these two columns.

Regards,

Zahid Rahim

Vikas_M · July 19, 2023, 7:26am

You can refer to these links

Zahid_Rahim · July 19, 2023, 7:59am

Hi Vikas,

Neither of the videos reads a table from a PDF…

Regards,

Zahid Rahim

Vikas_M · July 19, 2023, 8:00am

Hey, sorry, that was a video on extracting a table using CV.

If you need to get a table from a PDF, then refer

Vikas_M · July 19, 2023, 8:03am

In addition, you can refer to the thread below

Zahid_Rahim · July 19, 2023, 9:11am

Hi Vikas,
Thank you for all the information. But the scenarios covered by CristianNegulescu are mainly based on hardcoded or predictable data. In my case the data is different in every PDF, and there is only a single space between columns.

Regards,

Zahid Rahim
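
When columns are separated by a single space, the text alone is ambiguous, but the horizontal position of each word on the page usually is not. Below is a minimal Python sketch, assuming a PDF library can supply each word with its left x-coordinate and line position (for example, pdfplumber's `extract_words()` returns dictionaries with `text`, `x0` and `top` keys), that every header is a single word, and that each column's values start at or to the right of its header. The coordinates are made up for illustration, and `assign_columns` is a hypothetical helper.

```python
from bisect import bisect_right
from collections import defaultdict


def assign_columns(words, tolerance=1.0):
    """Group words into rows and columns using their x positions.

    words: dicts with 'text', 'x0' (left edge) and 'top' (line position).
    The first line is treated as the header; its x0 values mark column starts.
    """
    lines = defaultdict(list)
    for word in words:
        lines[round(word["top"])].append(word)
    ordered = [sorted(lines[top], key=lambda w: w["x0"]) for top in sorted(lines)]

    header, body = ordered[0], ordered[1:]
    names = [w["text"] for w in header]
    starts = [w["x0"] - tolerance for w in header]

    table = []
    for line in body:
        cells = [[] for _ in names]
        for word in line:
            # Pick the rightmost column whose start is at or left of the word.
            index = max(bisect_right(starts, word["x0"]) - 1, 0)
            cells[index].append(word["text"])
        table.append({name: " ".join(cell) for name, cell in zip(names, cells)})
    return table


def w(text, x0, top):
    return {"text": text, "x0": x0, "top": top}


# Made-up coordinates that mimic the sample table from this thread.
words = [
    w("Name", 50, 100), w("Description", 150, 100), w("Size", 300, 100),
    w("Zahid", 50, 112), w("Rahim", 80, 112),
    w("Zahid", 150, 112), w("Rahim", 180, 112), w("19", 300, 112),
    w("First", 50, 124), w("Last", 78, 124),
    w("First", 150, 124), w("Last", 178, 124), w("20", 300, 124),
]

table = assign_columns(words)
for row in table:
    print(row)
```

Right-aligned columns, multi-word headers, or wrapped cells would need different boundary rules, so the column starts may have to be adjusted for a real document.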
