Best way to read a complex, structured PDF file

Hi experts,

I am a newbie here.

I have a PDF file with more than 50 attributes and a dynamic table with a variable number of rows.

I want to read all attribute values, plus every table value by row and column.

I have watched many YouTube videos and read a lot of material, but what is the best way or resource for reading a long, structured but complex PDF?

Regards,

Zahid Rahim

hey @Zahid_Rahim

  1. Use the “Extract Structured Data” Activity:
  • If the table in the PDF has a consistent structure (fixed column headers and rows), you can use the “Extract Structured Data” activity from the UiPath.UIAutomation.Activities package to extract the table data into a DataTable.
  • This activity locates table elements through the UI elements exposed on screen rather than through OCR, so it is essential to have a structured and consistent layout.
  2. Handle Dynamic Tables:
  • If the table has a variable number of rows and columns, you may need to loop through the table elements using activities like “For Each” to extract the values dynamically.
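
For readers working outside UiPath, the looping idea for a table with a variable number of rows can be sketched in Python. This is only an illustration, not a UiPath activity: the function name `collect_table_rows` is made up, and it assumes the PDF text has already been extracted into lines, that the header line is known, and that the table ends at the first blank line.

```python
def collect_table_rows(lines, header):
    """Return the raw row lines that follow `header`.

    Assumption: the table ends at the first blank line (or at end of input).
    """
    rows = []
    in_table = False
    for line in lines:
        text = line.strip()
        if not in_table:
            # Skip everything until the header line is found.
            if text == header:
                in_table = True
            continue
        if not text:
            # Blank line: treat it as the end of the table.
            break
        rows.append(text)
    return rows


# Sample text: the table lines come from this thread; the first and last
# lines are hypothetical placeholders for other content in the PDF.
sample_lines = [
    "Some attribute: some value",
    "Name Description Size",
    "Zahid Rahim Zahid Rahim 19",
    "First Last First Last 20",
    "",
    "Text after the table",
]

table_rows = collect_table_rows(sample_lines, "Name Description Size")
for row in table_rows:
    print(row)
```

This only finds the rows, however many there are. It does not split them into columns.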

Hey Vikas,
Thank you for the response.

Is there a URL or YouTube video that explains this step by step?

I have fixed columns and a variable number of rows.
The data is very unpredictable, values can be duplicated across columns, and the columns are separated by only a single space.

The data table looks something like this:
Name Description Size
Zahid Rahim Zahid Rahim 19
First Last First Last 20

How can I distinguish the columns? Name is "Zahid Rahim" and Description is also "Zahid Rahim", and there is only a single space between these two columns.

Regards,

Zahid Rahim

You can refer to these links

Hi Vikas,

Neither of the videos reads a table from a PDF…

Regards,

Zahid Rahim

Hey, sorry, that was a video on extracting a table using CV

If you need to get a table from a PDF, then refer

In addition, you can refer to the thread below

Hi Vikas,
Thank you for all the information. But the scenarios covered by CristianNegulescu are mainly based on hardcoded data or on data that is predictable. In my case the data is different in every PDF, and there is only a single space between columns.

Regards,

Zahid Rahim
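
One possible approach to the single-space problem, sketched in Python rather than as a UiPath workflow, is to split on horizontal position instead of on spaces. The flattened text "Zahid Rahim Zahid Rahim 19" has lost the column boundaries, but each word usually still has a position on the page. The sketch assumes the PDF has a real text layer and that a library can report each word's left edge (for example, pdfplumber's `extract_words()` returns dicts with `text` and `x0` keys). The function name, the `tolerance` parameter and all coordinates below are made up for illustration.

```python
from bisect import bisect_right


def split_row_by_position(words, column_starts, tolerance=2.0):
    """Group the words of one row into columns by their left edge.

    words: list of dicts with 'text' and 'x0' (left edge of the word).
    column_starts: sorted left edges of the columns, e.g. the x0 of each
        header word. Assumes left-aligned columns.
    tolerance: slack for words that start slightly left of the column edge.
    """
    cells = [[] for _ in column_starts]
    for word in sorted(words, key=lambda w: w["x0"]):
        # Pick the last column whose start is at or before the word's left edge.
        index = bisect_right(column_starts, word["x0"] + tolerance) - 1
        cells[max(index, 0)].append(word["text"])
    return [" ".join(cell) for cell in cells]


# Hypothetical coordinates; in practice they come from the PDF library.
column_starts = [50.0, 120.0, 230.0]  # Name, Description, Size
row_words = [
    {"text": "Zahid", "x0": 50.0},
    {"text": "Rahim", "x0": 80.0},
    {"text": "Zahid", "x0": 120.0},
    {"text": "Rahim", "x0": 150.0},
    {"text": "19", "x0": 230.0},
]

row_cells = split_row_by_position(row_words, column_starts)
print(row_cells)  # ['Zahid Rahim', 'Zahid Rahim', '19']
```

This only helps if the columns really start at consistent positions in every PDF. Right-aligned columns, or text that runs over into the next column, would need a different rule, and a scanned PDF would need OCR that returns word positions.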