Read tabular data from a PDF file

Hello everyone,

I need to read a table inside a PDF.

The structure of the table is always the same, and the PDF is produced by printing an Excel file to PDF.

Is it possible to read it as XML, for example? Or is there another way to read the data in a column of that table?

Hi @Singh7633 ,

Thanks for reaching out.

  1. Read the PDF with the Read PDF with OCR activity.
  2. Create a DataTable.
  3. Since the table structure is always the same, split the extracted text on commas or spaces, whichever fits the output of Read PDF with OCR.
  4. Use the Write Range activity to write the data to an Excel file.
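
Outside of UiPath, the split-and-write steps above can be sketched in plain Python. This is only an illustration, not the activities themselves: it assumes the extracted text arrives as one table row per line, and the helper names `split_rows`, `column`, and `to_csv` are hypothetical. It writes CSV instead of an Excel workbook to stay within the standard library.

```python
import csv
import io


def split_rows(text, delimiter=None):
    """Split extracted text into rows of cells.

    delimiter=None splits on any run of whitespace; pass "," for
    comma-separated output. Assumes one table row per line.
    """
    rows = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines between rows
        rows.append([cell.strip() for cell in line.split(delimiter)])
    return rows


def column(rows, index, skip_header=True):
    """Return the cells of one column, optionally skipping the header row."""
    body = rows[1:] if skip_header else rows
    return [row[index] for row in body if len(row) > index]


def to_csv(rows):
    """Serialize the rows as CSV text that Excel can open."""
    buffer = io.StringIO()
    csv.writer(buffer, lineterminator="\n").writerows(rows)
    return buffer.getvalue()
```

Splitting on whitespace breaks as soon as a cell itself contains a space, so this only holds when the fixed table layout guarantees single-token cells or a reliable delimiter.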

Regards,
@Vinit_Kawle


I don’t want to read it with OCR, since it is a structured document. Given that it comes from an Excel file, is there no way to read it as XML with specific tags?

@Singh7633 ,

Even after reading an XML with specific tags, you still have to process it with string manipulation before you can format the data as a table.
So rather than converting the PDF to Excel, you can read it with OCR and shape the data into tabular form.

Regards,
@Vinit_Kawle


Yes, but imagine that the Excel table always has the same format and I always have to read the same column. I didn’t say I can’t use OCR. Is XML with tags still the best approach?

@Singh7633 ,

Yes, but if you start from a PDF, you need OCR to convert it to other formats.
Otherwise, if the Excel file is already available, XML and tags work best.
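
If the original .xlsx file is available, one way to read "XML with tags" is to open the workbook directly: an .xlsx file is a ZIP archive of XML parts. The sketch below is a minimal illustration under several assumptions: the function name `read_xlsx_column` is hypothetical, the sheet is assumed to live at `xl/worksheets/sheet1.xml`, every cell is assumed to carry an `r` reference attribute, and only shared-string and plain-value cells are handled.

```python
import re
import xml.etree.ElementTree as ET
import zipfile

# SpreadsheetML main namespace used by the sheet and shared-strings parts.
NS = {"m": "http://schemas.openxmlformats.org/spreadsheetml/2006/main"}


def read_xlsx_column(path, column_letter, sheet_part="xl/worksheets/sheet1.xml"):
    """Return the values of one column (e.g. "B") as strings, top to bottom.

    sheet_part is an assumption; the actual part name can differ per workbook.
    """
    with zipfile.ZipFile(path) as archive:
        shared = []
        if "xl/sharedStrings.xml" in archive.namelist():
            sst = ET.fromstring(archive.read("xl/sharedStrings.xml"))
            for si in sst.findall("m:si", NS):
                # An <si> entry may hold several <t> runs; join them.
                runs = si.iter("{%s}t" % NS["m"])
                shared.append("".join(t.text or "" for t in runs))
        sheet = ET.fromstring(archive.read(sheet_part))

    values = []
    for cell in sheet.iterfind(".//m:c", NS):
        letters = re.match(r"[A-Z]+", cell.get("r", ""))
        if not letters or letters.group(0) != column_letter:
            continue  # cell belongs to another column
        value = cell.find("m:v", NS)
        if value is None:
            continue  # empty cell
        if cell.get("t") == "s":
            # t="s" means <v> is an index into the shared-strings table.
            values.append(shared[int(value.text)])
        else:
            values.append(value.text)
    return values
```

Because the table always has the same layout, calling `read_xlsx_column("report.xlsx", "B")` (the file name is a placeholder) would return that column without any OCR step. This does not apply to the PDF itself, only to the source workbook.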

Regards,
@Vinit_Kawle


Hi @Singh7633

You can try this package and check whether it works for your use case:

Extract Tables from PDF - RPA Component | UiPath Marketplace | Overview

Cheers