How to parse a Non-tagged PDF containing tables using UiPath?

Hello Team,

I am looking at doing a POC for extracting table information from a PDF file (a non-tagged one). Is this doable in UiPath? If yes, are there any best known methods (BKMs) for getting it done?

I tried using Read PDF Text from UiPath.Pdf.Activities. Even though the data was extracted successfully, I get the entire table as a single field, and the column values are interleaved: the 1st line of the 1st column, followed by the 1st line of the 2nd column, and so on up to the 1st line of the Nth column.

After that, I get the 2nd line of the 1st column, the 2nd line of the 2nd column, up to the 2nd line of the Nth column.

Because of this, I am not able to rebuild the table for my business needs. Is there a simpler way to achieve this? Or maybe I am doing something incorrectly?
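If each cell comes out of Read PDF Text on exactly one line, the order described above is simply row-major, so the table can be rebuilt by cutting the flat list of values into chunks of N. Below is a minimal Python sketch of that idea, assuming one line per cell and a known column count (the same logic would have to be ported to VB.NET inside a workflow). `regroup_rows` and the sample values are hypothetical; the approach breaks as soon as a cell wraps onto two lines, which the length check is meant to catch.

```python
from typing import List


def regroup_rows(lines: List[str], n_columns: int) -> List[List[str]]:
    """Group a flat, row-major list of cell values into rows of n_columns."""
    # Skip blank lines, if any, so only real cell values are counted.
    cells = [line.strip() for line in lines if line.strip()]
    if len(cells) % n_columns != 0:
        # A missing or wrapped cell shifts every following value, so fail loudly.
        raise ValueError(
            f"{len(cells)} cells do not divide evenly into {n_columns} columns"
        )
    return [cells[i:i + n_columns] for i in range(0, len(cells), n_columns)]


if __name__ == "__main__":
    # Hypothetical extracted text: 1st row of every column, then the 2nd row.
    text = "ABC-1\nWidget\n10\nABC-2\nGadget\n25\n"
    for row in regroup_rows(text.splitlines(), n_columns=3):
        print(" | ".join(row))
```

Running it prints `ABC-1 | Widget | 10` and then `ABC-2 | Gadget | 25`.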

Also, the last-column data for the 3rd row spills over onto the next page. How can I identify that text and keep it in a single row?
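One common way to handle a row that spills onto the next page is to decide what a "new row" looks like and treat every other line as a continuation of the previous row. The Python sketch below assumes, hypothetically, that each row starts with an ID such as ABC-1, that columns are separated by two or more spaces, and that the page header/footer lines look like `Page 1 of 2` or `ABC Report`; all three patterns must be adjusted to the real report. It also assumes the carried-over text belongs only to the last column.

```python
import re
from typing import List

# Hypothetical patterns: adapt them to the real report layout.
ROW_START = re.compile(r"^ABC-\d+\b")  # assume every new row starts with an ID like ABC-1
PAGE_NOISE = re.compile(r"^(Page \d+ of \d+|ABC Report)$")  # assumed header/footer lines
COLUMN_GAP = re.compile(r" {2,}")  # assume columns are separated by 2+ spaces


def merge_rows(lines: List[str]) -> List[List[str]]:
    """Rebuild rows, gluing lines that do not start a new row onto the previous row."""
    rows: List[List[str]] = []
    for raw in lines:
        line = raw.strip()
        if not line or PAGE_NOISE.match(line):
            continue  # skip blank lines and repeated page headers/footers
        if ROW_START.match(line):
            rows.append(COLUMN_GAP.split(line))
        elif rows:
            # Continuation text (e.g. carried over to the next page):
            # append it to the last column of the previous row.
            rows[-1][-1] = rows[-1][-1] + " " + line
        # Lines before the first row (such as column headers) are ignored.
    return rows


if __name__ == "__main__":
    extracted = [
        "ID  Item  Status",
        "ABC-1  Widget  Shipped",
        "ABC-2  Gadget  Shipped",
        "ABC-3  Gizmo  Waiting for",
        "Page 1 of 2",
        "ABC Report",
        "customer approval",
    ]
    for row in merge_rows(extracted):
        print(row)
```

With this input the 3rd row comes out as `['ABC-3', 'Gizmo', 'Waiting for customer approval']`. If a continuation line could itself start with something that looks like an ID, the row-start pattern needs to be stricter.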

I am new to the UiPath forum and am using version 2019.10.4. Please help.


Refer to the sample PDF here: ABC Report.pdf (74.3 KB)

Refer to the parsed output here: temp.txt (6.4 KB)

Use the screen scraping wizard instead. This will put the data in a DataTable, which lets you keep the table alignment from the document.

Thanks @Anthony_Humphries. Can you share a sample automation for this? I tried the screen scraping wizard, but it returns the output in a String variable, not a DataTable.

Is there a way to convert the string to a DataTable?
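If the scraped string keeps the visual layout of the table (every column starts at the same character position on each line), one option is to cut each line at fixed offsets and then load the pieces into a DataTable, for example with Build Data Table and Add Data Row. A minimal Python sketch of the slicing step follows; `slice_fixed_width` and the offsets `[0, 7, 19]` are hypothetical and must be measured from your own output. Unlike splitting on spaces, this keeps multi-word cells such as "Blue widget" intact.

```python
from typing import List, Optional, Sequence


def slice_fixed_width(text: str, starts: Sequence[int]) -> List[List[str]]:
    """Cut each non-blank line at the given 0-based column start offsets."""
    # The last column runs to the end of the line (slice end None).
    bounds: List[Optional[int]] = [*starts, None]
    rows = []
    for line in text.splitlines():
        if not line.strip():
            continue  # ignore empty lines between rows
        rows.append([line[bounds[i]:bounds[i + 1]].strip() for i in range(len(starts))])
    return rows


if __name__ == "__main__":
    # Hypothetical fixed-width text: columns start at positions 0, 7 and 19.
    sample = "\n".join([
        "ID".ljust(7) + "Item".ljust(12) + "Qty",
        "ABC-1".ljust(7) + "Blue widget".ljust(12) + "10",
    ])
    for row in slice_fixed_width(sample, [0, 7, 19]):
        print(row)
```

This prints `['ID', 'Item', 'Qty']` and `['ABC-1', 'Blue widget', '10']`.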

Use the data scraping wizard rather than the screen scraping one. The data scraping wizard will give you a DataTable at the end.

I referenced the wrong wizard before.

I tried the data scraping wizard, and it returns the error “This control does not support data extraction. Please select a table cell”.

Click on one of the cells in the table (e.g. choose ABC-1).

Thanks @Anthony_Humphries. I am unable to select an individual cell. Instead, the entire document panel gets selected, and I receive the error again.

In this video, I extract tables from PDFs and write the data to Excel:

0:25 Install PDF Activities
1:10 Read PDF Text, Get PDF Page Count, Extract PDF
5:40 Read PDF with OCR
6:55 Join PDF and Manage PDF passwords
9:30 Extract Images From PDF and Export PDF as Image
12:00 Extract a table from PDF, use case 1: replace some spaces with | (one column has multiple words)
24:00 Run the robot to see the result
25:40 Extract a table from another PDF, use case 2: the delimiter is two spaces ("  "), an easy split
31:50 Extract a table from a complex PDF, use case 3: unstructured data; the logic is based on IsUpper and IsLower
40:25 Extract the price value from PDF
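The first two use cases can be sketched in a few lines of Python. This is not the code from the video (which is a UiPath workflow), just a hedged illustration of the same ideas under stated assumptions: for use case 2, columns are separated by two or more spaces; for use case 1, columns are separated by single spaces and exactly one known column may contain spaces, so the other columns can be peeled off from the left and from the right. The function names are hypothetical, and use case 3 (IsUpper/IsLower) is not covered.

```python
import re
from typing import List


def split_on_double_spaces(line: str) -> List[str]:
    """Use case 2: columns are separated by runs of two or more spaces."""
    return re.split(r" {2,}", line.strip())


def split_with_one_multiword_column(line: str, n_columns: int, multiword_index: int) -> List[str]:
    """Use case 1: single-space separators, but one known column may contain spaces.

    Columns left and right of the multi-word column are assumed to be single
    words, so take them from each end and keep what is left as one cell.
    """
    tokens = line.split()
    if len(tokens) < n_columns:
        raise ValueError(f"expected at least {n_columns} tokens, got {len(tokens)}")
    n_right = n_columns - multiword_index - 1
    left = tokens[:multiword_index]
    right = tokens[len(tokens) - n_right:]
    middle = " ".join(tokens[multiword_index:len(tokens) - n_right])
    return left + [middle] + right


if __name__ == "__main__":
    print(split_on_double_spaces("ABC-1  Blue steel widget  10"))
    cells = split_with_one_multiword_column("ABC-1 Blue steel widget 10 EA", 4, 1)
    # Joining with | mirrors the "replace some spaces with |" idea.
    print("|".join(cells))
```

The second call prints `ABC-1|Blue steel widget|10|EA`. The trade-off is that use case 1 needs to know which column may contain spaces, while use case 2 only works if the PDF text really keeps at least two spaces between columns.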

Cristian Negulescu