Trying to extract columns from unaligned PDF data

Hello everyone, I am new to UiPath and the Studio. I have been working with it for the last few weeks, and I have read the forum etiquette, so I’ll try to phrase this properly.

I have a PDF document that I need to extract all the data from. I have done that: the text is stored in a string variable and I can display it in a message box, for example.

The thing is that I want to extract the text headers together with the corresponding data column below each one, but I am not sure how to achieve that. I searched many threads in the forum but haven’t found any that does quite what I need.

I have attached images for reference. Tell me if you have any ideas. Anything helps.


Hi @A_E

Your first image is not clear.

So overall, you need to extract the table from the PDF?

Regards

Nived N :robot:

Happy Automation :relaxed::relaxed::relaxed::relaxed::relaxed:

Hi @A_E … You can try the Generate Data Table activity to convert the text into a datatable…

If you don’t have any sensitive information in that text file, is it possible to share it?

Hi,

Your image is not very clear, though.
But going by the table image, you can follow these steps:

  1. First, split the string on the spaces it contains and replace each of them with “,”.
  2. Save the result in a CSV file and then convert it into a datatable.
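The two steps above can be sketched outside UiPath as follows. This is only an illustration in Python, not the workflow from this thread: the sample text, the column names and the helper `text_to_csv` are all hypothetical, since the real PDF text is not shown here.

```python
import csv
import io
import re

# Hypothetical sample lines; the real PDF text is not shown in this thread.
raw_text = """Date        Description     Amount
11/08/2020  Opening         100.00
12/08/2020  Payment         -25.50"""


def text_to_csv(text):
    """Replace each run of whitespace with a comma and return CSV text."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    for line in text.splitlines():
        if line.strip():  # skip blank lines
            writer.writerow(re.split(r"\s+", line.strip()))
    return buffer.getvalue()


csv_text = text_to_csv(raw_text)
# Reading the CSV back gives a list of rows, similar in spirit to a datatable.
rows = list(csv.reader(io.StringIO(csv_text)))
print(csv_text)
```

Note that this approach splits any value that itself contains a space into several fields, so it only works when no cell has spaces inside it.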

Hope it helps,

Regards,
Sahil

Here is the file

https://we.tl/t-gfuYZ5AcFz

What I have to do with it is:

  • Extract the table from the PDF, which I have already done.

  • Split each column and only retain the first, third and fourth.

  • Add the total amounts together in the third row and subtract all the negative numbers from the big one at the top (row 11/08/2020).

  • Finally, I have to get rid of the “P” in the reference column, put everything in a new data table and print it in a console window.
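As a rough sketch of the steps listed above, here is one possible reading in Python. The rows, the four-column layout and the function names are hypothetical, and the balance calculation assumes that "subtract all the negative numbers" means reducing the top amount by the absolute value of each negative amount.

```python
from decimal import Decimal

# Hypothetical rows: (date, description, amount, reference).
rows = [
    ("11/08/2020", "Opening", "1000.00", "P5001"),
    ("12/08/2020", "Payment", "-250.00", "P5002"),
    ("13/08/2020", "Payment", "-100.50", "P5003"),
]


def build_table(source_rows):
    """Keep columns 1, 3 and 4 and strip a leading 'P' from the reference."""
    table = []
    for row in source_rows:
        reference = row[3]
        if reference.startswith("P"):
            reference = reference[1:]
        table.append((row[0], Decimal(row[2]), reference))
    return table


def remaining_balance(table):
    """Assumed reading: top amount minus the size of every negative amount."""
    top_amount = table[0][1]
    negatives = [amount for _, amount, _ in table[1:] if amount < 0]
    return top_amount - sum(abs(amount) for amount in negatives)


table = build_table(rows)
for date, amount, reference in table:  # print the new table to the console
    print(date, amount, reference)
print("Remaining:", remaining_balance(table))
```

`Decimal` is used instead of `float` so that currency amounts are not affected by binary rounding.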

Thank you for your help Sahil.

I just uploaded the file. Tell me if it helps you figure it out. Thanks again.

The 1st is the date column… I am not sure which are the 3rd and 4th in the table. Could you please point them out?

In the column below, I see 2-3 different splits. In the first row there are 1, 110 and 834,000. Is all of this considered one value, or is it 1 110 & 834,000? I’m confused…

The third and fourth columns are the amounts and the references (starting with P5…, etc.).

The column you showed is the amounts column, and each of those is a single value.
1 110 834.00 is one million, one hundred and ten thousand, eight hundred and thirty-four dollars, etc…
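To make the point about single values concrete, here is a minimal Python sketch that turns an amount written with spaces as thousands separators into a number. The helper name `parse_amount` is hypothetical, and the sketch assumes a dot is the decimal separator, as in the example above.

```python
from decimal import Decimal


def parse_amount(text):
    """Parse an amount that uses spaces as thousands separators."""
    # Treat non-breaking spaces like regular ones, then drop all spaces.
    cleaned = text.replace("\u00a0", " ").replace(" ", "")
    return Decimal(cleaned)


amount = parse_amount("1 110 834.00")
print(amount)
```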

@A_E - I am almost able to extract all the information using Regex… but since the first line has only spaces in the Reference column, that row is not being extracted… I am still thinking…
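One common way to handle a row with a blank Reference cell is to make that part of the pattern optional. The Python sketch below shows the idea only. It is not the pattern from the attached workflow, and the line layout (date, description, amount, optional reference starting with P) and the sample lines are assumptions.

```python
import re

# Assumed layout: date, description, amount, then an optional reference.
ROW_PATTERN = re.compile(
    r"^(?P<date>\d{2}/\d{2}/\d{4})\s+"
    r"(?P<desc>.*?)\s+"
    r"(?P<amount>-?\d{1,3}(?: \d{3})*\.\d{2})"
    r"(?:\s+(?P<ref>P\d+))?\s*$"  # the reference group may be missing
)

# Hypothetical lines; the first one has no reference.
lines = [
    "11/08/2020 Opening balance 1 110 834.00",
    "12/08/2020 Invoice -2 500.00 P5001",
]

parsed = []
for line in lines:
    match = ROW_PATTERN.match(line)
    if match:
        parsed.append(match.groupdict())

for row in parsed:
    print(row)
```

When the reference is absent, the `ref` group is `None` instead of the whole row failing to match.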

Hi @A_E … Please find some starter help here… With the help of Regex, I was able to extract all the info from the PDF…

RegEx_AE.zip (43.2 KB)

Output:

Hi,

Sorry, but the image I am seeing in your post is still too blurry to read.

Regards,
Sahil
