Match a string from a text file and then split it into rows and columns

Hello Community!
I am new to UiPath and still learning.
I am currently extracting data from a PDF by reading it as text and then matching the information I need.
However, one of the items I extract is a set of dates and numbers, which I capture by delimiting the range with a regex.
When I transfer it to Excel, it all arrives as text in a single cell.
I need it as a table.

This is the match result for this part:
01/02/23 P 0.0 34,500.0 0.0
02/06/23 P 46,500.0 81,000.0 46,500.0
03/06/23 P 16,500.0 97,500.0 16,500.0
04/03/23 P 18,000.0 115,500.0 18,000.0
05/01/23 P 34,500.0 150,000.0 34,500.0
06/05/23 P 18,000.0 168,000.0 18,000.0
07/03/23 P 16,500.0 184,500.0 16,500.0
08/07/23 P 18,000.0 202,500.0 18,000.0
09/04/23 P 0.0 202,500.0 0.0
10/02/23 P 0.0 202,500.0 0.0
11/06/23 P 0.0 202,500.0 0.0
12/04/23 P 0.0 202,500.0 0.0
01/01/24 P 0.0 202,500.0 0.0

Regex range delimiter I am using: (?<=Monthly )(.\n)(?=Fab)

I need to take the matches and also separate them into rows and columns before transferring them to Excel.

Use the Generate Data Table From Text activity. It gives you the data as a DataTable, which you can then write to Excel.

Regards
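As a rough illustration of what turning the text into a table involves, here is a minimal Python sketch (not a UiPath workflow) that splits the sample lines from the question into rows and whitespace-separated columns. The helper name text_to_rows is made up for this example, and it assumes no value contains a space.

```python
# Sample rows copied from the question; in practice this text would come
# from the regex match on the PDF text.
table_text = """01/02/23 P 0.0 34,500.0 0.0
02/06/23 P 46,500.0 81,000.0 46,500.0
03/06/23 P 16,500.0 97,500.0 16,500.0"""


def text_to_rows(text):
    """Split a text block into rows (one per line) and columns (split on whitespace)."""
    rows = []
    for line in text.splitlines():
        line = line.strip()
        if line:  # skip blank lines
            rows.append(line.split())
    return rows


rows = text_to_rows(table_text)
for row in rows:
    print(row)
```

Each inner list corresponds to one table row with five cells, which is the shape the DataTable needs before it is written to Excel.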

Thanks for your reply!

I tried that, but the text extracted from the PDF contains more information than the table.
I need to isolate only the information I need before using the Generate Data Table From Text activity.
But I am not sure how to do that.
I have already captured the string with the Matches activity.
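One way to isolate only the table block is to match everything between two marker words. The Python sketch below assumes the markers are "Monthly" and "Fab", as in the regex from the question; every other line of the sample text is invented. It uses a lazy quantifier so the match stops at the first "Fab", which is a choice made for this sketch.

```python
import re

# Hypothetical stand-in for the text read from the PDF; only the "Monthly"
# and "Fab" markers come from the thread, the other lines are invented.
pdf_text = """Some header line
Monthly
01/02/23 P 0.0 34,500.0 0.0
02/06/23 P 46,500.0 81,000.0 46,500.0
Fab
Some footer line"""

# The lookbehind and lookahead keep the markers out of the match.
# [\s\S] also matches newlines, and +? stops at the first "Fab".
pattern = re.compile(r"(?<=Monthly)[\s\S]+?(?=Fab)")

match = pattern.search(pdf_text)
table_text = match.group(0).strip() if match else ""
print(table_text)
```

Only the two data lines remain, so the result can be passed on to the table-generation step without the surrounding PDF text.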


If that is the case, can you share the PDF file and the required output?

OK. I highlighted in yellow the information I need to extract to Excel.
Teste.pdf (47.5 KB)

Please find the attached file.

PDFtoExcelTable.xaml (19.5 KB)




System.Text.RegularExpressions.Regex.match(PDFText,"Release ID:\s(?<ReleaseID>\d+).*Release Date:\s(?<ReleaseDate>\d{2}\/\d{2}\/\d{2})[\s\S]+Item Number:\s(?<ItemNumber>[^\s+]+)[\S\s]+Receipt Date:\s(?<ReceiptDate>\d{2}\/\d{2}\/\d{2})[\S\s]+Receipt Quantity:\s(?<ReceiptQty>.+)[\S\s]+Cum Received:\s(?<CumReceived>.+)[\S\s]+Packing Slip\/Shipper:\s(?<PackingSS>.*)[\S\s]+(?<=Monthly)(?<TableData>[\S\s]+)(?=Fab)")

Use the expression above to capture all the highlighted data from the file into named groups.

m.Groups("ReleaseID").Value

Read the other groups in the same way.

Finally, use Generate Data Table From Text to build the table.

Note: this assumes values are only ever present in 5 columns and the remaining columns are always empty.

If that is not the case, let me know; it will need more activities, such as extracting the table data again with regex.

Regards
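To show how named groups behave, here is a shortened Python version of the pattern. It is a sketch, not the full expression from the post: it keeps only four of the groups, the sample text is invented, and Python writes named groups as (?P<Name>...) where .NET uses (?<Name>...).

```python
import re

# Invented sample text laid out like the labels used in the regex above.
sample = """Release ID: 12345 Release Date: 01/02/23
Item Number: AB-100
Monthly
01/02/23 P 0.0 34,500.0 0.0
Fab"""

# Shortened pattern: four of the named groups from the original expression.
pattern = re.compile(
    r"Release ID:\s(?P<ReleaseID>\d+).*"
    r"Release Date:\s(?P<ReleaseDate>\d{2}/\d{2}/\d{2})"
    r"[\s\S]+Item Number:\s(?P<ItemNumber>\S+)"
    r"[\s\S]+(?<=Monthly)(?P<TableData>[\s\S]+)(?=Fab)"
)

m = pattern.search(sample)
if m is not None:
    release_id = m.group("ReleaseID")
    release_date = m.group("ReleaseDate")
    item_number = m.group("ItemNumber")
    table_data = m.group("TableData").strip()
    print(release_id, release_date, item_number)
    print(table_data)
```

Each group is read by name, which mirrors m.Groups("ReleaseID").Value in the UiPath expression.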

Hello, thanks so much for the help.
Unfortunately, I cannot open the file in my StudioX due to a compatibility issue.
Is there a way to save it so that I can open it in StudioX?

Is it possible to send me the flow in a different format so that I can build it myself?
I am not sure if the screenshots are complete.

Did you try to open the file on its own, or from a project you had already created?

Try opening it on its own.
In the meantime I will look for another method.

PdfExtractionDT.zip (43.0 KB)

Check this

1. Use the Read PDF Text activity and set its output to a text variable (PDFText).


2. Use an Assign activity. In the To field, give a variable m of type System.Text.RegularExpressions.Match, and in the Value field:

System.Text.RegularExpressions.Regex.match(PDFText,"Release ID:\s(?<ReleaseID>\d+).*Release Date:\s(?<ReleaseDate>\d{2}\/\d{2}\/\d{2})[\s\S]+Item Number:\s(?<ItemNumber>[^\s+]+)[\S\s]+Receipt Date:\s(?<ReceiptDate>\d{2}\/\d{2}\/\d{2})[\S\s]+Receipt Quantity:\s(?<ReceiptQty>.+)[\S\s]+Cum Received:\s(?<CumReceived>.+)[\S\s]+Packing Slip\/Shipper:\s(?<PackingSS>.*)[\S\s]+(?<=Monthly)(?<TableData>[\S\s]+)(?=Fab)")

3. This is optional. Use an If activity with the condition m.Value.Any. In the Then branch, use a Multiple Assign: on the To side give the variable names you need, and on the Value side m.Groups("ReleaseID").Value. Here "ReleaseID" is a group name. The regex defines 8 named groups, for example (?<ReleaseID>\d+) for the release ID, and you can read the other group names from the regex in the same way.
In the Else branch you can add a Log Message activity if you need one.


4. Use the Generate Data Table From Text activity to get the table. As input, give the variable from the Multiple Assign that holds the "TableData" group. The output is a DataTable (OutDT); you can rename it if required.
In Options, configure it as per the screenshot.

5. This is also optional. I used a Multiple Assign activity to rename the columns based on the PDF data. If you also want to rename them, give OutDT.Columns("Column1").ColumnName in the To field and the name you require in the Value field. As there are 5 columns, you have to add five entries in the Multiple Assign.

6. This is also optional. If required, write the Excel file, giving the path and the DataTable name.
Hope this helps you build it.
If you still face any issue, let me know.
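Steps 4 and 5 above (generate the table, then rename the default columns) can be sketched in Python as follows. The new column names are invented placeholders, since the thread does not list the real headers from the PDF; the default names Column1 to Column5 follow the naming used in step 5.

```python
# Two sample rows from the question, standing in for the "TableData" group.
table_text = "01/02/23 P 0.0 34,500.0 0.0\n02/06/23 P 46,500.0 81,000.0 46,500.0"

# Step 4: one row per line, one column per whitespace-separated value.
rows = [line.split() for line in table_text.splitlines() if line.strip()]

# Default column names: Column1 .. Column5.
columns = [f"Column{i}" for i in range(1, len(rows[0]) + 1)]

# Step 5: rename the columns. These target names are hypothetical.
new_names = {
    "Column1": "Date",
    "Column2": "Type",
    "Column3": "Quantity",
    "Column4": "Cumulative",
    "Column5": "Net",
}
columns = [new_names.get(name, name) for name in columns]

# Pair each row with the column names to get one record per row.
records = [dict(zip(columns, row)) for row in rows]
for record in records:
    print(record)
```

Renaming by a lookup table keeps any column without an entry under its default name, which is similar to adding only the Multiple Assign entries you need.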

Hello Again!

I had to reinstall StudioX because, when running the process, I was getting errors about not having permission to use Document Understanding.
Now I cannot find the field to change the data type of the variable "m".

Do you know why these 2 problems happen?

Change the data type of the variable m from String to System.Text.RegularExpressions.Match.

In the Variables panel, select the variable type; the drop-down has a Browse for Types option.


A window opens in which you have to select the type.

Once it opens, select the highlighted entry.

Regards

Ok! Thank you!

I am getting the following error now when running.
Do you know why?

Document Understanding is not required. If you added it only for this case, uninstall it. You only require UiPath.PDF.Activities to read the PDF file.

Hello dear Lakshmi,
I am almost done with the project.
I need your help once again.

I have filtered my columns and was able to place the data in Excel.
However, I now need all the rows of my data to be transformed into a single row. See below.

I will have to process a set of files and write the information extracted from each file in its own row. That's why I need the delivery dates and quantities in a row, not as a list. Can you help me with that too?

Once you have the filtered DataTable:

Use an Assign activity. On the Save to side, give a variable of type String (e.g. Str),
and on the Value side:

String.Join(";", (From r In InDt.AsEnumerable()
                   Let ContStr = r(0).ToString() + ";" + r(1).ToString()
                   Select ContStr).ToArray())

Next, use the Generate Data Table activity.
Provide the result string from the Assign activity as input and give a variable of type DataTable as output.

Also configure it as shown below.

Once done, use the Append Range activity and provide the file name and sheet name.
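The String.Join expression above joins the first two columns of every row into one semicolon-separated string. A minimal Python sketch of the same idea follows; the sample rows and the helper name flatten_rows are assumptions, since the thread does not show which two columns the filtered table holds.

```python
# Invented filtered table with two columns per row (for example, a date and a quantity).
rows = [
    ["01/02/23", "0.0"],
    ["02/06/23", "46,500.0"],
    ["03/06/23", "16,500.0"],
]


def flatten_rows(rows, sep=";"):
    """Join the first two cells of each row, then join all rows, using one separator."""
    return sep.join(f"{row[0]}{sep}{row[1]}" for row in rows)


flat = flatten_rows(rows)
print(flat)

# Splitting on the same separator gives the cells of the single output row.
single_row = flat.split(";")
print(single_row)
```

Because the values contain commas as thousands separators, a semicolon is a safer separator than a comma when the string is turned back into a one-row table.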

Hope this helps. One more thing: when you reply, please reply to my post directly, as I do, so that I get a notification.

Nice!! It worked well!!

Now the last thing I need is to loop over several files with the same structure.
Then I need to write the data from each file in consecutive rows.
For example, file 1 writes data to A1, B1, C1, D1 …
File 2 writes data to A2, B2, C2, D2 …
.
.
.
I did it using the Write Cell activity. I am not sure if there is an easier way to do it.

Can you help me with this last thing?

See my screenshots.





OutputData.xlsx (10.1 KB)

Sorry, I did not understand exactly what your requirement is. Can you elaborate with an example?

I don't see anything related in the file.

Let me know if you have a lot of invoices to process. If you want all the invoice data, except the table, in a single sheet,

then you can first use Build Data Table with the required columns.

Later, inside the loop, you can use Add Data Row.

I have given an example.

Regards
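The Build Data Table plus Add Data Row idea (create the table once, then add one row per file inside a loop) can be sketched in Python like this. Everything concrete here is an assumption made for illustration: the two extracted fields, the .txt file type, the demo file contents, and the helper names extract_row and collect_rows.

```python
import re
import tempfile
from pathlib import Path


def extract_row(text):
    """Pull the fields for one output row; the two fields are illustrative."""
    m = re.search(
        r"Release ID:\s(?P<ReleaseID>\d+).*"
        r"Release Date:\s(?P<ReleaseDate>\d{2}/\d{2}/\d{2})",
        text,
    )
    if m is None:
        return None
    return [m.group("ReleaseID"), m.group("ReleaseDate")]


def collect_rows(folder):
    """Build the table once, then add one row per file found in the folder."""
    rows = [["ReleaseID", "ReleaseDate"]]  # header row, like Build Data Table
    for path in sorted(Path(folder).glob("*.txt")):
        row = extract_row(path.read_text(encoding="utf-8"))
        if row is not None:
            rows.append(row)  # like Add Data Row inside the loop
    return rows


# Demo with two invented files in a temporary folder.
with tempfile.TemporaryDirectory() as folder:
    Path(folder, "file1.txt").write_text(
        "Release ID: 111 Release Date: 01/02/23\n", encoding="utf-8"
    )
    Path(folder, "file2.txt").write_text(
        "Release ID: 222 Release Date: 02/06/23\n", encoding="utf-8"
    )
    all_rows = collect_rows(folder)

for row in all_rows:
    print(row)
```

Collecting all rows first and writing the table once at the end avoids tracking the next free Excel row by hand for each file.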

Yep!
I now need to process several files with the same structure.

I want to place every set of data (each file) in consecutive Excel rows.
So far we have managed to extract the data from one file and place it in one row.
Next I need to automatically process the other files in a folder and place each file's data in the next row in Excel. See the example below.

I understand now.

Try it this way.

Just add the variables you require to the Assign activity where you read from the DataTable; check below.


As a sample I have given 2 variables. You can add any number of variables, but in the same format.

Hope this is what you are expecting.

Thanks again, Lakshmi!
I got it.

One last thing.

After reading each file in a folder, how can I place each file's data sequentially in an Excel row?
I think I will need a loop over the folder to read each file and then place the data I set in the variables above in each row.

I am not sure how to do that.