Assign each RegEx Match to Data Row

Hello,
Within UiPath Studio, I have a folder of PDF files. I created a sequence that reads each PDF file individually, applies a regex pattern from a built-in data table, assigns the match to a row in another data table, and then writes the range to an Excel file.

The issue I am having is that only the first match from each PDF document is being assigned to a row in the data table. A single PDF file has many occurrences of each regex pattern, and I need every match that occurs within a PDF document to be assigned to its own row in the data table.

I have a feeling it has something to do with the Assign activity: the To value is row.Item("Value") and the VB expression is ienMatch(0).Value, so is it taking only the first value from the regex match?

Any ideas on this?

If your expression produces more than one match, the output of the Matches activity is an IEnumerable.
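To illustrate the point outside UiPath, here is a minimal Python sketch (not the workflow itself). The sample text and the `\d+` pattern are made up for the example; `matches[0]` plays the role of `ienMatch(0)`, and the list comprehension plays the role of looping over the whole collection.

```python
import re

# Hypothetical sample text; the real input would be the text read from a PDF.
text = "Account: 111 Account: 222 Account: 333"

# finditer yields every non-overlapping match, much like an IEnumerable of matches.
matches = list(re.finditer(r"(?<=Account:\s)\d+", text))

# Indexing with 0 returns only the first match (analogous to ienMatch(0).Value).
first_only = matches[0].group()

# Iterating returns all of them (analogous to a For Each over ienMatch).
all_values = [m.group() for m in matches]

print(first_only)
print(all_values)
```

The collection already holds every match; it is the index 0 that limits the result to the first one.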

Check this, I believe it will help you.

Main.xaml (6.6 KB)


Thank you for your response, Jorge. I opened and reviewed the file you sent. It does something similar to what I have in mind, but it does not quite capture what I need: it shows a message box for each match. In a previous project, I wrote something similar using the 'Write Line' activity, which printed all the matches for a RegEx pattern. What I need is slightly different.

Find attached a screenshot I captured of my current sequence/process.

The intended functionality of this sequence is to read each PDF file within a folder, run each RegEx pattern listed in the RegEx_Ref data table, then assign each match to its own row of an Excel worksheet located in another folder. Unfortunately, as I am new to this, I am currently getting only the first match of each RegEx pattern from each PDF file in the Excel worksheet.

I really appreciate your effort on this!

OK,

Within your For Each Row, you will need one more For Each that loops over the matches returned by your Matches activity.
If the For Each still doesn't solve it, take a look at https://www.uipath.com/product/document-understanding. If you are in an enterprise environment, you'll be able to use Document Understanding (DU) to process your files more effectively and to standardize how your PDF files are structured, and you can still use Regex alongside it.
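The nested loop suggested above can be sketched in Python as follows. This is only an illustration of the loop structure, not the UiPath workflow: `collect_rows`, the field names, the patterns, and the sample text are all hypothetical. The outer loop stands in for the For Each Row over RegEx_Ref, and the inner loop stands in for the For Each over the matches.

```python
import re

def collect_rows(patterns, text):
    """Return one row per regex match, rather than one row per pattern."""
    rows = []
    for name, pattern in patterns:              # outer loop: one pass per pattern row
        for m in re.finditer(pattern, text):    # inner loop: one pass per match
            # Append a new row each time instead of writing into the same cell.
            rows.append({"Field": name, "Value": m.group()})
    return rows

# Hypothetical patterns and text, for illustration only.
patterns = [
    ("Account", r"(?<=Account:\s)\d+"),
    ("Date", r"(?<=As\sof\s)\S+"),
]
text = "As of 01/31/2021 Account: 111 ... As of 02/28/2021 Account: 222"

rows = collect_rows(patterns, text)
for row in rows:
    print(row)
```

The design point is that a new row is appended inside the inner loop. If the inner loop only assigns into the same cell of the current row, each iteration overwrites the previous value, so the output still ends up with one value per pattern. In UiPath terms, this likely means using something like Add Data Row on the output table inside the inner For Each, although that is an assumption about the workflow, which was not shared.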

Jorge, thanks again for your response. I made my best attempt at what you suggested, but I am still getting the same result in my Excel worksheet: only the first regex match from each document instead of all regex matches from each document.

For further context, the Matches activity references the rows of the RegEx_Ref data table, which contains 3 different regex patterns for 3 different data elements. (The Pattern field in the properties pane of the Matches activity is row.Item("RegEx").ToString, which refers to the "RegEx" column of the RegEx_Ref data table.) Is the issue that the sequence runs the regex activity only once and stops after the first pass through the document?

The result of the Matches activity is 'ienMatch', which is the input for the new For Each activity that I added. Is the Assign activity correct?

I appreciate the help on this; I am certain I messed something up here. Here is my updated sequence with the new For Each activity added:

Replace "ienMatch(0).Value" with "item.Value".

Can you share this project?


Thank you again for the suggestion. Unfortunately, I receive the following error message when I change the VB expression of the Assign activity:

"
Main.xaml: Compiler error(s) encountered processing expression “item.Value”.
Option Strict On disallows late binding.
"

I do apologize, but I cannot share this file because of our company policy. I can try to be as thorough as possible in describing it.

The TypeArgument of this For Each has to be System.Text.RegularExpressions.Match.

Ah, thank you for that. Unfortunately, even after updating the TypeArgument, I am still only getting the first match from each PDF document in the Excel worksheet output.

Your regular expression needs to run in multiline mode. If it does not, it will not return matches from more than one line.

If you can’t share your project, share the regular expressions you’re using and send an example of text you are trying to read with your Regex…
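Whether the multiline setting matters depends on the pattern, so it can help to test each option in isolation. The Python sketch below uses made-up text and patterns to show what the two line-related flags change. As far as I know, .NET's RegexOptions.Multiline and RegexOptions.Singleline behave analogously to Python's `re.MULTILINE` and `re.DOTALL`, but treat that mapping as something to verify in your own workflow.

```python
import re

# Hypothetical two-line input.
text = "Account: 111\nAccount: 222"

# No flags: every non-overlapping match is still returned, on any line.
no_flags = re.findall(r"(?<=Account:\s)\d+", text)

# MULTILINE only changes ^ and $ so that they match at each line boundary.
anchored_plain = re.findall(r"^Account:\s\d+$", text)
anchored_multi = re.findall(r"^Account:\s\d+$", text, re.MULTILINE)

# DOTALL lets "." match a newline, so a match can span lines.
dot_plain = re.findall(r"111.Account", text)
dot_all = re.findall(r"111.Account", text, re.DOTALL)

print(no_flags)
print(anchored_plain, anchored_multi)
print(dot_plain, dot_all)
```

In this sketch, a pattern without `^`, `$`, or `.` across a line break finds both values with no flags at all, which suggests that a "first match only" result can also come from how the matches are consumed rather than from the regex options.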

Thank you for your feedback.
I am still having the same output of just the first regex match.
For reference, here is the video I originally followed, which is very similar to what I would like to accomplish: PDF | RegEx | Excel UiPath Video. I added the changes you suggested to this video's example. Functionally, this is what I want, except that instead of only the first regex match being returned, I would like all regex matches returned.

Also, find attached a PDF example of a file I would use with this sequence to extract all regex matches. Here are the regex patterns I am using:

"
(?<=As\sof\s)(.*?(?=\s))
(?<=Account:\s)(.*?(?=\s))
(?<=Balances\s\s)[\s\S]*?(?=Total)
"

Please let me know if you have any other questions or concerns. Thank you again.

EXAMPLE_PDFREPORT.pdf (58.3 KB)
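As a quick way to check the patterns on their own, here is a Python sketch that runs all three against a made-up snippet. Two assumptions are worth flagging: the sample text is hypothetical and is not taken from the attached PDF, and the first two patterns are assumed to contain `.*?` (the asterisks appear to have been swallowed by the forum formatting).

```python
import re

# Patterns as posted, assuming ".*?" in the first two (see note above).
patterns = {
    "As of": r"(?<=As\sof\s)(.*?(?=\s))",
    "Account": r"(?<=Account:\s)(.*?(?=\s))",
    "Balances": r"(?<=Balances\s\s)[\s\S]*?(?=Total)",
}

# Hypothetical statement text with two accounts; not the real PDF content.
text = (
    "As of 01/31/2021\n"
    "Account: 12345\n"
    "Balances  Opening 100.00 Closing 150.00 Total\n"
    "As of 02/28/2021\n"
    "Account: 67890\n"
    "Balances  Opening 20.00 Closing 30.00 Total\n"
)

# Collect every match for every pattern, trimming surrounding whitespace.
results = {
    name: [m.group().strip() for m in re.finditer(pattern, text)]
    for name, pattern in patterns.items()
}

for name, values in results.items():
    print(name, values)
```

On this sample, each pattern returns two values, one per account block. If the same patterns return multiple matches in a regex tester against the real PDF text, the patterns themselves are probably fine and the loss of matches is more likely happening when the results are written to the data table.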

The file has an error.

EXAMPLE_PDFREPORT.pdf (58.3 KB)

Sorry about that! Try this one.

Hey Jorge, were you able to open this file?

@zrobins6 - I looked at your PDF. What exactly are you trying to retrieve from each PDF?

I am asking because I can see multiple matches, for example "Opening Ledger Balance", "Opening Available Balance", and so on.

Are you trying to extract data only for "MAY'S OYSTER BAR / DEPOSITS" or "TONY'S HARDWARE STORE / DEPOSITS"?

Please show me the output you are trying to achieve by typing it manually into an Excel file, so that we get a clear idea of your question.


Sure thing, please find attached an Excel example of the desired output for this PDF example.
EXAMPLE_PDFREPORT.xlsx (8.9 KB)

@zrobins6 - Do you see the problem here? Your PDF is very tricky. Normally an invoice contains data for one Account #, but in your case there are two. That is why the regex is not working (I guess). I don't think DU will work here either. I will try and let you know.

Thank you for your response. I am new to UiPath, so I am not familiar with desirable formats or with what can and cannot work. The PDF I shared is an example of a bank statement, which is a bit different from an invoice, but I do understand your point. I wanted to see whether I can read through an entire PDF file (sometimes multiple pages) and return all regex matches to an Excel sheet, as the Excel example shows. Thanks again for your response and willingness to help.

@zrobins6 - Here is something to get you started.

Main.xaml (27.3 KB) PDFReport.xlsx (8.1 KB)

You can see that two values have been extracted and written to the sheet successfully.

Please use this as a starting point and extend it for the other fields.

Thank you very much! This seems to do what I needed, and I will modify it as I go. Thanks again!
