RegEx tool confusion

Hi team,

I’m new to using RegEx and was looking for help to understand what I’m doing wrong.

Here’s the Regular Expression I created to extract the information around the description of each item from a PDF. However, I can’t share the actual PDF, so I included the text extracted in a doc.

What I don’t understand is why the information I capture on the site isn’t the same as what UiPath captures. Do I need to follow different rules? Am I approaching this problem the wrong way? Should I do it differently?

I have never created an automation like this, so I’m pretty much out of options in my mind.

New folder.zip (27.7 KB)

Hello

There are always multiple ways to solve the problem.

I have created a pattern and sample workflow which you might find helpful. I have had to take a guess at what you wanted to extract.

Sample, Output and Pattern go a long way to help us create a regex pattern.
Preview the pattern here

This will return each item as an individual regex match.

The sample workflow will convert the results into a datatable so you can continue with your project.
Main - Regex-Mardoza.xaml (15.0 KB)

Hopefully you can find my Regex Megapost helpful :slight_smile:

Cheers

Steve

Hi,

For example, the pattern should be as follows, because there seem to be multiple whitespace characters between the fields.

[0-9]{1,3}\s+[0-9]{3}-[A-Z]{4}|[0-9]{1,3}\s+[0-9]{3}-[0-9]{4}
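For anyone who wants to check the pattern outside UiPath, here is a minimal sketch in Python (not the VB.NET that UiPath uses). The sample text is made up, since the real PDF text was not shared; it only illustrates why `\s+` matters when the fields are separated by a varying number of spaces.

```python
import re

# Pattern from the post above: a 1-3 digit line number, one or more
# whitespace characters, then a part number like 123-ABCD or 123-4567.
PATTERN = r"[0-9]{1,3}\s+[0-9]{3}-[A-Z]{4}|[0-9]{1,3}\s+[0-9]{3}-[0-9]{4}"

# Hypothetical sample text; the real document was not shared in the thread.
sample = "1   123-ABCD  Widget\n12 456-7890 Gadget"

# With \s+ both lines match, however many spaces separate the fields.
matches = re.findall(PATTERN, sample)
print(matches)

# Same pattern with a single literal space instead of \s+:
# the first line (three spaces) is no longer matched.
single_space = PATTERN.replace(r"\s+", " ")
single_matches = re.findall(single_space, sample)
print(single_matches)
```

One thing to keep in mind: `\s` also matches line breaks, so on text where a number ends one line and a part number starts the next, the pattern could match across the two lines.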

You also need to modify the Add Data Row part as follows.

Hope this helps.

Regards,

Thank you for your response, @Steven_McKeering
The RegEx you created (+ 1 addition I made) solves the issue of capturing the data.

I tried using your code, but it throws the error below.

I’m not sure how to get around it.

Thank you for your response, @Yoichi.

Your Regular Expression also helped me, and I tried to use your suggestion. I do not understand the last part of the value you assigned for strValue, so I think that I’m doing something wrong because it is giving me an error.

First part of my workflow:


Second part:

And this is the error:

Any suggestion on how to proceed?

Hi,

Sorry, my expression was shown in the Japanese locale, where the backslash is displayed as a yen sign (¥).
Can you try the following expression? (Please replace the yen sign with a backslash.)

strValue = System.Text.RegularExpressions.Regex.Split(currentItem.Value,"\s+")(1)
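As a rough illustration of what that expression does, here is the same idea in Python (an analogue, not the UiPath expression itself). The input value is hypothetical: the string is split on runs of whitespace and the piece at index 1, the second piece, is kept.

```python
import re

# Hypothetical match value: line number, several spaces, part number.
value = "12   456-7890"

# Split on runs of whitespace, then take the second piece (index 1),
# mirroring Regex.Split(currentItem.Value, "\s+")(1) from the post above.
parts = re.split(r"\s+", value)
str_value = parts[1]
print(parts, str_value)
```

In this Python version, a value that starts with whitespace produces an empty string as the first piece, which shifts the index of the part you want.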

Regards,

@Yoichi thank you again for your response.

I changed the Yen sign for a backslash and it worked. However, I’m only getting the values of Group 1. How can I retrieve the values of all the groups in the Regular expression? Not the whole expression in 1 cell but a cell/row for each group, so I can keep working on it to complete my task.

PS: As an idea, I tried something similar to what @Steven_McKeering suggested using a counter and the expression {currentItem(counter).Groups(1).ToString.Trim} in the Add Data Row activity, but it is giving me this error:

Hi,

In this case, a counter is unnecessary. It should be {currentItem.Groups(1).ToString.Trim}
It may also be better to use regex named groups, such as (?<LINENUMBER>[0-9]{1,3})\s*(?<PARTNO>[0-9A-Z]{3}\-[0-9A-Z]{4})\s*...... Each group can then be retrieved like matchVar.Groups("PARTNO").Value
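To make the group access concrete, here is a small Python sketch under a few assumptions: the sample text is invented, only the two groups shown above are used (the rest of the pattern was elided in the post), and Python spells named groups `(?P<NAME>...)` where .NET uses `(?<NAME>...)`. The loop variable is already a single match, which is why no counter is involved.

```python
import re

# Python named-group syntax is (?P<NAME>...); .NET uses (?<NAME>...).
# Only the two groups shown in the post are included here.
PATTERN = re.compile(
    r"(?P<LINENUMBER>[0-9]{1,3})\s*(?P<PARTNO>[0-9A-Z]{3}-[0-9A-Z]{4})"
)

# Hypothetical sample text; the real document was not shared in the thread.
sample = "1   123-ABCD  Widget\n12 456-7890 Gadget"

for m in PATTERN.finditer(sample):
    # m is one match; groups are read by number or by name.
    print(m.group(1), m.group("LINENUMBER"), m.group("PARTNO"))

line_numbers = [m.group("LINENUMBER") for m in PATTERN.finditer(sample)]
part_numbers = [m.group("PARTNO") for m in PATTERN.finditer(sample)]
```

Reading groups by name keeps the workflow readable if more groups are added to the pattern later, because the numbering of the other groups no longer matters.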

Regards,

@Yoichi Thank you very much for your help!

I used the Regular Expression and then ran a For Each activity to add all the lines to my data table. Finally, I wrote it in an Excel file. I’m leaving below the expression I used.

{currentItem.Groups("LINEITEM").Value.ToString.Trim, currentItem.Groups("PN").Value.ToString.Trim,.......}
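For readers working outside UiPath, the same row-building step might look like the sketch below in Python. The group names LINEITEM and PN come from the expression above, but the pattern around them and the sample text are assumptions, and an in-memory CSV stands in for the Excel file.

```python
import csv
import io
import re

# Hypothetical pattern: the thread names the groups LINEITEM and PN but
# does not show the final expression, so this shape is an assumption.
PATTERN = re.compile(
    r"(?P<LINEITEM>[0-9]{1,3})\s+(?P<PN>[0-9A-Z]{3}-[0-9A-Z]{4})"
)

# Hypothetical sample text; the real document was not shared in the thread.
sample = "1   123-ABCD  Widget\n12 456-7890 Gadget"

# One row per match, one trimmed cell per named group.
rows = [
    [m.group("LINEITEM").strip(), m.group("PN").strip()]
    for m in PATTERN.finditer(sample)
]

# Write a header and the rows to an in-memory CSV (stand-in for Excel).
buffer = io.StringIO()
writer = csv.writer(buffer, lineterminator="\n")
writer.writerow(["LINEITEM", "PN"])
writer.writerows(rows)
csv_text = buffer.getvalue()
print(csv_text)
```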
