Regex without duplicate in UiPath

Hello guys, I need to print values from a PDF without duplicates.
Here is the output screenshot:


thanks

Hi
Hope this step helps you resolve this.
If your Matches activity output variable is named Out_matches, you can convert it to a list of strings and then get the distinct values like this:
Out_list_string = Out_matches.AsQueryable.Select(Function(x) x.Value).ToList().Distinct()

Out_list_string can then be passed as the input to the For Each loop.

Cheers @Aluneth_X


My Matches activity output variable is named invoice. I created a new variable Out_list_string and assigned it invoice.AsQueryable.select(function(x) x.value).ToList)).Distinct(), but there is an error. Can you help, please?

thanks


Yes, near ToList it is written as ToList)).
It should be ToList().
Cheers @Aluneth_X

I did, but now there is a new error. Is the variable type correct?
Cannot assign from type ‘System.Collections.Generic.IEnumerable`1[System.String]’ to type ‘System.String’ in Assign activity ‘Assign’.

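You can put that expression directly into the For Each loop instead of assigning it first.
Kindly don’t assign it to a String variable: the expression returns a collection of strings (IEnumerable(Of String)), not a single String.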
Cheers @Aluneth_X
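
For readers who want to check the "distinct matches" idea outside UiPath, here is a minimal Python sketch. It is not UiPath code. The sample text and the pattern are made up for illustration, because the original PDF text is not shown in the thread.

```python
import re

# Hypothetical sample text; the real PDF text is not shown in the thread.
text = "Invoice INV-001\nInvoice INV-002\nInvoice INV-001\n"

# Hypothetical pattern, used only to produce some duplicate matches.
matches = [m.group(0) for m in re.finditer(r"INV-\d+", text)]

# dict.fromkeys keeps the first occurrence of each value and preserves order,
# which plays the same role as .Distinct() on the list of match values.
distinct_values = list(dict.fromkeys(matches))

for value in distinct_values:
    print(value)
```

The loop at the end plays the role of the For Each activity: it iterates over a collection of strings, not over a single string.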

Hi palaniyppan,
I have the same kind of issue. Is there any way to get this without looping? Below is my OCR text:

Restoration Coder: Mr. Patel 203.75 203.75
Attn: Mr. Rod Smallwood (WO# 21541416)
e/o Sod Repairs (Sinkage, Immediate Response)
74 Stuart St. (StOLItTville)
Restoration Coder: Mr. Vivek roy 267.11 267.11
Attn: Mr. Rod Smallwood (WO# 24021301)
cx’o Sod Repairs (Removal ofspoils? Topsoila‘Seeding)
75 George St (Aurora)
Restoration Coder: Mr. arin Patel 251.85 251.85
Attn: Mr. Rod Smallwood (WO# 23512606)
010 Interlocking Brick Repairs
143 Spruce St (Aurora)

I need to return three values, which are:
203.75
267.11
251.85

Because each value appears twice on its line, the expression ((\d+.\d+)) returns 6 values, half of which are duplicates. Any hint on how to get unique values from the regex?

Hello

Check out this Regex 101 link for the pattern:
(\d+\.\d+)\s\d+\.\d+
This pattern will work as long as there are two numbers, both with decimals, separated by a single space.



Group 1 results:
INSERTVARIABLE(0).Groups(1).ToString

After the Matches activity, use a Write Line activity (or an Assign activity) and replace the capital letters above (INSERTVARIABLE) with the name of the Result variable from the Matches activity.
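
To try the first pattern outside UiPath, a small Python sketch can help. It uses a subset of the OCR lines posted above; `m.group(1)` corresponds to `.Groups(1)` in the expression above. The variable names are only for illustration.

```python
import re

# A subset of the OCR lines from the post above.
ocr_text = (
    "Restoration Coder: Mr. Patel 203.75 203.75\n"
    "Attn: Mr. Rod Smallwood (WO# 21541416)\n"
    "74 Stuart St. (StOLItTville)\n"
    "Restoration Coder: Mr. Vivek roy 267.11 267.11\n"
    "Attn: Mr. Rod Smallwood (WO# 24021301)\n"
    "75 George St (Aurora)\n"
    "Restoration Coder: Mr. arin Patel 251.85 251.85\n"
    "Attn: Mr. Rod Smallwood (WO# 23512606)\n"
)

# Each match consumes both copies of the number; group 1 captures only the first.
pattern = re.compile(r"(\d+\.\d+)\s\d+\.\d+")
first_values = [m.group(1) for m in pattern.finditer(ocr_text)]

print(first_values)
```

Because one match swallows both copies of the amount, each line yields a single result instead of two.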

OR

You could try this Regex pattern (Regex101.com link):
(?<=Restoration Coder: )([a-zA-Z.\s]+)(\d+\.\d+)

This pattern is reliable as long as the value is always on the “Restoration Coder” line and has a decimal point in it.
You would need to get Group 2 for this pattern.

How to get Group 1 results:
INSERTVARIABLE(0).Groups(1).ToString

How to get Group 2 results:
INSERTVARIABLE(0).Groups(2).ToString

After the Matches activity, use a Write Line activity (or an Assign activity) and replace the capital letters above (INSERTVARIABLE) with the name of the Result variable from the Matches activity.
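
As a rough check of the lookbehind pattern, here is a Python sketch on a few of the OCR lines posted above. Group numbering is the same as in the `.Groups(1)` and `.Groups(2)` expressions; the variable names are illustrative only.

```python
import re

# A few of the OCR lines from the earlier post.
ocr_text = (
    "Restoration Coder: Mr. Patel 203.75 203.75\n"
    "Attn: Mr. Rod Smallwood (WO# 21541416)\n"
    "Restoration Coder: Mr. Vivek roy 267.11 267.11\n"
    "Restoration Coder: Mr. arin Patel 251.85 251.85\n"
)

pattern = re.compile(r"(?<=Restoration Coder: )([a-zA-Z.\s]+)(\d+\.\d+)")

# Group 1 is the name text, group 2 is the first amount on the line.
names = [m.group(1).strip() for m in pattern.finditer(ocr_text)]
amounts = [m.group(2) for m in pattern.finditer(ocr_text)]

print(names)
print(amounts)
```

The second copy of each amount is never matched, because it is not directly preceded by the "Restoration Coder: " label plus name text.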


Hi Steven

“Restoration Coder” is always present for sure, but the values can be whole numbers, not only decimals; an example is 203 and

Hello again

I have made it more robust :slight_smile:

Please let me know what you think.

Try this Regex pattern:
(?<=Restoration Coder: )([a-zA-Z.\s]+)([\d.,]+)
The last group matches any run of digits, “.” and “,” and stops at the first character outside that set, such as a space (" ").
Regex101.com link


You will need to get Group 2 :smiley:
How to get Group 2 results:
INSERTVARIABLE(0).Groups(2).ToString

After the Matches activity, use a Write Line activity (or an Assign activity) and replace the capital letters above (INSERTVARIABLE) with the name of the Result variable from the Matches activity.
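
To see how the more robust pattern behaves with whole numbers, here is a small Python sketch. The first two input lines are hypothetical (a whole number and a value with a thousands separator); only the third comes from the OCR text in the thread.

```python
import re

sample_text = (
    "Restoration Coder: Mr. Patel 203 203\n"                 # hypothetical: whole number
    "Restoration Coder: Mr. Vivek roy 1,267.11 1,267.11\n"   # hypothetical: thousands separator
    "Restoration Coder: Mr. arin Patel 251.85 251.85\n"      # from the OCR text
)

pattern = re.compile(r"(?<=Restoration Coder: )([a-zA-Z.\s]+)([\d.,]+)")

# Group 2 holds the first value on each "Restoration Coder" line.
values = [m.group(2) for m in pattern.finditer(sample_text)]

print(values)
```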

thanks Steve, will test and will update soon.

Update: working perfectly :slight_smile: Thanks Steve :). Can you explain the above regex pattern?

Okay great! :smiley:

Please mark as solution :white_check_mark:

The Regex Pattern explanation.

It will look for the text "Restoration Coder: " exactly. Then the first bracketed group, ([a-zA-Z.\s]+), matches any letters (a-zA-Z), whitespace (\s) or “.” (this is the green highlighted section) and only stops at the first digit (\d), i.e. 0-9.
The red highlighted section, ([\d.,]+), matches any digits 0-9 (\d), “.” and commas “,” and stops at the first character that is not one of these, such as a whitespace/space (\s).
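
The same explanation can be written next to the pattern itself. The sketch below uses Python's verbose regex mode so each part carries a comment; in that mode literal spaces in the pattern have to be escaped, which is why the label is written with `\ `. This is only an annotated restatement of the pattern above, not UiPath code.

```python
import re

pattern = re.compile(
    r"""
    (?<=Restoration\ Coder:\ )  # lookbehind: the label must sit right before the match
    ([a-zA-Z.\s]+)              # group 1: letters, dots and whitespace (the name part)
    ([\d.,]+)                   # group 2: digits, dots and commas (the value)
    """,
    re.VERBOSE,
)

match = pattern.search("Restoration Coder: Mr. Patel 203.75 203.75")

print(repr(match.group(1)))  # the name part, including its trailing space
print(match.group(2))        # the first value on the line
```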

I hope this helps.

Check out my Regex MegaPost to learn more.

I couldn’t find Mark as a solution :frowning:

@Aluneth_X: can you mark as solution :slight_smile:

I just realised you are not the Original Poster :laughing: so you can’t mark as solution.

No further action required.

@Steven_McKeering Thanks Steve


Note: (0) is the index of the match within the Matches result, so (0) is the first match.

@Steven_McKeering, can you please help me with a regex for the text below?

Restoration Coder: Mr. Vivek Patel 488.90 488.90
Attn: Mr. Matt Lomano (WO# 24107788)
0le Asphalt Repairs (Traffic ControltRemoval ofSpoilsfiSawcutting)
80 W est. Beaver Creek (Richmond Hill)
Restoration Coder: Mr. Vivek Patel 196.76 196.76
Attn: Mr. Matt. Lomano (WO# 23407489)
cfo Sod Repairs
270 Avro Rd (Maple)
Restoration Coder: Mr. Vivelc Patel 282.60 28260
Attn: Mr. Matt Lomano (W03? 23713545)
cfo Sod Repairs (Removal of Spoilszopsoil/Seeding)

The output should be as below: the number on each line that starts with "Attn:".
24107788
23407489
23713545

Hi @vboddu

You could simply use the pattern:
\d{8}
Regex101.com link
This pattern will be reliable as long as it’s always 8 digits long.

OR try this pattern:
(?<=\()(.*)(\d{8})
And get Group 2 for the result.
Regex101.com link

Yes, but if another 8-digit number exists on any other line, I don’t need that. I only need the 8-digit number on the "Attn:" line.
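
One possible way to restrict the match to the "Attn:" lines, shown here as a Python sketch and not as an answer given in the thread, is to anchor the pattern at the start of the line. The pattern `^Attn:.*?\b(\d{8})\b` and the "Ref 12345678" line are assumptions added for illustration. In .NET the same pattern would presumably need the Multiline regex option so that `^` matches at the start of every line.

```python
import re

ocr_text = (
    "Restoration Coder: Mr. Vivek Patel 488.90 488.90\n"
    "Attn: Mr. Matt Lomano (WO# 24107788)\n"
    "Ref 12345678 Asphalt Repairs\n"  # hypothetical line with an unwanted 8-digit number
    "Restoration Coder: Mr. Vivek Patel 196.76 196.76\n"
    "Attn: Mr. Matt. Lomano (WO# 23407489)\n"
    "Restoration Coder: Mr. Vivelc Patel 282.60 28260\n"
    "Attn: Mr. Matt Lomano (W03? 23713545)\n"
)

# ^Attn: ties the match to lines starting with "Attn:" (MULTILINE makes ^ per-line);
# .*? skips the name lazily; \b(\d{8})\b captures a stand-alone 8-digit number.
pattern = re.compile(r"^Attn:.*?\b(\d{8})\b", re.MULTILINE)
work_orders = [m.group(1) for m in pattern.finditer(ocr_text)]

print(work_orders)
```

The 8-digit number on the hypothetical "Ref" line is ignored because that line does not start with "Attn:".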