REGEX Pattern to extract distinct values


#1

Hi,
There are similar topics about regex, but I could not find the one I was looking for.
I have a regex pattern "[\w]*/[\d]{2}-[\d]{2}-[\d]{3}-[\w]{4}/[\d]{1}" which works perfectly and extracts the matched string from each PDF.
The problem is that I cannot get distinct matches; the duplicates keep repeating. I tried using Distinct, but it does not work. Maybe I am not using it in the right place.
Any help would be highly appreciated.


#2

Hi @Faraz_Subhani ,

My guess, based on your description, is that .Distinct() does not work when you apply it to a Match or Capture collection: each match is a separate object, so two matches count as different even if they captured the same text in the input string.

One option is to select just the string values from the collection, something like:
match.Captures.Select(Function(m) m.Value).Distinct()
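
To make the difference concrete, here is a minimal console sketch. The input string and the short pattern are made up for illustration and are not from the original workflow. It counts distinct items twice: once on the Match objects themselves and once on their string values. The Cast(Of Match)() call is only there so the snippet also compiles on .NET versions where MatchCollection is not a generic collection.

```vb
Imports System
Imports System.Linq
Imports System.Text.RegularExpressions

Module DistinctMatchesDemo
    Sub Main()
        ' Hypothetical input: the text "A/1" appears twice.
        Dim input As String = "A/1 B/2 A/1"
        Dim matches = Regex.Matches(input, "\w/\d").Cast(Of Match)()

        ' Match does not override Equals, so Distinct() compares object
        ' references and keeps all three matches.
        Console.WriteLine(matches.Distinct().Count())   ' prints 3

        ' Projecting to strings first lets Distinct() compare the text.
        Console.WriteLine(matches.Select(Function(m) m.Value).Distinct().Count())   ' prints 2
    End Sub
End Module
```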
Another is to use a custom equality comparer, as described here: https://msdn.microsoft.com/en-us/library/bb338049.aspx
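
A rough sketch of what such a comparer could look like is below. The class name MatchValueComparer and the sample input are hypothetical, and this is a standalone console program rather than something that drops straight into a workflow expression. It may be worth the extra code when you still need the Match objects afterwards (for Index or Groups, say) instead of only the text.

```vb
Imports System
Imports System.Collections.Generic
Imports System.Linq
Imports System.Text.RegularExpressions

' Hypothetical comparer: two matches are equal when their text is equal.
Class MatchValueComparer
    Implements IEqualityComparer(Of Match)

    Public Overloads Function Equals(x As Match, y As Match) As Boolean _
        Implements IEqualityComparer(Of Match).Equals
        If x Is Nothing OrElse y Is Nothing Then Return x Is y
        Return String.Equals(x.Value, y.Value, StringComparison.Ordinal)
    End Function

    Public Overloads Function GetHashCode(obj As Match) As Integer _
        Implements IEqualityComparer(Of Match).GetHashCode
        ' Hash the captured text so that equal matches land in the same bucket.
        Return If(obj Is Nothing, 0, StringComparer.Ordinal.GetHashCode(obj.Value))
    End Function
End Class

Module ComparerDemo
    Sub Main()
        ' Hypothetical input: the text "A/1" appears twice.
        Dim input As String = "A/1 B/2 A/1"
        Dim distinctMatches = Regex.Matches(input, "\w/\d").
            Cast(Of Match)().
            Distinct(New MatchValueComparer())

        ' Each remaining item is still a full Match object.
        For Each m As Match In distinctMatches
            Console.WriteLine(m.Value & " at index " & m.Index)
        Next
    End Sub
End Module
```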

Does this answer your question? If not, could you please provide more details?


#3

Hi,
Thanks for your reply.
I think you are correct about the different objects being created. I tried implementing your logic in the for loop, but it threw an error; maybe I am doing something wrong. Below is a screenshot of my flow. Could you please assist?

[image: screenshot of the flow]


#4

Hi @Faraz_Subhani,

Try iterating in the For Each statement over UWIMatch.Select(Function(m) m.Value).Distinct() instead of just UWIMatch. Each item in the loop is then a String rather than a Match object.
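
Outside a workflow, the same idea can be sketched as a plain loop. The sample text below is invented, and UWIMatch here is only a stand-in built with Regex.Matches for the variable of the same name in the thread; the pattern is the one from the first post.

```vb
Imports System
Imports System.Linq
Imports System.Text.RegularExpressions

Module ForEachDistinctDemo
    Sub Main()
        ' Hypothetical PDF text; the first ID is repeated.
        Dim pdfText As String =
            "Ref ABC/12-34-567-WXYZ/1, see ABC/12-34-567-WXYZ/1 and DEF/98-76-543-QRST/2"
        Dim pattern As String = "[\w]*/[\d]{2}-[\d]{2}-[\d]{3}-[\w]{4}/[\d]{1}"

        ' Stand-in for the UWIMatch variable from the thread.
        Dim UWIMatch = Regex.Matches(pdfText, pattern).Cast(Of Match)()

        ' Loop over the distinct strings, not over the Match objects.
        For Each value As String In UWIMatch.Select(Function(m) m.Value).Distinct()
            Console.WriteLine(value)
        Next
    End Sub
End Module
```

With this sample text the loop body runs twice, once per unique ID, even though the pattern matches three times.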


#5

Thank you very much, it worked perfectly :)