How to remove duplicates from OCR Output

Hello guys,

I am in a bit of a tough spot. I've been trying for hours to find a solution to my problem, but without success.
I have an automation which reads the pages of a scanned document using the OmniPage engine.
It outputs the result to a variable. I am using a regular expression to extract values that match a certain type.

My Regex output is something like this:
1234567890,1234567890,111222334,111222334,111222334 etc. (IEnumerable of matches)
I am trying to get all the distinct/unique values so my final output would look like 1234567890,111222334.
Writing a regex that matches only unique values is not a solution; I have already tried that.

Please help me in finding a solution to this issue.
Thank you!

After trial and error I found out how to solve this issue:
You can remove duplicates from a list by using MyListName.Distinct().ToList().

So I used a For Each loop to go through the IEnumerable of match results (the regex output) and added each match to a list. I then called Distinct().ToList() on that list to remove the duplicates.
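
For anyone landing here later, the same idea can probably be written without the loop. Below is a minimal C# sketch, not the exact workflow from this thread: the sample text, the `\d{9,10}` pattern, and the variable names are assumptions made up for illustration. One thing worth noting is that `Match` objects are compared by reference, so calling `Distinct()` directly on the matches would not remove anything; projecting each match to its string value first is what makes the deduplication work.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

class Program
{
    static void Main()
    {
        // Hypothetical OCR text; in the real automation this comes from the OCR output variable.
        string ocrText = "1234567890 1234567890 111222334 111222334 111222334";

        // Assumed pattern (runs of 9 to 10 digits); substitute your own expression.
        IEnumerable<Match> matches = Regex.Matches(ocrText, @"\d{9,10}").Cast<Match>();

        // Take the string value of each match, then keep only the distinct values.
        List<string> unique = matches.Select(m => m.Value).Distinct().ToList();

        Console.WriteLine(string.Join(",", unique));
    }
}
```

In a workflow that already has the matches in an IEnumerable variable, the equivalent single expression would be something like `matches.Select(m => m.Value).Distinct().ToList()` in C#, or `matches.Select(Function(m) m.Value).Distinct().ToList()` in VB.NET, where `matches` is a placeholder for your own variable name.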
