Split string get text


#1

Hi, I have this string and want to extract the number that is in front of each “_”. I am not sure how many “_” are in there. So far I have

variable.Split({"_"}, StringSplitOptions.None)

but then what is an efficient way to get all of the 5-digit numbers in front of “_”?
I am not sure whether the actual content looks like this, but this is what the message box shows when I print the string named variable

[Screenshot: readPdf1]

In this case, I need to get 21591, 21593, 21513

I also tried to use claims.Split({"_"},System.StringSplitOptions.None)(1).Trim.Split(Environment.NewLine(0))(0).Trim

but it returns nothing.
And when I use -1 instead of 0 to get the previous number, the index is out of range…
claims.Split({"_"},System.StringSplitOptions.None)(1).Trim.Split(Environment.NewLine(0))(-1).Trim

@ClaytonM Hi, seems like you are very advanced in string split manipulation, could you please help?

Thanks!


#2

I’m a fan of regex for this use case because it gets you to the final result in a single step. The UiPath ‘Matches’ activity can be used for this; it returns an IEnumerable of matches that you can use later in your workflow (generally within a For Each activity).

Here’s an example that works based on the data you provided. It will look for either a carriage return (\r) or newline (\n) before and after 1 or more digits. This would pull out 3 matches of 21591, 21593, 21513 in your example.

(?<=\r|\n)(\d+)(?=\r|\n)

Note that I included both \r and \n because I can’t tell whether your data uses newlines or carriage returns. You should only need to use one or the other.
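For anyone who wants to try the pattern outside the Matches activity, here is a minimal VB.NET sketch of the same idea. The sample text is made up for illustration (the real string from the PDF may look different), and the module name is just a placeholder.

```vb
Imports System
Imports System.Text.RegularExpressions
Imports Microsoft.VisualBasic

Module MatchesSketch
    Sub Main()
        ' Hypothetical sample text; the real PDF output may differ.
        Dim claims As String = "Claim list" & vbLf &
                               "21591" & vbLf &
                               "_____" & vbLf &
                               "0030 S10120" & vbLf &
                               "21593" & vbLf &
                               "_____" & vbLf

        ' Same pattern as in the post: digits with a line break on both sides.
        Dim pattern As String = "(?<=\r|\n)(\d+)(?=\r|\n)"

        ' Loop over the matches, much like a For Each over the activity's output.
        For Each m As Match In Regex.Matches(claims, pattern)
            Console.WriteLine(m.Value)
        Next
    End Sub
End Module
```

With this sample text the loop prints 21591 and 21593; the line "0030 S10120" is skipped because its digits are not alone on the line.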


#3

Hi Dave,

Thank you for your rapid reply! I am very new to regex, could you give a bit more explanation about using that expression? There are more lines that contain 1+ digits in my example above… so I am not sure how this expression can help. For example, the line with 0030 S10120 or 0020… they all have digits in the line… but I only want 21591, 21593, 21513.

For Matches -> RegexOption, what should I choose?

Thanks


#4

I made it work using

claims.Split({"_"}, System.StringSplitOptions.RemoveEmptyEntries)(1).Split({Environment.NewLine}, System.StringSplitOptions.None)(claimNumsNew.Count-2)

I guess my next question is: in what scenario do we need to use Trim, as in the example above, when I don’t actually need it here? Thanks!


#5

(?<=\r|\n) looks for a carriage return or newline immediately preceding one or more digits (\d+). Those digits must also be immediately followed by a carriage return or newline (?=\r|\n).

So it uses a positive lookbehind combined with a positive lookahead to grab the digits. That is why the expression I gave wouldn’t return anything for the line with 0030 S10120 or 0020. It knows there are digits there, but doesn’t match them because they have whitespace or letters before/after them instead of a line break both before and after.
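A small VB.NET sketch can make the lookbehind/lookahead behaviour concrete. The three test strings are made up for illustration; the third one shows a side effect worth knowing about: a number on the very first line of the text has no line break before it, so this pattern skips it.

```vb
Imports System
Imports System.Text.RegularExpressions
Imports Microsoft.VisualBasic

Module LookaroundSketch
    Sub Main()
        Dim pattern As String = "(?<=\r|\n)(\d+)(?=\r|\n)"

        ' Digits alone on a line: line break before and after, so this matches.
        Console.WriteLine(Regex.IsMatch(vbLf & "21591" & vbLf, pattern))       ' True

        ' Digits followed by a space, or preceded by a letter: no match.
        Console.WriteLine(Regex.IsMatch(vbLf & "0030 S10120" & vbLf, pattern)) ' False

        ' Digits at the very start of the text: nothing for the lookbehind to find.
        Console.WriteLine(Regex.IsMatch("21591" & vbLf, pattern))              ' False
    End Sub
End Module
```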


#6

Got it! Thanks a lot Dave. I will try it out.


#7

@lavint, yes @Dave’s suggestions are very good.

Regex is very helpful.

Looks like you got it working, but it might be helpful to use .Last in a Select.

You just need to split by _ with .RemoveEmptyEntries, which forms an array, and then select the last line of each piece.

claims.Split({"_"}, System.StringSplitOptions.RemoveEmptyEntries).Select(Function(line) line.Trim.Split({System.Environment.NewLine}, System.StringSplitOptions.RemoveEmptyEntries).Last).ToArray

Now all your numbers are in an array, so you can run that through a For Each or join them with String.Join(",",array).
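Here is a minimal, runnable VB.NET sketch of that split-and-take-last approach. The sample text and the LastLine helper are hypothetical, added only to keep the example readable; the actual string may be laid out differently.

```vb
Imports System
Imports System.Linq

Module SplitSketch
    ' Hypothetical helper: returns the last non-empty line of one piece.
    Function LastLine(piece As String) As String
        Return piece.Trim().
            Split({Environment.NewLine}, StringSplitOptions.RemoveEmptyEntries).
            Last()
    End Function

    Sub Main()
        ' Hypothetical sample text; the real PDF output may differ.
        Dim claims As String = "Claim list" & Environment.NewLine &
                               "21591" & Environment.NewLine &
                               "_____" & Environment.NewLine &
                               "0030 S10120" & Environment.NewLine &
                               "21593" & Environment.NewLine &
                               "_____"

        ' Split on underscores, dropping the empty pieces between them.
        Dim parts As String() = claims.Split({"_"}, StringSplitOptions.RemoveEmptyEntries)

        ' Keep only the last line of each piece.
        Dim numbers As String() = parts.Select(Function(part) LastLine(part)).ToArray()

        Console.WriteLine(String.Join(",", numbers))   ' 21591,21593
    End Sub
End Module
```

One thing to watch: if the text has a line break (or anything else) after the final underscore, that trailing piece is also processed. A piece that contains only whitespace makes .Last throw, so filtering out blank pieces first may be needed.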

For Regex, once you have a pattern string that pulls only the 5-digit numbers, you can get the matches like this (the call returns a MatchCollection, which you can loop over or convert to an array):

System.Text.RegularExpressions.Regex.Matches( claims, pattern )
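If a plain String array is wanted, one possible way is to convert the matches with LINQ. This is only a sketch: the sample text is made up, and the pattern is the one from earlier in the thread, not necessarily the best one for the real data.

```vb
Imports System
Imports System.Linq
Imports System.Text.RegularExpressions
Imports Microsoft.VisualBasic

Module MatchesToArraySketch
    Sub Main()
        ' Hypothetical sample text; the real PDF output may differ.
        Dim claims As String = vbLf & "21591" & vbLf & "0020 ABC" & vbLf & "21593" & vbLf
        Dim pattern As String = "(?<=\r|\n)(\d+)(?=\r|\n)"

        ' Cast each item to Match, then take its text.
        Dim numbers As String() = Regex.Matches(claims, pattern).
            Cast(Of Match)().
            Select(Function(m) m.Value).
            ToArray()

        Console.WriteLine(String.Join(",", numbers))   ' 21591,21593
    End Sub
End Module
```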

But finding a good pattern isn’t always easy, and if your text is consistent enough, regex isn’t as necessary.

Thought I would go ahead and make a late reply.

Regards.


#8

Thanks a lot Clayton! I like to learn different methods and find the best approach :smiley: