SPLIT STRING WHERE lowercase begins and only keep uppercase letters

I’m taking a message from a page that looks similar to this.

FAILED BUNCH OF STUFF NEED TO KNOW THIS and Bunch of other Error related issues that can sometimes be Really long and pointless to the user because we don’t know anything about these Errors that are Being thrown. then it repeats The same message like, three times.

SUCCESS COMPLETED SUCCESS

FAILED LINE_DISCONNECTED_AT_SWITCH FAILFAILEDError Message: ?E Form not foundThe Value of MSG:

So I only need the uppercase letters to be returned to me. I don’t want the lowercase letters. Is there a way to split a string by case? Also the message might end up being different than these, they are not always the same, and I can’t create a test case for each and every error that could possibly occur. So it would only work if it splits it up by case.

Hi.

Using Regex, you can use character classes such as [A-Z], [a-z], or [A-Za-z]. So, in your case, we can use [A-Z].
Then, we need to specify how many characters to match, so we can use {1,}, which means one or more characters.

Lastly, there are spaces, so we need to include a space in the brackets, like [A-Z ] or [A-Z\s]

Final pattern will look like this:

rPattern = "[A-Z\s]{1,}"

To extract text, you can do this:

Regex.Match(text, rPattern).Value.Trim
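
As a rough sketch of how that expression behaves, here it is in a small console program run against two of the sample messages from the question. The Module and the `samples` array are only there for illustration; in UiPath the expression itself would go into an Assign or a Message Box.

```vb
Imports System
Imports System.Text.RegularExpressions

Module MatchDemo
    Sub Main()
        Dim rPattern As String = "[A-Z\s]{1,}"
        ' Sample messages taken from the question (the second one is shortened).
        Dim samples As String() = {
            "SUCCESS COMPLETED SUCCESS",
            "FAILED LINE_DISCONNECTED_AT_SWITCH FAILFAILEDError Message: ?E Form not found"
        }
        For Each text As String In samples
            ' First run of uppercase letters and whitespace, trimmed.
            Console.WriteLine(Regex.Match(text, rPattern).Value.Trim())
        Next
        ' Prints:
        ' SUCCESS COMPLETED SUCCESS
        ' FAILED LINE
    End Sub
End Module
```

Note that the second message stops at "FAILED LINE" because the underscore is not inside the brackets, so any other characters you expect (like _) need to be added to the character class.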

Regards.

After I posted this and read it, I realized I was going to be looking at regular expressions, which I feel like… every time I look at them I have to learn them all over again. So I posted one of the exact messages that I get, for reference. There are other uppercase letters throughout that I DON’T want to match also. I really only want the first bit that I have made bold.

^[A-Z\s_+]{1,} This is what I did to fix it, I think…

Just so I know… what does the {1,} mean?

Understood.

You can include any characters inside the square brackets and it will stop when it gets to a character that isn’t in the brackets, like a lowercase character for example.

However, if the next word starts with an uppercase letter, as in THIS IS TEXT End here, the match will also pick up that letter (the E). I’m not sure how you can solve this scenario, though, unless you can detect the bold style or something.

{1,} means to find 1 or more characters that match a character in the brackets.
For example, “[A-Z]{1,2}” would first match only “AB” in the string “ABC”, but {1,} will work for any number of characters.
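
To make the quantifier difference concrete, here is a minimal sketch (the Module name is just for illustration). It also shows that + is shorthand for {1,}.

```vb
Imports System
Imports System.Text.RegularExpressions

Module QuantifierDemo
    Sub Main()
        ' {1,2}: at least 1, at most 2 characters per match.
        Console.WriteLine(Regex.Match("ABC", "[A-Z]{1,2}").Value) ' AB
        ' {1,}: 1 or more characters, no upper limit.
        Console.WriteLine(Regex.Match("ABC", "[A-Z]{1,}").Value)  ' ABC
        ' + means the same thing as {1,}.
        Console.WriteLine(Regex.Match("ABC", "[A-Z]+").Value)     ' ABC
    End Sub
End Module
```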

Can’t see the bold. I just put that there for reference to what I want extracted from the message. It does take the first letter of the Error Message part because the E is capitalized. I don’t know if there is a way to do away with that. It would be nice if there was. Because this is the message that will be returned to all of the employees in the company so that they can see the status of their work, and I don’t want it to be sloppy if I can help it.

So there is a scenario where there is no space between the last word of the uppercase string and the next word? Is the next word in lowercase always certain words, like always “Error” for example?

So if there is an extra word it can be included in the message, like if it’s Error… Even though the rest of it is in lowercase that can be returned in my message. If there is a space, like… FAILED ERROR_FAILED AT_SWITCH Error… I don’t want that last error to show up in my message.

So if there is a space, I want it to end there. If there is a word attached that has lowercase letters, that can be included in the message. I’d rather have it return the whole word even though part of it is lowercase, and then end where the space is?

Ok, let me try an idea I have that can split it and not include the last word, and I will get back to you in a bit.

I’m wondering if something like this works?

Left(Regex.Split(text,"[a-z]")(0), Regex.Split(text,"[a-z]")(0).Length-1)

Basically, just split by the first lowercase character, then take the first item, and take all characters except the last character. This is assuming that the first lowercase character will always be after a space or an uppercase character that starts the word.
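
Here is a step-by-step sketch of that expression as a console program, using a shortened version of the sample message from the question. The `head` variable and the Module are only for illustration. Keep in mind that this is a vb.net expression and not a regex pattern, so only the "[a-z]" part can be pasted into a regex tester.

```vb
Imports System
Imports System.Text.RegularExpressions
Imports Microsoft.VisualBasic

Module SplitDemo
    Sub Main()
        Dim text As String = "FAILED LINE_DISCONNECTED_AT_SWITCH FAILFAILEDError Message: ?E Form not found"

        ' Step 1: split on every lowercase letter and keep the first piece,
        ' i.e. everything before the first lowercase letter.
        Dim head As String = Regex.Split(text, "[a-z]")(0)
        Console.WriteLine(head)
        ' FAILED LINE_DISCONNECTED_AT_SWITCH FAILFAILEDE

        ' Step 2: drop the last character (the capital E that starts "Error").
        Console.WriteLine(Left(head, head.Length - 1))
        ' FAILED LINE_DISCONNECTED_AT_SWITCH FAILFAILED
    End Sub
End Module
```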

That didn’t work on the Regex tester.

How do I invoke a new regex in UiPath? This is what I’ve done.

Dim rx As New System.Text.RegularExpressions.Regex("^[A-Z\s_+a-z?]{1,}")
Dim matches As New System.Text.RegularExpressions.MatchCollection
matches = rx.Matches(MessageArray(0))
rx.Replace(MessageArray(0), matches)

I’m getting a really long error. About how regular expressions are not my friend…

‘System.Text.RegularExpressions.MatchCollection.Friend Sub New(regex As System.Text.RegularExpressions.Regex, input As String, Beginning As Integer, Length As Integer, Startat As Integer)’ is not accessible in this context because it is ‘Friend’. At line 2…

The error keeps going. But I have a feeling I’m totally fudging this all up in the first place. So how do I do this?

It should work, cause I tested it using one of your example text lines. I placed the vb.net code into a Message Box to see what it extracted.

I will reply back to your second problem in a few minutes.

I don’t know if this is the best approach, but I normally store the Regex pattern in a string variable, and since it is a constant that you may want to change later, I set it as the Default value for that variable in the Variables pane.

Then, you can use that in a Regex method.

System.Text.RegularExpressions.Regex.Match(text, rPattern).Value

which returns a string

If you want to use Matches, you can also use the Matches activity (although it’s not my preference, to be honest). To do this in vb.net, you can store the result in a MatchCollection variable or use it directly in a For Each activity.
[image]

If you are trying to Replace something, you don’t need to use Matches, because it will Replace everything that matches the pattern anyway.

System.Text.RegularExpressions.Regex.Replace(text, rPattern, "")

That should return your text with everything that matched the pattern replaced by an empty string.

Hopefully, I explained that clearly enough.
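
For reference, here is a small sketch of both ideas as a console program. The sample text and the "[a-z]" pattern are made up for illustration. A MatchCollection cannot be created with New (its constructor is not public), so the variable is simply assigned the result of Matches.

```vb
Imports System
Imports System.Text.RegularExpressions

Module MatchesDemo
    Sub Main()
        ' Made-up sample text and pattern, for illustration only.
        Dim text As String = "FAILED AT_SWITCH Error text"
        Dim rx As New Regex("[a-z]")

        ' Declare the MatchCollection without New and let Matches fill it.
        Dim matches As MatchCollection = rx.Matches(text)
        Console.WriteLine(matches.Count) ' 8

        ' Replace works on the text directly; no MatchCollection is needed.
        ' Every lowercase letter is replaced with an empty string.
        Dim cleaned As String = rx.Replace(text, "")
        Console.WriteLine(cleaned.Trim()) ' FAILED AT_SWITCH E
    End Sub
End Module
```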

Let me know if you have further questions and hopefully I can help answer them.

Regards.


To use a Regex variable, create it in the Variables pane first:
[image: Regex variable in the Variables pane]

Then, use the variable with any of the members like .Match().Value, .Split(), .Replace(), etc

Here it is being used in a Message Box with the Split expression I presented above.
[image: Message Box with the Split expression]

And here was the output:
[image: Message Box output]

Regards.

EDIT:
Here is a fix so that if there are no lowercase letters, as in your second example, it won’t split it.
[image: updated expression]
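
As a sketch of what such a guard might look like (not necessarily the exact expression used in the post above), the following checks for lowercase letters before splitting. The function name `UpperPrefix` is hypothetical and only for illustration; in UiPath the same logic could be written as a single If(...) expression in an Assign.

```vb
Imports System
Imports System.Text.RegularExpressions
Imports Microsoft.VisualBasic

Module GuardDemo
    ' Hypothetical helper, for illustration only.
    Function UpperPrefix(text As String) As String
        ' No lowercase letter at all: keep the whole message.
        If Not Regex.IsMatch(text, "[a-z]") Then Return text.Trim()

        ' Everything before the first lowercase letter.
        Dim head As String = Regex.Split(text, "[a-z]")(0)

        ' Message starts with a lowercase letter: nothing to keep.
        If head.Length = 0 Then Return ""

        ' Drop the character just before the first lowercase letter.
        Return Left(head, head.Length - 1).Trim()
    End Function

    Sub Main()
        Console.WriteLine(UpperPrefix("SUCCESS COMPLETED SUCCESS"))
        ' SUCCESS COMPLETED SUCCESS
        Console.WriteLine(UpperPrefix("FAILED ERROR_FAILED AT_SWITCH Error text"))
        ' FAILED ERROR_FAILED AT_SWITCH
    End Sub
End Module
```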

Okay, It’s all starting to make sense now. I created a variable in the variable pane for a regex, and I just put the pattern in there. Then I imported that into my invoke code method, and I put this.

StatusMessage = regex.Match(MessageArray(0)).Value

That’s not as good as what yours returned, so how do I use that left() deal you are doing, and what is Left()

Left(), Right(), and Mid() are VB functions that extract part of a string.
Left() takes the given number of characters from the left end of the string.
The syntax looks like this:

Left(string, number)

This returns the first number characters of string.

The reason I used this in your case is that the first lowercase character will always have either 1 uppercase character or 1 space before it, so I remove that character by taking the Length of the extracted text and subtracting one (i.e. .Length - 1).
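
A quick sketch of the three functions on a made-up string, in case it helps to see them side by side (the Module is only for illustration):

```vb
Imports System
Imports Microsoft.VisualBasic

Module LeftDemo
    Sub Main()
        Dim s As String = "ABCDEF"
        Console.WriteLine(Left(s, 2))            ' AB
        Console.WriteLine(Right(s, 2))           ' EF
        ' Mid counts positions starting at 1: start at position 2, take 3 characters.
        Console.WriteLine(Mid(s, 2, 3))          ' BCD
        ' Everything except the last character, as in the Split expression.
        Console.WriteLine(Left(s, s.Length - 1)) ' ABCDE
    End Sub
End Module
```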

You don’t really need to use Invoke Code, as most of this can be used directly in activities, such as the Assign activity to store your extracted values into a variable to use throughout your Workflow file.

Regards.

Okay, You’re right. Now that it’s all shortened down to one assignment, I’ll use an assign activity for that. Awesome. And I’m still confused about how I’m supposed to use this:

Left(Regex.Split(text,"[a-z]")(0), Regex.Split(text,"[a-z]")(0).Length-1)

When I have this:

StatusMessage = regex.Match(MessageArray(0)).Value

I’m just not sure how I’m supposed to do this next step. Should I not use .match?

OH. I think I see what you’re doing. So now we are using split to get rid of the lowercase letters. Minus one char.