Splitting String at multiple words

Hello, how can I split a string at multiple locations? I don't know the delimiters in advance. I just know they are numeric (like in the example). What is the best way to split the string correctly? Should I iterate through all the words, check whether each one is numeric, and split there? Or should I use regex?

Example:

string = 176 This is a sentence 180 this is another one, 109 - and this also
list_output = 176 This is a sentence, 180 this is another one, 109 - and this also

If you don't mind calling .NET directly in an Assign activity, I suppose you could use the .NET Regex.Split method.


Hey

Further to the previous post check out these links.

Check here for Regex.Split

Check here for a regex pattern :wink:

If you still have trouble with a regex pattern let us know :slight_smile:

Hi @Pathler

Use the Matches activity to split your string as you expect.

Click the Configure Regular Expression button, choose “Advanced” in the RegEx drop-down, and enter the value (\d+\s+[A-Za-z\s-]+)

Pass your input string in the Input field and create an output variable in the Result field, e.g. Regex_out.

Regex_out then holds all the split values.


Thanks, but it doesn't solve the problem, as it just extracts the numbers, not the text.

Hi @Pathler

Check the thread below; it may be helpful for your case.

Regards
Sudharsan

Hi @Pathler ,

You can do this with regex: create a pattern that matches only digits, then split the string on that pattern.

Thanks

Hey @Shikhar_Tandon, I've thought of that method. Could you show me how this works with this example, please?

Hi @Pathler,

Just to clarify: from the sentence, do you want to extract only the digits or only the text?

We can achieve this with a regular expression.

To extract only the digits in the string: \d+
To extract everything that is not a digit (letters, spaces, punctuation): \D+
To extract both the digit runs and the non-digit runs and retrieve them by index:
\d+|\D+

\d - matches a digit
\D - matches any non-digit character
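To see what these three patterns return on the example string, here is a small sketch. It is written in Python rather than as a UiPath workflow (an assumption, just so it can be run anywhere); for patterns this simple, Python's re module should give the same matches as the Matches activity. The variable names are only illustrative.

```python
import re

text = "176 This is a sentence 180 this is another one, 109 - and this also"

digits = re.findall(r"\d+", text)       # runs of digits only
non_digits = re.findall(r"\D+", text)   # everything between the numbers
both = re.findall(r"\d+|\D+", text)     # digit and non-digit runs, in order

print(digits)
print(non_digits)
print(both)
```

Note that \D+ also returns the spaces and punctuation around the words, so the text pieces may need trimming afterwards.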

Use the Matches activity, and in Configure Regular Expression:

  1. Under RegEx, select Advanced and enter whichever of the expressions above fits your requirement.

This is the sample output:
image

For reference, I have attached the sample workflow

DigitNonDigtRegex.zip (2.1 KB)

I hope your issue will be solved if you try this out.

Kindly mark it as the solution if this method solves your issue.

Regards,
@90sDeveloper

Hi @Pathler ,

As @Shikhar_Tandon was saying, you could use the System.Text.RegularExpressions.Regex.Split method in order to get your output, let me show you an example:

image

Value:

System.Text.RegularExpressions.Regex.Split(var_InitText,"(\d{3})").ToList

This splits the string wherever 3 digits occur in a row (regex \d{3}). Because the “()” makes the expression a capturing group, the matched numbers are also included in the list, creating this output:

image

  • To match your expected output exactly, you'll need to join each number with the text that follows it.

If you don’t want the numbers, remove the () from the expression:

  • Regex - no numbers included: "\d{3}"

image
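For anyone who wants to compare the two variants outside UiPath, here is a rough Python sketch (an assumption on my side; the thread itself uses the .NET Regex.Split). Python's re.split also keeps the text of a capturing group in the result, so it illustrates the same difference. The variable names are made up for the example.

```python
import re

text = "176 This is a sentence 180 this is another one, 109 - and this also"

# With a capturing group: the matched numbers are kept in the result
with_numbers = re.split(r"(\d{3})", text)

# Without the group: the numbers are used as delimiters and dropped
without_numbers = re.split(r"\d{3}", text)

print(with_numbers)
print(without_numbers)
```

Both lists start with an empty string, because the input begins with a match and there is nothing in front of it.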

Hope this is what you need!


Hey @ignasi.peiris, that's almost it. Thanks a lot! I'm looking for a way to get a list like this:
list = “176 This is a sentence”, “180 this is another”, “109 and this also”

So how do I combine each number with its string?

Good afternoon @Pathler !

There are numerous ways to achieve what you want. Below I'll show one that is easy to follow visually, and a second one that is more efficient:

  • Disclaimer: Option 1 is neither the only way nor the best! In my opinion, it is the easiest one for understanding the steps.

After you've found all the substrings based on the 3 digits, let's look at the output:
image

  • Here you have an empty value at the start, which can mess up our next steps, so the first action will be to remove it.

    • This can be fixed by improving the regex in our previous steps, but we'll stick with this to keep the steps easy to follow.

We will remove the empty value with the following:

var_listOutput = var_listOutput.Where(Function(x) Not String.IsNullOrEmpty(x)).ToList()

The output here will be the following:
image

As you can see, the empty value has disappeared, and all we have left is to group the items in pairs, meaning List(0) + List(1), List(2) + List(3)…

We can do this with a Do While and a second list, stepping through the first list 2 items at a time:

So on every iteration we add one item to the new list, built from 2 consecutive values.

image
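The same "remove the empty value, then join every 2 items" idea can be sketched in a few lines of Python. This is only an illustration under my own assumptions, not the attached workflow: the names are hypothetical, and the .strip() call that trims the surrounding spaces is an extra step that the Do While above does not perform.

```python
import re

text = "176 This is a sentence 180 this is another one, 109 - and this also"

# Split on 3-digit numbers, keeping them, and drop the empty first item
parts = [p for p in re.split(r"(\d{3})", text) if p]

# Step through the list 2 items at a time: number + the text that follows it
pairs = [
    (parts[i] + parts[i + 1]).strip()
    for i in range(0, len(parts) - 1, 2)
]

print(pairs)
```

The pairing relies on the list strictly alternating between a number and its text, so it is worth checking that the input really starts with a number.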

Option 2 (better and more efficient)

  • Improvement on the Regex function:
System.Text.RegularExpressions.Regex.Split(var_InitText,"(.+?(?=\d{3}))").ToList

Combined with the “Remove empty values” Linq from above, you have your output!
image
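To make it clearer why the empty values still have to be removed with this pattern, here is a hedged Python sketch of the same split (Python is my assumption for a runnable illustration; the pattern is the one from the expression above, and the variable names are invented).

```python
import re

text = "176 This is a sentence 180 this is another one, 109 - and this also"

# Lazily match everything up to (but not including) the next 3-digit number.
# The capturing group keeps each matched chunk in the split result.
raw = re.split(r"(.+?(?=\d{3}))", text)

# The split leaves empty strings before and between the matches
result = [p for p in raw if p]

print(raw)
print(result)
```

The last chunk is not matched by the pattern at all, since no number follows it; it survives as the remainder of the split. The first two items keep their trailing space.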

Both codes attached here:
RegexDemo.zip (59.8 KB)


If you don’t want to keep the white space(s) before the digits, you can target them instead:

lst = System.Text.RegularExpressions.Regex.Split(inputText, "\s+(?=\d+)").ToList
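As a quick way to check what this pattern does, the same split can be tried in Python (an assumption for illustration only; the pattern itself is unchanged and the variable names mirror the expression above).

```python
import re

input_text = "176 This is a sentence 180 this is another one, 109 - and this also"

# Split on the whitespace that is directly followed by a digit.
# The lookahead (?=\d+) does not consume the number, so it stays with its text.
lst = re.split(r"\s+(?=\d+)", input_text)

print(lst)
```

Keep in mind that this splits before any number, so a digit in the middle of a sentence (for example "I have 3 apples") would also start a new item.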
