How to save an inconsistent string in different variables?

Hello all,

I need some help. My input strings come in different scenarios:

(1)The University of Texas at Austin – The Red McCombs School of Business,2004 – 2009,Bachelor of Arts in Business Administration.

(2)Korea University,University of South Korea,2004,Bachelor of Arts in Business Administration,Master of Science

(3)Korea University,2004 – 2009,Bachelor of Arts in Business Administration,Master of Science

Now I am trying to split these strings into three parts (i.e., 3 variables): University, Study_Period and Degree_Name.

For example: (University: The University of Texas at Austin – The Red McCombs School of Business),
(Study_Period: 2004 – 2009), (Degree_Name: Bachelor of Arts in Business Administration), and so on.

I tried two ways: splitting with the separator “,” and with “–”. But for some cases it's not working.
We can’t predict where the , or – will appear.

Because of this inconsistency I am unable to split the strings. Please suggest a solution.

Ultimately, my requirement is that the string be split into 3 parts: the year, the text before the year, and the text after the year.

Thanks…

Have you considered using Regex so you can split by dash or comma?

Assuming I get this right, it would look like this:
System.Text.RegularExpressions.Regex.Split(str, "\,|\-")

It will let you match either dash or comma and split by it.
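
To see how this behaves on the sample inputs, here is a minimal sketch. It is written in Python rather than .NET purely for easy experimentation (an assumption, not what the thread uses), and the character class is a stand-in pattern that also covers the en dash appearing in the samples. The helper name `split_on_separators` is hypothetical.

```python
import re

samples = [
    "The University of Texas at Austin – The Red McCombs School of Business,2004 – 2009,Bachelor of Arts in Business Administration.",
    "Korea University,University of South Korea,2004,Bachelor of Arts in Business Administration,Master of Science",
]

# Assumed pattern: split on a comma, a hyphen, or an en dash.
SEPARATORS = re.compile(r"[,\-–]")

def split_on_separators(text):
    """Split on every separator and trim surrounding whitespace."""
    return [part.strip() for part in SEPARATORS.split(text)]

for s in samples:
    parts = split_on_separators(s)
    # Each sample contains four separators, so five pieces come back, not three.
    print(len(parts), parts)
```

Because every comma and dash is treated as a boundary, the university name and the degree list are broken up too, which is why a plain separator split cannot produce exactly three parts for these records.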


Thanks for the reply,
But it’s not working. I have nearly 1000 records, and we can’t predict which scenario will come up, so how can I develop code for it?

Is there any way to split on the integer value (i.e., take the text before and after the integer)?

Korea University,University of South Korea,2004,Bachelor of Arts in Business Administration,Master of Science

Thanks…

You’re right. So, we can assume that you have 3 parts, and that the second part always starts with a digit right after a comma and ends with a digit right before a comma. From your examples, it looks like the parts are always separated by a comma, but your problem is that the first and third parts can also contain commas.

Using a lookbehind and a lookahead, we can match a comma only when it comes right before a digit OR right after a digit. The regex expression I tested looks like this:

universityStr = "Korea University,University of South Korea,2004,Bachelor of Arts in Business Administration,Master of Science"
universityPattern = "(,(?=\d))|((?<=\d),)" ' this lets you match a comma right before a digit OR right after a digit
universityInfo = System.Text.RegularExpressions.Regex.Split(universityStr, universityPattern, System.Text.RegularExpressions.RegexOptions.ExplicitCapture)

For some reason, the ExplicitCapture option was needed, or the split came out wrong. (I am not sure what that option actually does.)
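
A likely explanation for the ExplicitCapture behaviour: in .NET, Regex.Split adds the text matched by capturing parentheses to the returned array, and RegexOptions.ExplicitCapture makes unnamed parentheses non-capturing, so only the text between matches comes back. The sketch below illustrates the same effect in Python (an assumption made for easy experimentation; Python's `re.split` also returns captured groups), using `(?:...)` groups in place of the .NET option.

```python
import re

text = ("Korea University,University of South Korea,2004,"
        "Bachelor of Arts in Business Administration,Master of Science")

capturing = r"(,(?=\d))|((?<=\d),)"          # plain parentheses capture
non_capturing = r"(?:,(?=\d))|(?:(?<=\d),)"  # (?:...) groups do not capture

# With capturing groups, the group contents are mixed into the result.
with_groups = re.split(capturing, text)

# Without capturing groups, only the text between the matched commas remains.
without_groups = re.split(non_capturing, text)

print(len(with_groups))
print(without_groups)
```

With the capturing pattern, the result contains extra entries for the groups in addition to the three wanted parts; the non-capturing version returns just the three parts.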

This should work for any input where the second part always begins with a digit right after a comma and ends with a digit right before a comma. That also means it will not split correctly if the first or third part has a digit next to a comma. If there are other scenarios that don’t work with this pattern, let us know if you need help figuring out the correct pattern.
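
As a rough check across roughly 1000 records, the split can be wrapped in a helper that rejects any record that does not yield exactly three parts. This is only a sketch in Python (an assumption; the same lookaround pattern should carry over to .NET), and the name `split_education` and the last, failing record are hypothetical.

```python
import re

# A comma directly before a digit, or directly after a digit.
YEAR_BOUNDARY = re.compile(r",(?=\d)|(?<=\d),")

def split_education(record):
    """Return the three parts of a record, or raise if the split is ambiguous."""
    parts = [p.strip() for p in YEAR_BOUNDARY.split(record)]
    if len(parts) != 3:
        raise ValueError(f"expected 3 parts, got {len(parts)}: {record!r}")
    university, study_period, degree_name = parts
    return {
        "University": university,
        "Study_Period": study_period,
        "Degree_Name": degree_name,
    }

records = [
    "The University of Texas at Austin – The Red McCombs School of Business,2004 – 2009,Bachelor of Arts in Business Administration.",
    "Korea University,University of South Korea,2004,Bachelor of Arts in Business Administration,Master of Science",
    "Korea University,2004 – 2009,Bachelor of Arts in Business Administration,Master of Science",
    # Hypothetical record with a digit next to a comma in the first part.
    "University of Paris 7,Sorbonne,2004,Bachelor of Arts",
]

for record in records:
    try:
        print(split_education(record))
    except ValueError as err:
        # Records that break the assumption are reported instead of mis-split.
        print("needs manual review:", err)
```

Collecting the rejected records this way makes it easier to see which additional scenarios, if any, need a different pattern.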

Regards.


Thank you,
It’s working.
