Regex to split a string on capital letters, with additional conditions

Hello,

I want to split my input string at capital letters, but with two exceptions. Exception 1: don't split at a capital letter if it is preceded by a space or an apostrophe "’" (Euris / d’Investissements). Exception 2: don't split if the letter is immediately followed by another capital letter or "." (S.A.S / SARIS).
This is my sample input text:
“Fonciere EurisCarpinienne De ParticipationEuris Cie Europeenne d’InvestissementsMiramont Finance et Distribution SASociété EurismaSociété SARIS S.A.S.”

My expected output elements are:

  1. Fonciere Euris
  2. Carpinienne De Participation
  3. Euris Cie Europeenne d’Investissements
  4. Miramont Finance et Distribution
  5. SASociété Eurisma
  6. Société SARIS S.A.S.

Hi, @Priyanka_Sharma1!

Judging by the rules you presented, I would say that you need to do a split every time you have a lowercase letter followed by an uppercase one.
Fonciere EurisCarpinienne De ParticipationEuris Cie Europeenne d’InvestissementsMiramont Finance et DistributionSASociété EurismaSociété SARIS S.A.S.
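The rule above can also be checked without a regex. Below is a minimal Python sketch that walks the string and compares neighbouring characters. It treats Python's Unicode-aware `str.islower()` / `str.isupper()` as the definition of "lowercase" and "uppercase", so accented letters such as "é" and "É" count without being listed. The function name `split_on_case_change` is hypothetical.

```python
def split_on_case_change(text):
    """Hypothetical helper: split wherever a lowercase letter is immediately
    followed by an uppercase letter.

    str.islower()/str.isupper() are Unicode-aware, so accented letters
    such as "é" and "É" are handled without listing them explicitly.
    """
    parts = []
    start = 0
    for i in range(1, len(text)):
        # A boundary sits between text[i - 1] and text[i].
        if text[i - 1].islower() and text[i].isupper():
            parts.append(text[start:i])
            start = i
    parts.append(text[start:])  # last token (the whole string if no boundary)
    return parts


if __name__ == "__main__":
    sample = ("Fonciere EurisCarpinienne De ParticipationEuris Cie Europeenne "
              "d’InvestissementsMiramont Finance et DistributionSASociété "
              "EurismaSociété SARIS S.A.S.")
    for token in split_on_case_change(sample):
        print(token)
```

Characters such as the space, "’" and "." are neither lowercase nor uppercase, so "d’Investissements", "S.A.S." and "SARIS" stay intact.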

I used a regex ([a-z][A-Z] for English, [a-zàâäèéêëîïôœùûüÿç][A-ZÀÂÄÈÉÊËÎÏÔŒÙÛÜŸÇ] for French) to find these occurrences. Next, I add a separator character (e.g. a pipe) between the two letters. Finally, I split the string on the separator, which leaves the individual tokens in an array of strings.
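Here is a minimal Python sketch of those three steps: find the pairs, insert a separator, then split. It uses the French character classes above and assumes the separator (`|` by default) never occurs in the input. The name `split_with_separator` is hypothetical, and the thread's own solution is built in .NET, so this only mirrors the logic.

```python
import re

# French-aware character classes from the reply above.
LOWER = "a-zàâäèéêëîïôœùûüÿç"
UPPER = "A-ZÀÂÄÈÉÊËÎÏÔŒÙÛÜŸÇ"

# Capture a lowercase letter (group 1) and the uppercase letter after it (group 2).
BOUNDARY = re.compile(f"([{LOWER}])([{UPPER}])")


def split_with_separator(text, sep="|"):
    """Hypothetical helper: assumes `sep` never appears in `text`."""
    # Steps 1 and 2: find each lowercase/uppercase pair and put `sep` between them.
    marked = BOUNDARY.sub(lambda m: m.group(1) + sep + m.group(2), text)
    # Step 3: split on the separator to get the individual tokens.
    return marked.split(sep)


if __name__ == "__main__":
    sample = ("Fonciere EurisCarpinienne De ParticipationEuris Cie Europeenne "
              "d’InvestissementsMiramont Finance et DistributionSASociété "
              "EurismaSociété SARIS S.A.S.")
    for token in split_with_separator(sample):
        print(token)
```

A replacement function is used instead of a replacement string so that the separator is inserted literally, whatever character is chosen.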


Hope this helps!


Hi,

We can also write it as a single expression (it's the same as @andreeaghetu's approach):

System.Text.RegularExpressions.Regex.Split(yourString,"(?<=[a-z])(?=[A-Z])")
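For comparison, here is a sketch of the same zero-width split in Python. `(?<=[a-z])` is a lookbehind and `(?=[A-Z])` is a lookahead, so the match is empty and no characters are consumed or lost. This assumes Python 3.7 or later, where `re.split` splits on empty matches. Like the .NET pattern, it only recognises ASCII letters. `split_at_case_boundary` is a hypothetical name.

```python
import re

# Empty match between a lowercase ASCII letter and an uppercase ASCII letter,
# the same pattern as the Regex.Split call above.
CASE_BOUNDARY = re.compile(r"(?<=[a-z])(?=[A-Z])")


def split_at_case_boundary(text):
    # Requires Python 3.7+: earlier versions do not split on empty matches.
    return CASE_BOUNDARY.split(text)


if __name__ == "__main__":
    sample = ("Fonciere EurisCarpinienne De ParticipationEuris Cie Europeenne "
              "d’InvestissementsMiramont Finance et DistributionSASociété "
              "EurismaSociété SARIS S.A.S.")
    print(split_at_case_boundary(sample))
```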

Main.xaml (6.5 KB)

Regards,
