Logic for manipulating the result

Victoria Australia
CAMBRIDGE MA 02139
Singapore 637431.
Republic of Korea 05842
SAINT LAURENT QC H4S 1Z9

The above are separate strings, and I want the expected output to be:

Victoria
CAMBRIDGE
Singapore
Republic of Korea
SAINT LAURENT

Cheers!

@Iswarya_P1

There is no specific pattern. On what basis do you want to split?

Where are you getting this data from? Is there anything that distinguishes the parts you want to keep from the parts you want to drop?

If you are reading from a PDF, try with preserve format.

cheers

Hi @Iswarya_P1

^(\w+)\s(.*)

Hope it helps!!
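
To see what this pattern captures, here is a minimal sketch in Python (an assumption; the thread itself uses .NET, but `^`, `\w`, `\s` and `.*` behave the same way for this input). It applies the pattern to each sample line separately, and the names `lines`, `pattern` and `first_words` are only illustrative.

```python
import re

# Sample lines copied from the question.
lines = [
    "Victoria Australia",
    "CAMBRIDGE MA 02139",
    "Singapore 637431.",
    "Republic of Korea 05842",
    "SAINT LAURENT QC H4S 1Z9",
]

pattern = re.compile(r"^(\w+)\s(.*)")

# Group 1 is the first run of word characters on each line.
first_words = [pattern.match(line).group(1) for line in lines]
print(first_words)
# ['Victoria', 'CAMBRIDGE', 'Singapore', 'Republic', 'SAINT']
```

Group 1 is always the first word only, so single-word names come out right, but "Republic of Korea" and "SAINT LAURENT" are cut short.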

@Iswarya_P1

Use the regex below:

(^[A-Z]+[a-z]+)|((?<=\n)[A-Z]+[a-z]*\s+[A-Z]*[a-z]*\s*[A-Z]*[a-z]*(?=\s*\d+))|((?<=\n)[A-Z]+[a-z]*\s+[A-Z]*[a-z]*)

Hope it works!!

It is extracted from a PDF only. I want to map the extracted information to column fields.

@Iswarya_P1

Did you try with preserve format? If you are reading the PDF with the Read PDF Text activity, the separation between the different items will look different, and we can leverage that.

cheers

I have used the regex, but it is not working as expected. The details below were extracted using the regex expression.

Victoria
CAMBRIDGE MA
Singapore
Republic
SAINT LAURENT

Cheers!

@Iswarya_P1

Try the below regex

(^[A-Z]+[a-z]+)|((?<=\n)[A-Z]+(?=\s+[A-Z]+\s+\d+))|((?<=\n)[A-Z]+[a-z]*\s+[A-Z]*[a-z]*\s*[A-Z]*[a-z]*(?=\s*\d+))|(?<=\n)[A-Z]+\s+[A-Z]+(?=\s+[A-Z]+\s+[A-Z]+\d+)

Hope it works!!

I have used the regex expression in my workflow like this:

System.Text.RegularExpressions.Regex.Match(Row("ShipToAddress4").ToString,"(^[A-Z]+[a-z]+)|((?<=\n)[A-Z]+(?=\s+[A-Z]+\s+\d+))|((?<=\n)[A-Z]+[a-z]*\s+[A-Z]*[a-z]*\s*[A-Z]*[a-z]*(?=\s*\d+))|(?<=\n)[A-Z]+\s+[A-Z]+(?=\s+[A-Z]+\s+[A-Z]+\d+)")

I get “Republic of Korea” as “Republic” in the message box.

Please guide me. Cheers!

@Iswarya_P1

You used Match in your expression, which returns only the first match. Please use Matches instead:

System.Text.RegularExpressions.Regex.Matches(input,"(^[A-Z]+[a-z]+)|((?<=\n)[A-Z]+(?=\s+[A-Z]+\s+\d+))|((?<=\n)[A-Z]+[a-z]*\s+[A-Z]*[a-z]*\s*[A-Z]*[a-z]*(?=\s*\d+))|(?<=\n)[A-Z]+\s+[A-Z]+(?=\s+[A-Z]+\s+[A-Z]+\d+)")

Hope it works!!
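
A minimal sketch of the difference between the first match and all matches, written in Python as an assumption (`re.search` and `re.finditer` stand in for .NET's `Regex.Match` and `Regex.Matches`; the pattern is the one from this post). It also runs the pattern on a single row value, since the workflow passes `Row("ShipToAddress4")` rather than the whole multi-line block.

```python
import re

# Same pattern as above, split over several lines for readability.
PATTERN = re.compile(
    r"(^[A-Z]+[a-z]+)"
    r"|((?<=\n)[A-Z]+(?=\s+[A-Z]+\s+\d+))"
    r"|((?<=\n)[A-Z]+[a-z]*\s+[A-Z]*[a-z]*\s*[A-Z]*[a-z]*(?=\s*\d+))"
    r"|(?<=\n)[A-Z]+\s+[A-Z]+(?=\s+[A-Z]+\s+[A-Z]+\d+)"
)

# The five sample lines joined into one multi-line string.
block = "\n".join([
    "Victoria Australia",
    "CAMBRIDGE MA 02139",
    "Singapore 637431.",
    "Republic of Korea 05842",
    "SAINT LAURENT QC H4S 1Z9",
])

# First match only (the role Regex.Match plays in .NET).
first_only = PATTERN.search(block).group(0)

# Every match (the role of Regex.Matches); strip() removes a trailing space
# that the third alternative can capture.
all_matches = [m.group(0).strip() for m in PATTERN.finditer(block)]

# The same pattern applied to one row value instead of the whole block.
single_row = [m.group(0) for m in PATTERN.finditer("Republic of Korea 05842")]

print(first_only)   # Victoria
print(all_matches)  # ['Victoria', 'CAMBRIDGE', 'Singapore', 'Republic of Korea', 'SAINT LAURENT']
print(single_row)   # ['Republic']
```

Three of the four alternatives rely on `(?<=\n)`, so they can only match when all the lines are passed as one string. On a single row value only the first alternative can apply, which would explain getting "Republic" for "Republic of Korea 05842".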

I am facing a variable type issue here. What data type should I use for the variable in the Assign?

Cheers!

@Iswarya_P1

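Please change the variable type to

System.Text.RegularExpressions.MatchCollection

which is what Regex.Matches returns. If you use the Matches activity instead, its output type is System.Collections.Generic.IEnumerable<System.Text.RegularExpressions.Match>.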

Hope it works!!

@Iswarya_P1

If this solved your query, please mark it as the solution to close the loop.

Happy Automation!!

Hi @Iswarya_P1 ,

I believe the logic depends on the names present in the text. This looks like an address from which you want to retrieve only certain parts. Unless we understand exactly how this is done manually, the rule is only understandable to a human, or we would need to introduce AI/ML capabilities.

Narrowing down how the manual process works on different data would help us reach a conclusion.
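
As one possible way to write the manual process down, here is a rough rule-based sketch in Python. Everything in it is an assumption made for illustration: the three rules, the function name `place_name`, and the `TRAILING_COUNTRIES` lookup are not from the thread, and they are only known to fit the five sample lines.

```python
import re

# Hypothetical lookup of trailing country names to drop (an assumption;
# a real workflow would need a fuller list).
TRAILING_COUNTRIES = {"australia"}


def place_name(line: str) -> str:
    """Apply three assumed rules, working from the end of the line."""
    tokens = line.strip().rstrip(".").split()
    # Rule 1: drop trailing postal-code tokens (anything containing a digit).
    while tokens and any(ch.isdigit() for ch in tokens[-1]):
        tokens.pop()
    # Rule 2: drop one trailing two-letter uppercase code such as MA or QC.
    if len(tokens) > 1 and re.fullmatch(r"[A-Z]{2}", tokens[-1]):
        tokens.pop()
    # Rule 3: drop a trailing country name found in the lookup.
    if len(tokens) > 1 and tokens[-1].lower() in TRAILING_COUNTRIES:
        tokens.pop()
    return " ".join(tokens)


samples = [
    "Victoria Australia",
    "CAMBRIDGE MA 02139",
    "Singapore 637431.",
    "Republic of Korea 05842",
    "SAINT LAURENT QC H4S 1Z9",
]
results = [place_name(s) for s in samples]
print(results)
# ['Victoria', 'CAMBRIDGE', 'Singapore', 'Republic of Korea', 'SAINT LAURENT']
```

Rule 3 is where the name knowledge comes in: nothing in the text itself says that "Australia" should be dropped while "Korea" should be kept, so new data may well need more entries or different rules.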