Victoria Australia
CAMBRIDGE MA 02139
Singapore 637431.
Republic of Korea 05842
SAINT LAURENT QC H4S 1Z9
The above is the separate string data, and I want the expected strings to be:
Victoria
CAMBRIDGE
Singapore
Republic of Korea
SAINT LAURENT
Cheers!
Anil_G
(Anil Gorthi)
July 27, 2023, 11:31am
2
@Iswarya_P1
There is no specific pattern here. On what basis do you want to split?
Where are you getting this data from? Is there any distinguishing feature between the parts you need and the rest?
If you are reading from a PDF, try the preserve format option.
Cheers
vrdabberu
(Varunraj Dabberu)
July 27, 2023, 11:34am
4
@Iswarya_P1
Use the regex below:
(^[A-Z]+[a-z]+)|((?<=\n)[A-Z]+[a-z]*\s+[A-Z]*[a-z]*\s*[A-Z]*[a-z]*(?=\s*\d+))|((?<=\n)[A-Z]+[a-z]*\s+[A-Z]*[a-z]*)
Hope it works!!
It is extracted from a PDF only. I want to map the mentioned information to the column fields.
Anil_G
(Anil Gorthi)
July 27, 2023, 11:36am
6
@Iswarya_P1
Did you try with preserve format? If you are reading the PDF using the Read PDF Text activity, the separation between different items will look different, which we can leverage.
cheers
I have used the regex, but it's not working as expected. The details below were extracted using the regex expression.
Victoria
CAMBRIDGE MA
Singapore
Republic
SAINT LAURENT
Cheers!
vrdabberu
(Varunraj Dabberu)
July 27, 2023, 11:45am
8
@Iswarya_P1
Try the below regex
(^[A-Z]+[a-z]+)|((?<=\n)[A-Z]+(?=\s+[A-Z]+\s+\d+))|((?<=\n)[A-Z]+[a-z]*\s+[A-Z]*[a-z]*\s*[A-Z]*[a-z]*(?=\s*\d+))|(?<=\n)[A-Z]+\s+[A-Z]+(?=\s+[A-Z]+\s+[A-Z]+\d+)
Hope it works!!
I have used the regex expression in the workflow like this:
System.Text.RegularExpressions.Regex.Match(Row("ShipToAddress4").ToString,"(^[A-Z]+[a-z]+)|((?<=\n)[A-Z]+(?=\s+[A-Z]+\s+\d+))|((?<=\n)[A-Z]+[a-z]*\s+[A-Z]*[a-z]*\s*[A-Z]*[a-z]*(?=\s*\d+))|(?<=\n)[A-Z]+\s+[A-Z]+(?=\s+[A-Z]+\s+[A-Z]+\d+)")
I get "Republic of Korea" as "Republic" in the message box.
Please guide me. Cheers!
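A possible explanation for the "Republic" result, sketched in Python because its `re` module accepts this pattern unchanged: if the row value holds a single line, the first alternative `(^[A-Z]+[a-z]+)` matches one capitalized word at the start of the string and the engine stops there, while the other alternatives can never match because nothing precedes the text with a `\n`. The assumption that `Row("ShipToAddress4")` contains exactly one line is mine and is not confirmed in the thread.

```python
import re

# Pattern copied from the reply above.
PATTERN = (
    r"(^[A-Z]+[a-z]+)"
    r"|((?<=\n)[A-Z]+(?=\s+[A-Z]+\s+\d+))"
    r"|((?<=\n)[A-Z]+[a-z]*\s+[A-Z]*[a-z]*\s*[A-Z]*[a-z]*(?=\s*\d+))"
    r"|(?<=\n)[A-Z]+\s+[A-Z]+(?=\s+[A-Z]+\s+[A-Z]+\d+)"
)

# Assumption: each row holds one address line, with no newline in it.
korea_row = "Republic of Korea 05842"
cambridge_row = "CAMBRIDGE MA 02139"

# re.search returns the first match only, like Regex.Match in .NET.
korea_match = re.search(PATTERN, korea_row)
korea_value = korea_match.group(0)

# An all-caps row has no lowercase letter for the first alternative,
# and no preceding "\n" for the lookbehinds, so nothing matches.
cambridge_match = re.search(PATTERN, cambridge_row)

print(repr(korea_value))
print(cambridge_match)
```

If this assumption holds, the pattern only behaves as intended when it is run once over the whole multi-line text rather than once per row.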
vrdabberu
(Varunraj Dabberu)
July 27, 2023, 12:18pm
10
@Iswarya_P1
You used Regex.Match in your expression, which returns only the first match, so please use Regex.Matches as below:
System.Text.RegularExpressions.Regex.Matches(input,"(^[A-Z]+[a-z]+)|((?<=\n)[A-Z]+(?=\s+[A-Z]+\s+\d+))|((?<=\n)[A-Z]+[a-z]*\s+[A-Z]*[a-z]*\s*[A-Z]*[a-z]*(?=\s*\d+))|(?<=\n)[A-Z]+\s+[A-Z]+(?=\s+[A-Z]+\s+[A-Z]+\d+)")
Hope it works!!
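The difference between a single match and all matches can be sketched in Python, where `re.search` plays the role of `Regex.Match` and `re.finditer` the role of `Regex.Matches`. This assumes the five sample lines arrive as one string joined by `\n`; the variable names are illustrative only.

```python
import re

# Same pattern as in the Regex.Matches call above.
PATTERN = (
    r"(^[A-Z]+[a-z]+)"
    r"|((?<=\n)[A-Z]+(?=\s+[A-Z]+\s+\d+))"
    r"|((?<=\n)[A-Z]+[a-z]*\s+[A-Z]*[a-z]*\s*[A-Z]*[a-z]*(?=\s*\d+))"
    r"|(?<=\n)[A-Z]+\s+[A-Z]+(?=\s+[A-Z]+\s+[A-Z]+\d+)"
)

# Assumption: the whole block is one string with "\n" line breaks.
text = "\n".join([
    "Victoria Australia",
    "CAMBRIDGE MA 02139",
    "Singapore 637431.",
    "Republic of Korea 05842",
    "SAINT LAURENT QC H4S 1Z9",
])

# Single match: stops at the first hit.
first_value = re.search(PATTERN, text).group(0)

# All matches: strip() removes whitespace that the third alternative
# can capture before a postal code.
all_values = [m.group(0).strip() for m in re.finditer(PATTERN, text)]

print(first_value)
print(all_values)
```

In the .NET version, the equivalent of the list comprehension would be to loop over the returned collection and read each item's `Value`, trimming it if needed.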
I am facing a variable type issue here. What data type should I use for the variable in the Assign?
Cheers!
vrdabberu
(Varunraj Dabberu)
July 27, 2023, 12:32pm
12
@Iswarya_P1
Please change the variable's data type to:
System.Collections.Generic.IEnumerable<System.Text.RegularExpressions.Match>
Hope it works!!
vrdabberu
(Varunraj Dabberu)
July 27, 2023, 2:46pm
13
@Iswarya_P1
If this solved your query, please mark the post as the solution to close the loop.
Happy Automation!!
Iswarya_P1:
Victoria Australia
CAMBRIDGE MA 02139
Singapore 637431.
Republic of Korea 05842
SAINT LAURENT QC H4S 1Z9
The above is the separate string data, and I want the expected strings to be:
Victoria
CAMBRIDGE
Singapore
Republic of Korea
SAINT LAURENT
Hi @Iswarya_P1 ,
I believe the logic is based on the names present in the text. Maybe it is an address, and you want to retrieve only certain parts of it. But unless we properly understand how this is done manually, I believe the logic is only understandable to a human, or we would need to introduce AI/ML capabilities.
Narrowing down how the manual process handles different data could help us reach a conclusion.
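To make the "manual process" concrete, here is one set of explicit rules that happens to reproduce the five expected values. It is only a sketch under my own assumptions, not a rule confirmed by the original poster: postal codes are tokens containing a digit, region codes are two uppercase letters, and a trailing country name is dropped only when something else precedes it. The `COUNTRIES` set and the `extract_place` function are hypothetical names introduced for illustration.

```python
import re

# Hypothetical lookup; a real workflow would need a fuller list.
COUNTRIES = {"Australia", "Singapore", "Republic of Korea"}

def extract_place(line: str) -> str:
    """Return the leading place name of one address line (heuristic)."""
    tokens = line.strip().rstrip(".").split()

    # Rule 1: drop trailing postal-code tokens (anything with a digit).
    while tokens and any(ch.isdigit() for ch in tokens[-1]):
        tokens.pop()

    # Rule 2: drop one trailing two-letter region code such as MA or QC.
    if len(tokens) > 1 and re.fullmatch(r"[A-Z]{2}", tokens[-1]):
        tokens.pop()

    # Rule 3: drop a trailing country only if text remains before it.
    text = " ".join(tokens)
    for country in COUNTRIES:
        if text.endswith(" " + country):
            text = text[: -len(country)].rstrip()
            break
    return text

lines = [
    "Victoria Australia",
    "CAMBRIDGE MA 02139",
    "Singapore 637431.",
    "Republic of Korea 05842",
    "SAINT LAURENT QC H4S 1Z9",
]
results = [extract_place(line) for line in lines]
print(results)
```

Whether these three rules generalize depends on the rest of the data, which is why confirming the manual logic on more samples matters before committing to any pattern.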