Logics for manipulation result

Iswarya_P1 · July 27, 2023, 11:26am

Victoria Australia
CAMBRIDGE MA 02139
Singapore 637431.
Republic of Korea 05842
SAINT LAURENT QC H4S 1Z9

The above are separate strings, and I want the expected output to be:

Victoria
CAMBRIDGE
Singapore
Republic of Korea
SAINT LAURENT

Cheers!

Anil_G · July 27, 2023, 11:31am

@Iswarya_P1

There is no specific pattern. On what basis do you want to split?

Where are you getting this data from? Is there any consistent difference between the parts you want to keep and the parts you want to drop?

If you are reading from a PDF, try with Preserve Format.

cheers

pravallikapaluri · July 27, 2023, 11:32am

Hi @Iswarya_P1

^(\w+)\s(.*)

Hope it helps!!
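
For reference, a minimal Python sketch of what the pattern above captures on the sample lines from the question. Python's `re` module is used only for illustration (the thread targets .NET regex, and this pattern behaves the same way on this input); the variable names are assumptions. Group 1 is only the first word of each line, so multi-word names such as "Republic of Korea" and "SAINT LAURENT" are cut short:

```python
import re

# Sample lines copied from the question.
lines = [
    "Victoria Australia",
    "CAMBRIDGE MA 02139",
    "Singapore 637431.",
    "Republic of Korea 05842",
    "SAINT LAURENT QC H4S 1Z9",
]

# Group 1 = first run of word characters, group 2 = the rest of the line.
pattern = re.compile(r"^(\w+)\s(.*)")

first_words = [pattern.match(line).group(1) for line in lines]
print(first_words)
# ['Victoria', 'CAMBRIDGE', 'Singapore', 'Republic', 'SAINT']
```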

vrdabberu · July 27, 2023, 11:34am

@Iswarya_P1

Use the regex below:

(^[A-Z]+[a-z]+)|((?<=\n)[A-Z]+[a-z]*\s+[A-Z]*[a-z]*\s*[A-Z]*[a-z]*(?=\s*\d+))|((?<=\n)[A-Z]+[a-z]*\s+[A-Z]*[a-z]*)

Hope it works!!

Iswarya_P1 · July 27, 2023, 11:35am

It is extracted from a PDF. I want to map the mentioned information to the column fields.

Anil_G · July 27, 2023, 11:36am

@Iswarya_P1

Did you try with Preserve Format? If you are reading the PDF using the Read PDF Text activity, the separation between the different items will look different, which we can leverage.

cheers

Iswarya_P1 · July 27, 2023, 11:40am

I have used the regex, but it is not working as expected. The details below were extracted when using it:

Victoria
CAMBRIDGE MA
Singapore
Republic
SAINT LAURENT

Cheers!

vrdabberu · July 27, 2023, 11:45am

@Iswarya_P1

Try the regex below:

(^[A-Z]+[a-z]+)|((?<=\n)[A-Z]+(?=\s+[A-Z]+\s+\d+))|((?<=\n)[A-Z]+[a-z]*\s+[A-Z]*[a-z]*\s*[A-Z]*[a-z]*(?=\s*\d+))|(?<=\n)[A-Z]+\s+[A-Z]+(?=\s+[A-Z]+\s+[A-Z]+\d+)

Hope it works!!

Iswarya_P1 · July 27, 2023, 12:13pm

I have used the regex in the workflow like this:

System.Text.RegularExpressions.Regex.Match(Row("ShipToAddress4").ToString,"(^[A-Z]+[a-z]+)|((?<=\n)[A-Z]+(?=\s+[A-Z]+\s+\d+))|((?<=\n)[A-Z]+[a-z]*\s+[A-Z]*[a-z]*\s*[A-Z]*[a-z]*(?=\s*\d+))|(?<=\n)[A-Z]+\s+[A-Z]+(?=\s+[A-Z]+\s+[A-Z]+\d+)")

I got "Republic of Korea" as "Republic" in the message box.

Please guide me. Cheers!


vrdabberu · July 27, 2023, 12:18pm

@Iswarya_P1

You used Match in your expression, which returns only the first match. Please use Matches instead:

System.Text.RegularExpressions.Regex.Matches(input,"(^[A-Z]+[a-z]+)|((?<=\n)[A-Z]+(?=\s+[A-Z]+\s+\d+))|((?<=\n)[A-Z]+[a-z]*\s+[A-Z]*[a-z]*\s*[A-Z]*[a-z]*(?=\s*\d+))|(?<=\n)[A-Z]+\s+[A-Z]+(?=\s+[A-Z]+\s+[A-Z]+\d+)")

Hope it works!!
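
A rough Python sketch of the difference between taking one match and taking all matches, using the pattern from this thread. Here `re.search` stands in for `Regex.Match` and `re.finditer` for `Regex.Matches`; the names `PATTERN`, `text` and `all_matches` are illustrative, and the sketch assumes the five addresses sit in one newline-separated string. Every alternative except the first requires a preceding newline, so when the pattern is applied to a single row value, only the first alternative can match. That appears to be why "Republic of Korea 05842" on its own yields "Republic":

```python
import re

# Pattern from the thread, split per alternative for readability.
PATTERN = (
    r"(^[A-Z]+[a-z]+)"
    r"|((?<=\n)[A-Z]+(?=\s+[A-Z]+\s+\d+))"
    r"|((?<=\n)[A-Z]+[a-z]*\s+[A-Z]*[a-z]*\s*[A-Z]*[a-z]*(?=\s*\d+))"
    r"|(?<=\n)[A-Z]+\s+[A-Z]+(?=\s+[A-Z]+\s+[A-Z]+\d+)"
)

# Assumption: all addresses are in one multi-line string.
text = (
    "Victoria Australia\n"
    "CAMBRIDGE MA 02139\n"
    "Singapore 637431.\n"
    "Republic of Korea 05842\n"
    "SAINT LAURENT QC H4S 1Z9"
)

# Single match: only the first hit is returned.
print(re.search(PATTERN, text).group())  # Victoria

# All matches: strip() removes the trailing space captured after "Singapore".
all_matches = [m.group().strip() for m in re.finditer(PATTERN, text)]
print(all_matches)
# ['Victoria', 'CAMBRIDGE', 'Singapore', 'Republic of Korea', 'SAINT LAURENT']

# Applied to one row at a time, there is no preceding newline,
# so only the first alternative can ever match.
print(re.search(PATTERN, "Republic of Korea 05842").group())  # Republic
print(re.search(PATTERN, "CAMBRIDGE MA 02139"))               # None
```

If the workflow processes one row at a time, collecting all matches does not change the per-row result; the pattern would need alternatives that can match at the start of the row value.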

Iswarya_P1 · July 27, 2023, 12:27pm

I am facing a variable type issue here. What data type should I use for the assignment?

Cheers!

vrdabberu · July 27, 2023, 12:32pm

@Iswarya_P1

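Please change the data type to

System.Collections.Generic.IEnumerable<System.Text.RegularExpressions.Match>

(Regex.Matches itself returns a System.Text.RegularExpressions.MatchCollection, so that type can be used for the variable as well.)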

Hope it works!!

vrdabberu · July 27, 2023, 2:46pm

@Iswarya_P1

If you found the solution to your query, please mark it as the solution to close the loop.

Happy Automation!!

supermanPunch · July 27, 2023, 7:20pm

Hi @Iswarya_P1 ,

I believe the logic is based on the names present in the text. Maybe it is an address and you want to retrieve only certain parts of it. But unless we properly understand how this is done manually, I believe the logic is understandable only to a human, or we would need to introduce AI/ML capabilities for it.

Narrowing down how the manual process handles different data could help us come to a conclusion.
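
As an illustration of what writing the manual process down as rules could look like, here is a small Python sketch. The three rules and the `TRAILING_WORDS_TO_DROP` list are assumptions inferred only from the five sample lines in this thread, not a confirmed description of how the data is handled, and they would likely need revising against more data:

```python
import re

# Hypothetical list of trailing words to drop; the thread does not define one.
TRAILING_WORDS_TO_DROP = {"Australia"}

def extract_place(address: str) -> str:
    """Apply three assumed rules to keep only the leading place name."""
    tokens = address.split()
    # Rule 1: drop trailing tokens containing a digit (02139, H4S, 1Z9, "637431.").
    while tokens and any(ch.isdigit() for ch in tokens[-1]):
        tokens.pop()
    # Rule 2: drop one trailing two-letter uppercase code (MA, QC).
    if len(tokens) > 1 and re.fullmatch(r"[A-Z]{2}", tokens[-1]):
        tokens.pop()
    # Rule 3: drop one trailing word that is on the assumed drop list.
    if len(tokens) > 1 and tokens[-1] in TRAILING_WORDS_TO_DROP:
        tokens.pop()
    return " ".join(tokens)

samples = [
    "Victoria Australia",
    "CAMBRIDGE MA 02139",
    "Singapore 637431.",
    "Republic of Korea 05842",
    "SAINT LAURENT QC H4S 1Z9",
]
results = [extract_place(s) for s in samples]
print(results)
# ['Victoria', 'CAMBRIDGE', 'Singapore', 'Republic of Korea', 'SAINT LAURENT']
```

Rule 3 is the weak point: "Australia" is dropped while "Korea" is kept, and nothing in the text itself explains why, which is the ambiguity described above.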
