How to get a specific value from a string using Regex?

Hello guys,

I have a string (KilometreSI I\r\n* 2 1 7 1 5 2 1 4 6 2 it Report Date) and I want to get the 10-digit number from it using Regex, whether or not there are spaces between the digits,
i.e. (2171521462 / 2 1 7 1 5 2 1 4 6 2).
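To make the goal concrete: both variants should normalize to the same 10-digit value once whitespace is removed. Below is a minimal Python sketch, used only for illustration since the thread itself works in UiPath/VB.NET. The pattern `\d(?:\s*\d){9}` is an assumption, not taken from the thread: it matches ten digits with optional whitespace between them. On the full PDF text it would still need an anchor such as `KilometreSI`, because other digit runs in the document can also match.

```python
import re

# Assumed pattern (not from the thread): ten digits, optional whitespace between them.
TEN_DIGITS = re.compile(r"\d(?:\s*\d){9}")

def extract_code(text):
    """Return the first 10-digit run with all whitespace removed, or None."""
    m = TEN_DIGITS.search(text)
    if m is None:
        return None
    return re.sub(r"\s", "", m.group(0))

spaced = "KilometreSI I\r\n* 2 1 7 1 5 2 1 4 6 2 it Report Date"
compact = "KilometreSI I\r\n* 2171521462 it Report Date"
print(extract_code(spaced))   # 2171521462
print(extract_code(compact))  # 2171521462
```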

Thanks

Just to make sure others can assist, can we get a sample of both scenarios?

Step 1: Get the barcode
In the meantime, try this piece of regex:

(?<=KilometreSI I\r\n*\s)(.*)(\sit) (EDIT: for some reason the forum is not displaying the doubled backslashes.)
Check out the Regex101 link.

Then you will need to assign group 1 to your variable (MatchBarcode).

Step 2: Remove the white spaces
Remove the spaces from the variable (MatchBarcode) using a ‘Replace’ activity.

Select whitespace characters (or “\s”) within the Replace activity.
Input will be the variable (MatchBarcode).
Output can be the same variable (MatchBarcode) or a new one.
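The two steps (capture the text between the anchor and " it" as group 1, then strip whitespace) can be sketched in Python as below. Assumptions: the text contains real CR/LF line breaks rather than the literal characters `\r\n`, and because Python's `re` module rejects variable-width lookbehind such as `\n*` inside `(?<=...)`, the anchor is adapted to `KilometreSI I\r?\n\*\s` and matched normally, with the barcode taken from the capture group instead.

```python
import re

# Step 1: anchor on "KilometreSI I", the line break and "*", then capture
# everything up to the " it" that follows the barcode (group 1).
BARCODE = re.compile(r"KilometreSI I\r?\n\*\s(.*)\sit")

def match_barcode(text):
    m = BARCODE.search(text)
    if m is None:
        return None
    match_barcode_value = m.group(1)  # e.g. "2 1 7 1 5 2 1 4 6 2"
    # Step 2: remove every whitespace character (the "\s" replace).
    return re.sub(r"\s", "", match_barcode_value)

sample = "KilometreSI I\r\n* 2 1 7 1 5 2 1 4 6 2 it Report Date"
print(match_barcode(sample))  # 2171521462
```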

Hello @mitradev_das,

you can also give this pattern a try: (?<=\s).[0-9\s]+

Cheers,
@mitradev_das

Thanks for your reply, bro, but it’s not working.

Below is my string:

“El\r\n_I ERNST&YOUNC Expense Report\r\nI Date Prepared: 30-05-2019 I Managing Country: XE I\r\nI Submitted: 03-06-2019 (Processed) I Business Unit: XE024 I\r\nI GPN — Name: XE020M01198 - Gulshan Bhandari I Management Unit: 00837 Sub Management Unit: 0921107 I\r\nI Signature: I Approved By: Rank: I\r\nI Rank: Associate Director I Approval Signature: I\r\nI Expense Details (Attach supporting receipts) I Total Net IC = Chargeable P = Authorized I\r\nI Loc I Expense Type I Date I Description I Expense Expense I Type I Engagement I Activity I\r\nOTHER Conveyance-Non-Billable - Others - Taxi 22-05-2019 Meetings with EMEIA Conflicts Leader, Laura. 584.68 584.68 P 24251492 0000\r\nCab from 4/4, Inder Colony, Sector 31, Faridabad\r\nHaryana 121003, India to UNIT NO 2B ,\r\nAMENITY BLOCK-II,CANDOR TECHSPACE,\r\nSector 21, Gurugram, Haryana 122016, India\r\nOTHER Conveyance-Non-Billable - Others - Taxi 24-05-2019 Meetings with EMEIA Conflicts Leader, Laura . 566.75 566.75 P 24251492 0000\r\nCab from 2/4, Old Sher Shah Suri Rd, Inder\r\nColony, Sector 31, Faridabad, Haryana 121003,\r\nIndia to 5, Sector 21, Gurugram, Haryana 12201t\r\nIndia\r\nI|III|I “III”“I” |IIIH|II| IHII “III“I”" I|III IN"" “N [III TOtalSI 1,151.43 I 1,151.43l Tom] KilometreSI I\r\n* 2 1 7 1 5 2 1 4 6 2 it Report Date Format: dd-mm-yyyy\r\nPage 1 of 1 11-06-2020"

Hello, please try the following:

pattern = "(?<=KilometreSI I\\r\\n\*)(\s*\d)+"
result = System.Text.RegularExpressions.Regex.Match(myText, pattern).Value.Replace(" ", "")
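Note that VB.NET string literals do not treat the backslash as an escape character, so `\\r\\n` in this pattern reaches the regex engine unchanged and matches a literal backslash followed by `r` (then a backslash and `n`), not a line break. The Python sketch below reproduces the pattern with a raw string to show that it only matches when the text literally contains those characters; the two sample strings are assumptions for illustration.

```python
import re

# Raw string, so the regex sees \\r\\n: a literal backslash + "r",
# then a literal backslash + "n", then a literal "*".
PATTERN = r"(?<=KilometreSI I\\r\\n\*)(\s*\d)+"

def extract(my_text):
    m = re.search(PATTERN, my_text)
    # .Value in .NET corresponds to group(0) here; this sketch returns None
    # on failure, whereas .NET's Match.Value would be an empty string.
    return m.group(0).replace(" ", "") if m else None

literal = r"KilometreSI I\r\n* 2 1 7 1 5 2 1 4 6 2 it Report Date"   # backslash characters in the text
real_crlf = "KilometreSI I\r\n* 2 1 7 1 5 2 1 4 6 2 it Report Date"  # actual CR/LF characters

print(extract(literal))    # 2171521462
print(extract(real_crlf))  # None
```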

Hello @msan
Thanks, bro. I am able to select the barcode with the regex, but I am not able to get the value in UiPath Studio. Any suggestions?

Example output: CastIterator { }

With myText as your sample text (a String):

Assign (String)
pattern = "(?<=KilometreSI I\\r\\n\*)(\s*\d)+"

Assign (String)
result = System.Text.RegularExpressions.Regex.Match(myText, pattern).Value.Replace(" ", "")
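The `CastIterator { }` output suggests that what is being printed is a collection of matches (for example, the output of a Matches activity) rather than a single match, so the display shows a type name instead of the matched text. `Regex.Match` returns one `Match`, whose `Value` is a plain string. As a rough Python analogue (illustration only): `re.finditer` returns a lazy iterator, while `re.search` returns a single match object. The simplified pattern `(?<=\*)(\s*\d)+` is a hypothetical stand-in that works only for this short sample.

```python
import re

my_text = "KilometreSI I\r\n* 2 1 7 1 5 2 1 4 6 2 it Report Date"
pattern = r"(?<=\*)(\s*\d)+"  # hypothetical simplified pattern for this sample

all_matches = re.finditer(pattern, my_text)
print(all_matches)  # an iterator object, not the barcode text

first = re.search(pattern, my_text)  # a single match, like Regex.Match
result = first.group(0).replace(" ", "") if first else ""
print(result)  # 2171521462
```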

Hi @mitradev_das

I have adapted the Regex from @Pradeep_Shiv :slight_smile:

Have a look at this Main.xaml (10.1 KB)

As you can see, this workflow grabs the barcode with or without whitespace and then replaces all whitespace with an empty string (“”).

Regex101 link for your review.
Regex solution:

So a team effort for this solution from @Pradeep_Shiv and me :smiley:

Hopefully this solves your problem.

@Steven_McKeering made some changes to the pattern. @mitradev_das, this should work fine, I think.

Cheers
Happy Learning!

Thanks @Steven_McKeering and @Pradeep_Shiv, I appreciate both of your help, but I don’t know why it’s not working for me. Please find attached some sample PDF files and my workflow: Extract_PDF_Test-xaml.zip (4.3 KB) Files.zip (170.9 KB)

@mitradev_das

OK, now that I can see the actual strings, here is a revision; you won’t need any Regex activity. Get the text from your document and use the string (here named myText) with the following activities.

Assign (String)
pattern = "KilometreSI\s+I\s+\*(?<code>[\s\d]+)"

Assign (String)
result = System.Text.RegularExpressions.Regex.Match(myText, pattern).Groups("code").ToString.Replace(" ", "")

Or, if you prefer, you can get the search string with a one-liner:

System.Text.RegularExpressions.Regex.Match(myText, "KilometreSI\s+I\s+\*(?<code>[\s\d]+)").Groups("code").ToString.Replace(" ", "")
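A Python sketch of the same named-group approach, for illustration. Python spells a named group `(?P<code>...)` instead of .NET's `(?<code>...)`; otherwise the pattern is unchanged, and the `\s+` parts let it match whether the line break is a real CR/LF or ordinary spaces. The sample is a trimmed excerpt of the string posted above, assumed to contain real line breaks, and returning an empty string on no match is this sketch's own choice.

```python
import re

PATTERN = re.compile(r"KilometreSI\s+I\s+\*(?P<code>[\s\d]+)")

def get_barcode(my_text):
    m = PATTERN.search(my_text)
    if m is None:
        return ""  # sketch's choice when nothing matches
    # [\s\d]+ also captures the spaces around the digits,
    # so strip them afterwards, as Replace(" ", "") does in the VB version.
    return m.group("code").replace(" ", "")

excerpt = "1,151.43l Tom] KilometreSI I\r\n* 2 1 7 1 5 2 1 4 6 2 it Report Date Format"
print(get_barcode(excerpt))  # 2171521462
```

If the digits could be followed directly by a line break, replacing all whitespace (`\s`) rather than only spaces would be the safer cleanup, since `[\s\d]+` would also capture that CR/LF.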

@mitradev_das Which activity is this?

Hello @mitradev_das,

I’ve made a few changes to your code, and it’s working fine for me!
Please feel free to ask if you have any questions.

Cheers
@mitradev_das
MitraDev_Das.zip (193.3 KB)

Please be aware that when you see that text in a Write Line or log output, the characters \r\n might not literally be there; they may just be an Environment.NewLine, so a regex that uses those characters as-is might be wrong…
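To illustrate the difference, here is a small Python sketch (illustration only; the helper names are hypothetical). On Windows, Environment.NewLine is the two-character sequence CR LF. A view that escapes control characters, like Python's `repr`, displays real CR/LF as `\r\n`, which looks almost the same as text that literally contains a backslash followed by `r` and `n`; checking the characters explicitly removes the doubt.

```python
# Two strings that can look alike in a log or debugger view.
real = "KilometreSI I\r\n*"      # contains the CR and LF control characters
literal = r"KilometreSI I\r\n*"  # contains backslash, 'r', backslash, 'n'

print(len(real), len(literal))   # 16 18
print(repr(real))                # 'KilometreSI I\r\n*'
print(repr(literal))             # 'KilometreSI I\\r\\n*'

def has_real_newline(text):
    """True if the text contains an actual CR or LF character."""
    return "\r" in text or "\n" in text

def has_literal_escape(text):
    """True if the text contains a backslash followed by 'r' or 'n'."""
    return "\\r" in text or "\\n" in text
```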

Exactly, @bcorrea!
He is right, @mitradev_das:
it actually looks like this.

@bcorrea I have encountered a file format where the literal characters were mandatory at each line end (along with the Windows CR/LF :slight_smile:), so I’m not surprised anymore.

Thanks @Pradeep_Shiv, it’s working fine now. Thanks again for your help…

One more thing I want to get from those PDF files is the GPN number. The problem is similar to the barcode one: in some PDF files there is a space between the dash and "Name"
(GPN — Name: XE020M01198), and sometimes there is no space, like this:
(GPN —Name: XE020M01198). Could you please help me with this, bro?

Thanks

Sure @mitradev_das,

You can use this pattern (?<=Name:).*(?=-)
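A quick check of this pattern in Python, for illustration, using the GPN line from the sample text above (assumed to end with a real line break; the dash before "Name" is written as `\u2014`). `.*` is greedy and by default does not cross a line feed, so it runs up to the last hyphen on that line, which here is the one right after the GPN. If the text contained no real line breaks, it could run on to a later hyphen in the document. The captured value keeps its surrounding spaces, so the sketch adds `strip()`.

```python
import re

GPN_PATTERN = r"(?<=Name:).*(?=-)"

def get_gpn(text):
    m = re.search(GPN_PATTERN, text)
    # strip() is added by this sketch to drop the spaces around the GPN
    return m.group(0).strip() if m else None

with_space = "I GPN \u2014 Name: XE020M01198 - Gulshan Bhandari I Management Unit: 00837\r\nI Signature:"
without_space = "I GPN \u2014Name: XE020M01198 - Gulshan Bhandari I Management Unit: 00837\r\nI Signature:"
print(get_gpn(with_space))     # XE020M01198
print(get_gpn(without_space))  # XE020M01198
```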

Cheers
