Regex Expression Help

Hi,

I have to extract a number from a PDF, for example:
Organisation Number:123445
I need to extract the number 123445, and I am using “Organisation Number” as the reference. The problem is that the number can also be formatted as 123,445, 123:445, or 12,3:44. The reference text can change as well, e.g. Organisation,Number or Organisation.Number.

In some scenarios this reference text is not present, and in such cases it throws an error. If it is not present, I have to perform a certain set of actions. How do I do that?

How can I solve this?

Hi @amruta_George

You can try this regex expression:

System.Text.RegularExpressions.Regex.Match(YourString,"(?<=Organisation\sNumber\W)\S+").ToString

Regards
Gokul
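For readers outside UiPath, here is a rough Python translation of the expression above using the standard `re` module; `extract_after_reference` is a hypothetical helper name. Python accepts this lookbehind because it has a fixed width (20 characters). Note that `\S+` grabs everything up to the next whitespace, so the trailing period in the question's example comes along with the number.

```python
import re

# Same pattern as the .NET expression above: the lookbehind requires
# "Organisation", one whitespace character, "Number", and one non-word
# character (e.g. ":") immediately before the match.
PATTERN = re.compile(r"(?<=Organisation\sNumber\W)\S+")


def extract_after_reference(text):
    """Return the first run of non-whitespace after the reference, or None."""
    match = PATTERN.search(text)
    # re.search returns None when nothing matches; .NET's Regex.Match instead
    # returns an unsuccessful Match whose ToString is an empty string.
    return match.group(0) if match else None


print(extract_after_reference("Organisation Number:123445"))   # 123445
print(extract_after_reference("Organisation Number:123445."))  # 123445.
```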

This works if the reference stays exactly as “Organisation Number”, but that can also change, e.g. “Organisation,Number”, “Organisation.Number”, or “Organisation:Number”.

Hi @amruta_George

System.Text.RegularExpressions.Regex.Match(YourString,"(?<=Organisation\WNumber\W)\S+").ToString

Regards
Gokul
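Swapping `\s` for `\W` lets any single non-word character separate the two words. A quick Python check (hypothetical helper `extract_number`) shows which reference variants this covers; as pointed out later in the thread, the form with no separator at all is not matched.

```python
import re

# \W between the words accepts one space, comma, period, colon, etc.
PATTERN = re.compile(r"(?<=Organisation\WNumber\W)\S+")


def extract_number(text):
    match = PATTERN.search(text)
    return match.group(0) if match else None


for sample in [
    "Organisation Number:123,445",
    "Organisation,Number:123:445",
    "Organisation.Number:12,3:44",
    "Organisation:Number:123445",
    "OrganisationNumber:123445",  # no separator, so no match (None)
]:
    print(repr(sample), "->", extract_number(sample))
```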

@amruta_George
Working with groups may help. We turned on the case-insensitive option and refer to the groups; the number itself is captured in group 5:

((Organisation.?)?(Number.?)?(.*?))\b([\d,.:]+)\b
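A Python sketch of the group-based approach, with case-insensitive matching turned on via `re.IGNORECASE` (the helper name is hypothetical). Groups are numbered by their opening parenthesis, so the number lands in group 5. One caveat: because both reference groups are optional, the pattern also matches a number when the reference text is absent, so on its own it cannot detect the missing-reference case from the question.

```python
import re

# Groups: 1 = whole prefix, 2 = "Organisation" + optional char,
# 3 = "Number" + optional char, 4 = lazy filler, 5 = the number itself.
PATTERN = re.compile(r"((Organisation.?)?(Number.?)?(.*?))\b([\d,.:]+)\b",
                     re.IGNORECASE)


def extract_number(text):
    match = PATTERN.search(text)
    return match.group(5) if match else None


print(extract_number("Organisation Number:123445."))  # 123445
print(extract_number("organisation,number:12,3:44"))  # 12,3:44
print(extract_number("Invoice total 99"))  # 99 (no reference, still matches)
```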

Hello @amruta_George
Try this regex expression:

System.Text.RegularExpressions.Regex.Match(YourString,"(?<=Organisation\WNumber\W)[\d\W]+").ToString
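Compared with `\S+`, the class `[\d\W]+` stops at the first letter, but it also swallows spaces and line breaks, since those are non-word characters too. A small Python comparison (the sample text is made up, not taken from a real PDF) shows why trimming the result, as the last reply in this thread does with `.Trim`, is useful.

```python
import re

LOOKBEHIND = r"(?<=Organisation\WNumber\W)"

# Hypothetical text as it might come out of a PDF: the number is followed by
# a space, a line break, and then the next field label.
text = "Organisation Number:123,445 \nName: ACME"

non_space = re.search(LOOKBEHIND + r"\S+", text).group(0)
digits_and_symbols = re.search(LOOKBEHIND + r"[\d\W]+", text).group(0)

print(repr(non_space))                   # '123,445'
print(repr(digits_and_symbols))          # '123,445 \n'
print(repr(digits_and_symbols.strip()))  # '123,445'

# [\d\W]+ also stops before letters glued onto the number, where \S+ does not.
glued = "Organisation Number:123445Name"
print(re.search(LOOKBEHIND + r"\S+", glued).group(0))      # 123445Name
print(re.search(LOOKBEHIND + r"[\d\W]+", glued).group(0))  # 123445
```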

Hi @amruta_George

You can also try this:

"Organisation Number:123445".Substring("Organisation Number:123445".IndexOf("Organisation Number:")+"Organisation Number:".Length).Split(Environment.NewLine.ToCharArray)(0)

Replace the literal “Organisation Number:123445” with your string variable.

Note: When reading the PDF, make sure to set the Preserve Formatting property to True.

Regards
Sudharsan
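Here is the same non-regex idea in Python, with one addition that speaks to the missing-reference case from the question: `str.find` returns -1 when the reference is absent, and that needs an explicit check. (In .NET, `IndexOf` likewise returns -1, so without a check `Substring` would start at the wrong offset or throw.) The helper name is hypothetical, and like the original expression it only handles the exact spelling `Organisation Number:`.

```python
REFERENCE = "Organisation Number:"


def text_after_reference(text, reference=REFERENCE):
    """Return the rest of the line after `reference`, or None if it is absent."""
    start = text.find(reference)
    if start == -1:
        return None  # reference missing: the caller runs its fallback actions
    rest = text[start + len(reference):]
    # Keep only the first line, roughly like Split(Environment.NewLine...)(0).
    return rest.splitlines()[0] if rest else ""


print(text_after_reference("Organisation Number:123445\nName: ACME"))  # 123445
print(text_after_reference("Invoice 42"))  # None
```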

This works only if there is always a comma, space, or some other separator. Sometimes it can also come as “OrganisationNumber”, and then this does not work.

(?<=Organisation\W?Number\W)[\d\W]+
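`\W?` makes the separator between the words optional, which gives the lookbehind a variable length. .NET supports that, but Python's `re` module only allows fixed-width lookbehind and raises `re.error` for this pattern. A Python sketch (hypothetical helper name) can instead match the prefix normally and capture the number in a group:

```python
import re

# Python's re rejects the variable-width lookbehind used above.
try:
    re.compile(r"(?<=Organisation\W?Number\W)[\d\W]+")
    variable_lookbehind_ok = True
except re.error:
    variable_lookbehind_ok = False  # "look-behind requires fixed-width pattern"

# Workaround: the prefix is matched normally; only group 1 is returned.
PATTERN = re.compile(r"Organisation\W?Number\W([\d\W]+)")


def extract_number(text):
    match = PATTERN.search(text)
    return match.group(1).strip() if match else None


for sample in ["OrganisationNumber:123445",
               "Organisation.Number:123:445",
               "Organisation Number:12,3:44"]:
    print(extract_number(sample))
```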

Hi @amruta_George

If possible, please share the input format.

System.Text.RegularExpressions.Regex.Match(YourString,"(?<=Organisation\WNumber\W)\S+|(?<=OrganisationNumber\W)\S+").ToString

Regards
Gokul

@amruta_George,
Try this:
(?<=Organisation\WNumber\W)[\d\W]+|(?<=OrganisationNumber\W)[\d\W]+

System.Text.RegularExpressions.Regex.Match(YourString,"(?<=Organisation\WNumber\W)[\d\W]+|(?<=OrganisationNumber\W)[\d\W]+").ToString.Trim
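Putting the thread together for the original question, here is a hedged Python sketch: it uses the alternation from the reply above (each lookbehind is fixed-width on its own, so Python accepts them), trims the result, and returns None when the reference text is missing so the caller can run its fallback actions. In UiPath, the equivalent check could be an If on `Regex.IsMatch(...)` or on the match's `Success` property. The names `find_organisation_number` and `process` are hypothetical, and the optional `normalize` step (keeping digits only, so 123,445 becomes 123445) is an assumption about what the downstream steps need.

```python
import re

# Alternation from the reply above: one branch for a single separator between
# the words, one for no separator at all.
PATTERN = re.compile(
    r"(?<=Organisation\WNumber\W)[\d\W]+|(?<=OrganisationNumber\W)[\d\W]+"
)


def find_organisation_number(text, normalize=False):
    """Return the number after the reference, or None if the reference is missing."""
    match = PATTERN.search(text)
    if match is None:
        return None
    value = match.group(0).strip()
    if normalize:
        # Assumption: only the digits matter, so "12,3:44" becomes "12344".
        value = re.sub(r"\D", "", value)
    return value


def process(text):
    number = find_organisation_number(text)
    if number is None:
        # Placeholder for the "certain set of actions" from the question.
        return "reference missing: run fallback actions"
    return "found " + number
```

For example, `process("Organisation:Number:123:445")` returns `"found 123:445"`, while text without the reference takes the fallback branch instead of throwing.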
