Regex for Text Extraction from PDF

Hi team,

Wish you a very Happy New Year!

From the attached image taken from a PDF, I am trying to extract the value of "Bill To Cost Centre / Project Code", but I am having trouble with Regex because when I read the PDF as text, the lines come out like this:

" Bill To Cost Centre /
Invoices will be raised
1512451256
Project Code
manually and submitted to "

I need to extract only the number 1512451256 that belongs to the string "Bill To Cost Centre / Project Code". How can I ignore the string "Invoices will be raised"?

Thank you!

Regards
Max

Fine. If the input is in a string variable named strinput, the output will be a string variable, stroutput, set with Assign activities like these:

strinput = String.Join("",strinput.Split(Environment.NewLine.ToArray()))

stroutput = System.Text.RegularExpressions.Regex.Match(strinput,"(?<=Bill To Cost Centre\s\W).*(?=Project Code)").ToString

Then a final Assign activity like this:
stroutput = System.Text.RegularExpressions.Regex.Match(stroutput.ToString,"(\d)+").ToString
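
For anyone who wants to try these three steps outside UiPath, here is a minimal sketch in Python rather than VB.NET. It assumes the extracted text looks exactly like the sample quoted in the question. SAMPLE and extract_digits are made-up names for this illustration; the patterns are the ones from the Assign activities above, with (\d)+ written as the equivalent \d+.

```python
import re

# Sample text as quoted in the question (assumed to be representative).
SAMPLE = (
    " Bill To Cost Centre /\n"
    "Invoices will be raised\n"
    "1512451256\n"
    "Project Code\n"
    "manually and submitted to "
)

def extract_digits(text):
    """Hypothetical helper mirroring the three Assign activities."""
    # Step 1: drop the line breaks so the label and value sit on one line.
    flat = "".join(text.splitlines())
    # Step 2: take everything between the two halves of the label.
    between = re.search(r"(?<=Bill To Cost Centre\s\W).*(?=Project Code)", flat)
    if between is None:
        return ""
    # Step 3: keep the first run of digits.
    digits = re.search(r"\d+", between.group(0))
    return digits.group(0) if digits else ""

print(extract_digits(SAMPLE))
```

Step 2 on its own still returns "Invoices will be raised1512451256", which is why the digit-only step is needed afterwards.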

Cheers @mc00476004

Thank you very much @Palaniyappan, I understand this, but what if the value is not a digit?

Usually the Bill To Cost Centre will be digits only, right?
Or is there a chance of getting an alphanumeric value?

@mc00476004

Yeah, @Palaniyappan, in one of the PDFs it's alphanumeric, and it can start with either a digit or a letter.

image

Can you send your PDF?

Will this number, "1512451256", always be of constant length?
If it is of constant length, then you can use "\d{10}".

Hi Manish, I will send you the PDF separately. And no, that value is not of constant length, and it is alphanumeric.

Thank you for your help!

Awesome. In that case, we can use this expression in the last Assign activity, which is meant to cover both numeric and alphanumeric values:

stroutput = System.Text.RegularExpressions.Regex.Match(stroutput.ToString,"\d+|[0-9A-Z\W]+").ToString

Cheers @mc00476004

You can use the code below:
System.Text.RegularExpressions.Regex.Match(YourString,"\d{10}").ToString
Check with this.

Thank you @Palaniyappan. This pattern \d+|[0-9A-Z\W]+ is matching some other characters as well: image

I have modified it. Can you please check whether the one below is okay, or whether it will cause any errors?

image

Kindly include this in your expression
[0-9]+|[0-9A-Z\W]+\d+|[0-9A-Z\W]+

Cheers @mc00476004
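
One reason a pattern like \d+|[0-9A-Z\W]+ picks up extra characters is that the intermediate string still begins with "Invoices will be raised", so the first thing it can match is the capital I. A possible alternative, sketched here in Python rather than VB.NET, is to keep the line breaks and capture the line sitting directly above "Project Code". This assumes the value always occupies its own line right before "Project Code" and contains only letters and digits, which the thread does not confirm. VALUE_PATTERN, extract_code, and the alphanumeric sample value are hypothetical.

```python
import re

# Assumption: the value is alone on the line directly above "Project Code"
# and consists of letters and digits only.
VALUE_PATTERN = re.compile(
    r"Bill To Cost Centre\s*/"            # first half of the label
    r".*?"                                # lazily skip any text in between
    r"^[ \t]*([A-Za-z0-9]+)[ \t]*\r?\n"   # the value on a line of its own
    r"[ \t]*Project Code",                # second half of the label
    re.DOTALL | re.MULTILINE,
)

def extract_code(text):
    """Hypothetical helper: return the captured value, or "" if not found."""
    match = VALUE_PATTERN.search(text)
    return match.group(1) if match else ""

numeric = " Bill To Cost Centre /\nInvoices will be raised\n1512451256\nProject Code\n"
# Made-up alphanumeric value, for illustration only.
alnum = " Bill To Cost Centre /\nInvoices will be raised\nA15B2451C\nProject Code\n"

print(extract_code(numeric))
print(extract_code(alnum))

# The over-matching described above: the first match is only the "I" of "Invoices".
first = re.search(r"\d+|[0-9A-Z\W]+", "Invoices will be raised1512451256").group(0)
print(repr(first))
```

The same idea should carry over to .NET's Regex with the Singleline and Multiline options, but that is untested in UiPath, so treat it as a starting point only.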

Kindly let me know if you have any queries or need clarification.
Cheers @mc00476004
