PDF extraction using a regular expression

Hello All,

I have the input as follows…


I need to extract the string “HLBU 3422386”. The number in this string always has exactly 7 digits.
The starting keyword “Dimensions:” is constant, but the text after it varies.
The ending keyword “Invoice-number” is constant.
Thanks in advance!

Hi @satish_kumar ,

Here is something I mocked up on my end: the string of interest is identified by a regular expression.

The expression presumes that “HLBU” and the number are separated by one or more tabs or whitespace characters.

Therefore it will work whether they are separated by a single space, several spaces, or a tab.

Hope this helps.


Hi,

Can you try the following expression?

System.Text.RegularExpressions.Regex.Match(yourString,"\b[A-Z]+\s+\d{7}\b").Value
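To check what this pattern accepts, here is a minimal sketch in Python rather than .NET; Python's `re` module treats `\b`, `[A-Z]+`, `\s+` and `\d{7}` the same way for this input. The function name `extract_code` and the sample strings are made up for illustration. Like `Regex.Match(...).Value`, it returns an empty string when there is no match.

```python
import re

# Same pattern as the .NET expression: an uppercase word, one or more
# whitespace characters (spaces or tabs), then exactly seven digits,
# with word boundaries on both ends so 6- or 8-digit numbers are rejected.
CODE_RE = re.compile(r"\b[A-Z]+\s+\d{7}\b")


def extract_code(text: str) -> str:
    """Return the first match, or "" if none (mirrors Regex.Match(...).Value)."""
    match = CODE_RE.search(text)
    return match.group(0) if match else ""


if __name__ == "__main__":
    # Hypothetical sample lines, not taken from the original PDF.
    samples = [
        "Dimensions: some text HLBU 3422386 Invoice-number",
        "Dimensions: HLBU\t3422386 Invoice-number",    # tab separator
        "Dimensions: HLBU    3422386",                 # several spaces
        "Dimensions: HLBU 342238 Invoice-number",      # only 6 digits
        "Dimensions: HLBU 34223861 Invoice-number",    # 8 digits
    ]
    for s in samples:
        print(repr(extract_code(s)))
```

Note that the pattern is not tied to "HLBU": any all-caps word followed by whitespace and 7 digits will match, so if other such tokens can appear earlier in the text, the search needs to be narrowed.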

Regards,

Hi,
Thanks for the reply.
Can you help me if I need to extract all of the text between the two strings “Dimensions:” and “movement”?
Thanks!

Hi,

Can you try the following expression?

System.Text.RegularExpressions.Regex.Match(yourString,"(?<=Dimensions:)[\s\S]*?(?=movement)").Value
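The lookbehind `(?<=Dimensions:)` and lookahead `(?=movement)` are not part of the returned value, `[\s\S]` matches any character including line breaks, and the lazy `*?` stops at the first "movement". Below is a minimal Python sketch of the same expression, plus a hypothetical helper `code_between_keywords` that applies the same idea to the original question: it first isolates the text between "Dimensions:" and "Invoice-number" and only then looks for the 7-digit code. The helper name and the sample text are assumptions. One difference to keep in mind: Python requires lookbehinds to be fixed-width, while .NET does not, which does not matter for a literal keyword like this.

```python
import re

# Same pattern as the .NET expression above.
BETWEEN_RE = re.compile(r"(?<=Dimensions:)[\s\S]*?(?=movement)")
CODE_RE = re.compile(r"\b[A-Z]+\s+\d{7}\b")


def between(text: str) -> str:
    """Text between the first "Dimensions:" and the next "movement", or ""."""
    match = BETWEEN_RE.search(text)
    return match.group(0) if match else ""


def code_between_keywords(text: str,
                          start: str = "Dimensions:",
                          end: str = "Invoice-number") -> str:
    """Hypothetical helper: search for the 7-digit code only inside start..end."""
    # re.escape keeps characters such as "-" from being read as regex syntax.
    block_re = re.compile(
        "(?<=" + re.escape(start) + r")[\s\S]*?(?=" + re.escape(end) + ")"
    )
    block = block_re.search(text)
    if block is None:
        return ""
    code = CODE_RE.search(block.group(0))
    return code.group(0) if code else ""


if __name__ == "__main__":
    # Made-up multi-line text standing in for the extracted PDF content.
    text = "Header\nDimensions: 20 x 8 ft\nHLBU 3422386\nInvoice-number 42\nmovement"
    print(repr(between(text)))
    print(repr(code_between_keywords(text)))
```

Restricting the search to the keyword block avoids picking up a similar-looking token that appears before "Dimensions:" or after "Invoice-number".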

Regards,