Regex doesn't work with a certain type of invoice

Hello everybody!

I have the following problem.

I’m extracting the reference number of a document using a regular expression, but it only works with one type of invoice.

For example, we have 2 types of invoices: 917 and 918.

In the 917 invoices I have the following (this is only a small part of the invoice):

[Image: 917]

I use the following regex to find the reference number:

FactRect = System.Text.RegularExpressions.Regex.Match(pdf, "(?<=^Factura )\d+", RegexOptions.Multiline).Value.ToString

FactRect is a String variable.

When I use this, it works perfectly and I get the number 6800016187. But the 918 invoices are slightly different: after the word “Factura” comes “de Rectificación”. It seemed that simply replacing “Factura” with “Rectificación” in the regex would be enough, but it doesn’t work.

I’ve tried the following:

FactRect = System.Text.RegularExpressions.Regex.Match(pdf, "(?<=^Rectificación )\d+", RegexOptions.Multiline).Value.ToString

FactRect = System.Text.RegularExpressions.Regex.Match(pdf, "(?<=^Rectificacion )\d+", RegexOptions.Multiline).Value.ToString

FactRect = System.Text.RegularExpressions.Regex.Match(pdf, "(?<=^Rectificación)\d+", RegexOptions.Multiline).Value.ToString

FactRect = System.Text.RegularExpressions.Regex.Match(pdf, "(?<=^n )\d+", RegexOptions.Multiline).Value.ToString

None of them work (there is no error; each one returns an empty string). I’ve also tried changing the variable type to Match instead of String, but that doesn’t work either. It’s strange, because this part of the code is inside a conditional with the following, and the condition seems to be evaluated correctly:

FactRect = System.Text.RegularExpressions.Regex.Match(pdf, "(?<=^Rectificacion )\d+", RegexOptions.Multiline).Value.ToString

I mean, I know execution reaches that part of the program, because otherwise I’d be redirected to the “Else” branch.

But I don’t understand why I get an empty string with all the options I’ve tried. I want to get the number 9170020363 with a regular expression, but I’m new to regular expressions.

Thanks in advance.

Hi,

Can you try the following?

FactRect = System.Text.RegularExpressions.Regex.Match(pdf, "(?<=^.*?Rectificacion\s*)\d+", RegexOptions.Multiline).Value

Or

FactRect = System.Text.RegularExpressions.Regex.Match(pdf, "(?<=Rectificacion\s*)\d+", RegexOptions.Multiline).Value
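
For context on why the earlier attempts returned an empty string: with RegexOptions.Multiline, `^` matches only at the start of a line, so `(?<=^Rectificación )` requires the line to begin with "Rectificación ", while in the 918 invoices that word comes after "Factura de". The sketch below reproduces the behaviour in Python, chosen only because it is easy to run. The sample text is a made-up assumption built from the number in this thread, not the real invoice. Python's `re` does not accept the variable-length lookbehind that .NET allows, so a capture group stands in for `(?<=...\s*)`.

```python
import re

# Hypothetical excerpt of a 918 invoice (assumed layout, not real data).
pdf = "Cliente 12345\nFactura de Rectificación 9170020363\nTotal 100,00"

# Anchored attempt: "Rectificación " would have to start the line, so nothing matches.
anchored = re.search(r"(?<=^Rectificación )\d+", pdf, re.MULTILINE)

# Unanchored version: capture the digits that follow the word, wherever it sits in the line.
unanchored = re.search(r"Rectificación\s*(\d+)", pdf)

# The unaccented spelling does not match text that contains the accented "ó".
unaccented = re.search(r"Rectificacion\s*(\d+)", pdf)

fact_rect = unanchored.group(1) if unanchored else ""
print(anchored, fact_rect, unaccented)
```

If the extracted text contains the accented spelling, the pattern has to contain it too (or use something like `Rectificaci[oó]n`).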

Regards,

Hi @informatica1,

If you are still facing problems, could you provide the extracted text of the PDFs in a text file? We would need both PDF formats to understand the difference. This might help us find a common regex pattern for both invoice types.
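
One possible shape for such a common pattern, as a sketch only: make the "de Rectificación" part optional so a single expression covers both layouts. It is written in Python with a capture group. The sample lines are hypothetical and assume the reference number directly follows the label at the start of a line, and `extract_reference` is just an illustrative helper name.

```python
import re

# Optional " de Rectificación" between "Factura" and the digits;
# [oó] tolerates extracted text that has lost the accent.
REFERENCE = re.compile(r"^Factura(?: de Rectificaci[oó]n)?\s+(\d+)", re.MULTILINE)


def extract_reference(text: str) -> str:
    """Return the reference number, or an empty string when nothing matches."""
    match = REFERENCE.search(text)
    return match.group(1) if match else ""


# Hypothetical excerpts (assumed layout, not real invoice text).
invoice_917 = "Fecha 01/01/2022\nFactura 6800016187\n"
invoice_918 = "Fecha 01/01/2022\nFactura de Rectificación 9170020363\n"

print(extract_reference(invoice_917))
print(extract_reference(invoice_918))
```

Whether this holds for the real documents depends on how the PDF text is actually extracted, which is why the sample files would help.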

Hi Yoichi,

It works perfectly! I only had to change “Rectificacion” to “Rectificación” in your expression and BINGO!

Thank you very much!!

Regards


Thanks supermanPunch, but the invoices contain private client information, so I can’t provide the text. It’s solved anyway, so thanks to you as well!!

Regards
