Matches - Pattern help

Hi all

I’m reading a PDF document to find a number that could be 6, 8 or 9 digits long. My approach uses Regex to look for an 8-digit number first; if there isn’t one it looks for a 9-digit one, then a 6-digit one.

The result kept coming up blank, so I used Write Lines to see which path it was following. Evidently it thinks there’s an 8-character match, but the result it writes back is blank. In the PDF there’s a 9-character match, not an 8-character one.

Am I doing something wrong? I don’t understand why, when there’s a 9 character match, it’s thinking there’s an 8 one but then the result is blank?

Thanks!


Hi @Short

If you have a 9-digit string present, then your first Regex (the one looking for 8 digits) will return the first 8 digits of that 9-digit string.
That in turn causes your IF condition to evaluate as False, because matches were found.
As a result, your IF goes to the Else branch.
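
To illustrate the partial match, here is a minimal sketch. The sample text and the module name are made up for the demo; the second pattern adds digit-boundary lookarounds, which is one possible way (not the only one) to stop an 8-digit pattern from matching inside a longer number.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module EightDigitDemo
    Sub Main()
        ' Hypothetical sample text containing a single 9-digit number
        Dim sampleText As String = "Ref 123456789 end"

        ' A plain 8-digit pattern matches the first 8 digits of the 9-digit run
        Dim eightDigit As Match = Regex.Match(sampleText, "\d{8}")
        Console.WriteLine(eightDigit.Success) ' True
        Console.WriteLine(eightDigit.Value)   ' 12345678

        ' Requiring "no digit before, no digit after" rejects the partial match
        Dim exactEight As Match = Regex.Match(sampleText, "(?<!\d)\d{8}(?!\d)")
        Console.WriteLine(exactEight.Success) ' False
    End Sub
End Module
```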

I hope it helps :slight_smile:

Hi @loginerror

I thought that might have been the case, thank you!

How would I get around this? As it’s an IEnumerable variable, what would I put as the IF condition?

Thanks :slight_smile:

I have a question first, because it can all be simplified if the number occurs only once.
For example, this will find either 6 digits, 8 digits or 9 digits and it will return it as a single match:

System.Text.RegularExpressions.Regex.Match(strPDFText, "\d{9}|\d{8}|\d{6}").ToString

See example here:

As you can see, you can also access the single first Match output by simply using .ToString


Ah ok, that’s very useful to know, thank you!

A match might not always be found though, as sometimes the PDF won’t have the correct number on there so I’d still need an IF statement thrown in there.

Also, the match (if in the PDF) will always be formatted like this - XXX/XX00/NUMBERHERE/0000000 - would there be a way to narrow down the search so it looks for the number in between /'s?

Sure, see updated example:
https://regex101.com/r/guT9v3/3
I updated the Regex a bit :slight_smile:

There is no issue with an IF statement, you could use this in an IF statement:

System.Text.RegularExpressions.Regex.Match(strPDFText, "(?<=\/)\d{6,9}").ToString = ""
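
As an alternative to comparing against an empty string, the returned Match object also has a Success property that can be checked directly. A rough sketch, assuming strPDFText holds the PDF text (the sample value and the module name below are invented for the demo):

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module MatchCheckDemo
    Sub Main()
        ' Hypothetical stand-in for the text read from the PDF
        Dim strPDFText As String = "ABC/DE00/123456789/0000000"

        ' The forward slash needs no escaping in .NET, so "/" and "\/" behave the same
        Dim m As Match = Regex.Match(strPDFText, "(?<=/)\d{6,9}")

        If m.Success Then
            Console.WriteLine("Found: " & m.Value) ' Found: 123456789
        Else
            Console.WriteLine("No match found")
        End If
    End Sub
End Module
```

In an IF condition, the same idea would be the Regex.Match(...) call followed by .Success. Keep in mind that \d{6,9} also accepts a 7-digit run, so on its own this pattern is a little looser than "6, 8 or 9 digits".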

You are amazing, thank you!

One tiny thing though that I cannot work out, XXX/XX00/NUMBERHERE/123456 - the number it’s looking for is “NUMBERHERE”, how would I get it to look between / and / rather than just one?

Sorry to be a pain!

Not at all, my bad for misreading the requirement :slight_smile:
This regex will work:
https://regex101.com/r/guT9v3/4

(?<=\/)\d{6,9}(?=\/)
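
One thing to be aware of: \d{6,9} also accepts 7 digits between the slashes. If only 6, 8 or 9 digits should count, a stricter variant could spell the lengths out. The sketch below compares both patterns; the sample strings, variable names and module name are made up for illustration, and the strict pattern is a suggestion rather than something from the regex101 link.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module SlashDelimitedDemo
    Sub Main()
        ' Hypothetical inputs in the XXX/XX00/NUMBERHERE/0000000 layout
        Dim samples As String() = {
            "ABC/DE00/123456/0000000",
            "ABC/DE00/12345678/0000000",
            "ABC/DE00/1234567/0000000"
        }

        For Each s As String In samples
            ' Pattern from the post: 6 to 9 digits enclosed by slashes
            Dim loose As Match = Regex.Match(s, "(?<=/)\d{6,9}(?=/)")
            ' Stricter variant: exactly 6, 8 or 9 digits enclosed by slashes
            Dim strict As Match = Regex.Match(s, "(?<=/)(?:\d{6}|\d{8}|\d{9})(?=/)")
            Console.WriteLine(s & " loose=" & loose.Value & " strict=" & strict.Value)
        Next
        ' For the third sample (7 digits), loose returns 1234567
        ' while strict finds nothing and returns an empty string
    End Sub
End Module
```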
