Matches - Pattern help

Short · October 18, 2018, 11:21am

Hi all

I’m reading a PDF document to find a number that could be 6, 8 or 9 digits long. My workflow uses Regex to look for an 8-digit number first; if there isn’t one it looks for a 9-digit one, then a 6-digit one.

The result kept coming up blank, so I used Write Line activities to see which path it was following. Evidently, it thinks there’s an 8-digit match, but the result it writes back is blank. In the PDF, there’s a 9-digit match, not an 8-digit one.

Am I doing something wrong? I don’t understand why, when there’s a 9-digit match, it thinks there’s an 8-digit one but then the result is blank.

Thanks!

loginerror · October 18, 2018, 11:42am

Hi @Short

If you have a 9-digit string present, then your first Regex, the one looking for 8 digits, will return the first 8 digits of that 9-digit string.
That in turn causes your IF condition to evaluate as False, because there are matches found.
This ends up with your IF going to Else.
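
To illustrate the point, here is a small Python sketch (not UiPath code; the sample text is made up). Python's `re` module treats `\d{8}` the same way as .NET here: the pattern is not anchored, so it matches the first 8 digits of a longer run. The second pattern, with "not a digit" lookarounds, is one possible way to require exactly 8 digits; it is an assumption on my side and not something used in the workflow above.

```python
import re

text = "Ref: 123456789"  # hypothetical PDF text containing a 9-digit number

# \d{8} is not anchored, so it matches the first 8 digits of the 9-digit run.
first_eight = re.search(r"\d{8}", text).group(0)

# Lookarounds that forbid a digit on either side only accept a run of
# exactly 8 digits, so nothing matches in this text.
exact_eight = re.search(r"(?<!\d)\d{8}(?!\d)", text)

print(first_eight)   # 12345678
print(exact_eight)   # None
```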

I hope it helps

Short · October 18, 2018, 11:49am

Hi @loginerror

I thought that might have been the case, thank you!

How would I get around this? As it’s an IEnumerable variable, what would I put as the IF condition?

Thanks

loginerror · October 18, 2018, 11:52am

I have a question first, because it can all be simplified if the number occurs only once.
For example, this will find either 6 digits, 8 digits or 9 digits and it will return it as a single match:

System.Text.RegularExpressions.Regex.Match(strPDFText, "\d{9}|\d{8}|\d{6}").ToString

See example here:

As you can see, you can also get the first match directly by calling .ToString on the result of Regex.Match. If nothing matches, it returns an empty string.
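
A rough Python sketch of why the order of the alternatives matters (the helper name `first_number` and the sample strings are hypothetical). The regex engine tries the alternatives left to right at each position, so putting `\d{9}` first lets the longest form win.

```python
import re

def first_number(text: str) -> str:
    """Return the first 9-, 8- or 6-digit match, or "" if there is none."""
    m = re.search(r"\d{9}|\d{8}|\d{6}", text)
    return m.group(0) if m else ""

nine = first_number("Ref 123456789")    # the 9-digit alternative is tried first
six = first_number("Ref 654321")        # falls through to the 6-digit alternative
missing = first_number("no digits here")

# With the order reversed, \d{6} wins at the same position.
short_first = re.search(r"\d{6}|\d{8}|\d{9}", "Ref 123456789").group(0)

print(nine, six, repr(missing), short_first)
```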

Short · October 18, 2018, 11:58am

Ah ok, that’s very useful to know, thank you!

A match might not always be found though, as sometimes the PDF won’t have the correct number on there so I’d still need an IF statement thrown in there.

Also, the match (if in the PDF) will always be formatted like this - XXX/XX00/NUMBERHERE/0000000 - would there be a way to narrow down the search so it looks for the number in between /'s?

loginerror · October 18, 2018, 12:03pm

Sure, see updated example:
https://regex101.com/r/guT9v3/3
I updated the Regex a bit

There is no issue with an IF statement, you could use this in an IF statement:

System.Text.RegularExpressions.Regex.Match(strPDFText, "(?<=\/)\d{6,9}").ToString = ""
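
The same check, sketched in Python for anyone who wants to try it outside Studio (the function name and sample strings are hypothetical). Returning an empty string when nothing is found mirrors what `.ToString` gives you on a failed match in .NET, so the IF only needs to compare against `""`.

```python
import re

def number_after_slash(text: str) -> str:
    """Return the first run of 6 to 9 digits that follows a '/', or ""."""
    m = re.search(r"(?<=\/)\d{6,9}", text)
    return m.group(0) if m else ""

for sample in ("XXX/XX00/12345678/0000000", "page with no reference"):
    result = number_after_slash(sample)
    if result == "":
        print("no number found")   # the branch for PDFs without the number
    else:
        print("found", result)
```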

Short · October 18, 2018, 1:01pm

You are amazing, thank you!

Short · October 18, 2018, 1:05pm

One tiny thing though that I cannot work out, XXX/XX00/NUMBERHERE/123456 - the number it’s looking for is “NUMBERHERE”, how would I get it to look between / and / rather than just one?

Sorry to be a pain!

loginerror · October 18, 2018, 1:07pm

Not at all, my bad for misreading the requirement.
This regex will work:
https://regex101.com/r/guT9v3/4

(?<=\/)\d{6,9}(?=\/)
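
For comparison, a short Python sketch of the difference the lookahead makes (sample strings are made up). With only the lookbehind, a trailing number after the last `/` can be picked up; adding `(?=\/)` requires a slash on both sides.

```python
import re

LOOKBEHIND_ONLY = r"(?<=\/)\d{6,9}"
BOTH_SIDES = r"(?<=\/)\d{6,9}(?=\/)"

def find(pattern: str, text: str) -> str:
    """Return the first match of pattern in text, or "" if there is none."""
    m = re.search(pattern, text)
    return m.group(0) if m else ""

# The wanted number is missing here, but a 6-digit number ends the string.
no_number = "XXX/XX00/ABC/123456"
loose = find(LOOKBEHIND_ONLY, no_number)   # picks up the trailing 123456
strict = find(BOTH_SIDES, no_number)       # no '/' after it, so no match

# The wanted number sits between two slashes.
wanted = find(BOTH_SIDES, "XXX/XX00/123456789/123456")

print(repr(loose), repr(strict), repr(wanted))
```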
