Regex to check if filename format was followed and extract number

Hello Everyone,
I need to check whether a filename string follows a specific format and, if it does, extract the 12-digit number from it.

Filename format:
12digitNo_A.pdf/jpg
12digitNo_b.pdf/jpg
12digitNo_C.pdf/jpg

Sample Input:
000111222333_a.jpg
0001112222333_B.pdf
111222333444_c.jpg (1)

Expected Output:
True 000111222333
False
True 111222333444

I have tried (?i)\b(\d{12})\s*_(A|B|C).(.*)$, but it also returns True for 000111222333_ab.jpg.
Only one letter (A, B or C) should appear after the underscore and before the period.
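The reason 000111222333_ab.jpg slips through is that the "." after (A|B|C) is unescaped, so it matches any character (here the "b") instead of a literal period, and (.*) then swallows the rest. A minimal sketch in Python, whose re module treats these particular constructs the same way .NET's Regex does; the variable names are illustrative only:

```python
import re

# Pattern from the question: the "." after (A|B|C) is unescaped, so it matches ANY character.
original = re.compile(r"(?i)\b(\d{12})\s*_(A|B|C).(.*)$")
# Same idea, but the period is escaped so it must be a literal "." followed by jpg or pdf.
escaped = re.compile(r"(?i)\b(\d{12})_(A|B|C)\.(jpg|pdf)")

name = "000111222333_ab.jpg"
print(bool(original.search(name)))  # True: the bare "." consumed the "b"
print(bool(escaped.search(name)))   # False: "b" is not a literal "."
```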

Thank you in advance.

Hi @mnlatam

I think I have a solution for you. Check out this pattern:

Regex Pattern: \b(\d{12})_\w\.(jpg|pdf)

Explanation - it will match on:
12 digits only
followed by an underscore “_”
then a single word character (not just a,b or c)
then a “.”
then the file extensions in parentheses; the “|” symbol means either/or, so it can be either jpg or pdf.
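To check the pattern against the sample inputs from the question, here is a small Python sketch (Python's re is used only as a stand-in for .NET's Regex; the loop and variable names are illustrative). It uses search() rather than a full match, so trailing text such as " (1)" is tolerated, and it also shows that \w accepts letters other than a, b or c:

```python
import re

# 12 digits, "_", one word character, a literal ".", then jpg or pdf
pattern = re.compile(r"\b(\d{12})_\w\.(jpg|pdf)")

samples = ["000111222333_a.jpg", "0001112222333_B.pdf", "111222333444_c.jpg (1)"]
results = []
for name in samples:
    m = pattern.search(name)
    # Record whether it matched and, if so, the 12-digit number in group 1
    results.append((bool(m), m.group(1) if m else None))

print(results)
# \w is any word character, so a letter outside A/B/C is accepted too
print(bool(pattern.search("000111222333_x.jpg")))  # True
```

The 13-digit sample fails because \b prevents the match from starting mid-number, and the 12 digits from the start are followed by "3" rather than "_".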

To get the 12-digit number, use capture group 1.

To get Group 1:
INSERT_REGEX_OUTPUT_VARIABLE(0).Groups(1).ToString
Replace the capitalized placeholder with the name of your own regex output variable.
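To illustrate what (0) and .Groups(1) pick out, here is a rough Python analogue: finditer() plays the role of a collection of matches, index 0 is the first match, and group(1) is the first capture group. This is a sketch of the idea, not the UiPath API itself:

```python
import re

pattern = re.compile(r"\b(\d{12})_\w\.(jpg|pdf)")
# Roughly comparable to the collection a Matches activity returns
matches = list(pattern.finditer("000111222333_a.jpg"))

first = matches[0]
print(first.group(0))  # whole match: 000111222333_a.jpg
print(first.group(1))  # capture group 1: 000111222333
print(first.group(2))  # capture group 2: jpg
```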

Hopefully this helps :slight_smile:

Check out my Regex MegaPost if you want to learn Regex.

Cheers

Steve


Good day @Steven_McKeering

Thanks for the quick help.
Regarding point 3 in your explanation: only one character from A, B or C is allowed after the “_”. I’ve tried (A|B|C){1}, but it doesn’t work.

Hey

Happy to assist where I can.

It is not matching because you are using capital letters, while your sample has lower case.
“A” is not the same as “a” - unless you are ignoring the case. I would check that you have the “IgnoreCase” option on. Are you using the Matches activity or an Assign?

I would also consider removing the “{1}”. It’s not really necessary because you are already looking for exactly one letter (A or B or C).
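Both points can be checked with a short Python sketch (re.IGNORECASE is Python's counterpart of .NET's RegexOptions.IgnoreCase; the pattern strings are adapted from this thread for illustration). It shows that the lowercase sample only matches with case-insensitive matching, that {1} changes nothing, and that two letters are still rejected because the escaped period must come right after a single letter:

```python
import re

pattern = r"\b(\d{12})_(A|B|C)\.(jpg|pdf)"
with_quantifier = r"\b(\d{12})_(A|B|C){1}\.(jpg|pdf)"
name = "000111222333_a.jpg"

# Case-sensitive: "A|B|C" does not match the lowercase "a"
no_flag = bool(re.search(pattern, name))
# Case-insensitive matching makes the lowercase letter acceptable
with_flag = bool(re.search(pattern, name, re.IGNORECASE))
# "{1}" is redundant: an unquantified group already matches exactly once
with_one = bool(re.search(with_quantifier, name, re.IGNORECASE))
# "ab" fails because "\." demands a literal period right after one letter
two_letters = bool(re.search(pattern, "000111222333_ab.jpg", re.IGNORECASE))

print(no_flag, with_flag, with_one, two_letters)  # False True True False
```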

I'm using an Assign for this one to extract the number. It worked when I enabled RegexOptions.IgnoreCase in the parameters.

Got it, bud. I thought adding {1} would allow only one letter from the group. :smiley:

Cheers


Great :slight_smile:

All sorted and working?

Let me know if you need anything further.

Follow-up question: let’s say I have an array as input. How can I extract the number for each filename in the array? I’m trying to use in_Attachments.Split(";"c).AsEnumerable().Where(Function (e) System.Text.RegularExpressions.Regex.Matches(e,in_AccountNumberPattern).Count<>0).ToArray(), but I need to include “Groups(1)” so that it returns only the account numbers.

Input Array: {"000111222333_a.jpg", "111222333444_c.jpg (1)"}
Output: {"000111222333", "111222333444"}

Within UiPath Studio, try using a “for each” activity to handle each item from the array input individually.
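The For Each approach boils down to: loop over the filenames, keep only those that match, and collect group 1 from each match. A minimal Python sketch of that logic follows; the function name extract_account_numbers is hypothetical, and since the actual value of in_AccountNumberPattern isn't shown, a stand-in pattern built from this thread is assumed:

```python
import re

# Stand-in for in_AccountNumberPattern (assumed; must contain the 12-digit capture group)
pattern = re.compile(r"(?i)\b(\d{12})_(A|B|C)\.(jpg|pdf)")

def extract_account_numbers(filenames):
    """Hypothetical helper: return group 1 for each matching filename, skipping the rest."""
    numbers = []
    for name in filenames:          # the "For Each" step
        m = pattern.search(name)
        if m:                       # only keep filenames that follow the format
            numbers.append(m.group(1))
    return numbers

attachments = "000111222333_a.jpg;111222333444_c.jpg (1)"
print(extract_account_numbers(attachments.split(";")))
# ['000111222333', '111222333444']
```

To stay in a single Assign instead, one option might be to append something like .Select(Function(e) System.Text.RegularExpressions.Regex.Match(e, in_AccountNumberPattern).Groups(1).Value) before .ToArray() in the expression from the question. That fragment is an untested sketch, and if the pattern relies on case-insensitive matching, RegexOptions.IgnoreCase would need to be passed to Regex.Match as well.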

Does that help?
