REGEX Pattern for "200/14-27-961-19W5/0", "1Z0/03-25-061-24W5/0"


#1

Hello,

I have multiple PDF attachments that I download from Outlook and read one by one, looking for a particular number/pattern: “100/14-27-061-19W5/0”.
I tried using Substring with the positions of the preceding and succeeding words. That works only for a set of PDFs that share the same layout and contain those words; it fails for the others.
Basically, the pattern can be anywhere in the PDF, with arbitrary preceding/succeeding words.
So I thought of using Regex. Could you please help me with the regex syntax for the above pattern?

I have tried implementing the logic based on discussions from a similar topic, but it’s not working.

"200/14-27-961-19W5/0 ",“1Z0/03-25-061-24W5/0”
Everything in the pattern is dynamic, except that the length will be 15 characters (not counting the / and - separators) and it will have a W as shown.

Thanks in advance.


#2

@Faraz_Subhani use the pattern below. It may be helpful.

pattern - “[\w]{3}/[\d]{2}-[\d]{2}-[\d]{3}-[\w]{4}/[\d]{1}”


#3

Hey!
Thanks for your immediate response.
I am trying your solution in the Pattern field of Matches, but I don’t understand why I am getting this kind of output for all the PDFs the bot reads while looking for the matching pattern.
image

Maybe I am doing something stupid.


#4

@Faraz_Subhani Did you import System.Text.RegularExpressions in the Imports panel?


#5

Hello,

Thanks for your reply. For which control do you want me to import it?
image

I did it for the For Each loop; I am not getting any import option for Matches.
Also, I imported System.Text.RegularExpressions.Match.
I am yet to test, as my flow is broken, but I think I am getting confused about where to import from.


#6

@Faraz_Subhani Check the link below to import.


#7

Thank you.
That was really stupid of me; it was right in front of my eyes. But I am still getting the same output in the message box for all the PDFs, as in the snapshot above. I even restarted UiPath as well as my system. :-(


#8

Hello,
I am stuck with this. I am using the pattern “/^[0-9]{3}[/][0-9]{2}[-][0-9]{2}[-][0-9]{2,3}[-][0-9]{2}[W][0-9]{1}[/][0-9]{1}$/” to extract numbers like “100/14-27-061-19W5/0” wherever they occur in the PDF files I loop over, but it fails.
I tried hard-coding this number to check whether my pattern works, and it does, but when I change the number to “abch100/14-27-061-19W5/0”, it fails.
So, basically, how do I change the pattern so that it extracts the number even if there is something before or after it (in the bulk text the bot reads from the PDF)?
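
For anyone reading along, here is a minimal sketch of why the anchored pattern behaves this way. It uses Python's `re` module as a stand-in for the .NET regex engine (an assumption on my part; the sample text and the helper `first_match` are made up for illustration). `^` and `$` require the whole input to be the number, so removing them lets the pattern match anywhere in the text. The `/` at either end of the quoted pattern is left out here, since a regex engine that takes a plain pattern string would treat those slashes as literal characters.

```python
import re

# Anchored version from the post above (without the surrounding "/" delimiters).
ANCHORED = r"^[0-9]{3}[/][0-9]{2}[-][0-9]{2}[-][0-9]{2,3}[-][0-9]{2}[W][0-9]{1}[/][0-9]{1}$"
# Same pattern with the leading ^ and trailing $ stripped off.
UNANCHORED = ANCHORED[1:-1]

exact = "100/14-27-061-19W5/0"
embedded = "abch100/14-27-061-19W5/0 and more text"  # hypothetical PDF text


def first_match(pattern, text):
    """Return the first matched substring, or None if nothing matches."""
    m = re.search(pattern, text)
    return m.group(0) if m else None


print(first_match(ANCHORED, exact))       # the number on its own matches
print(first_match(ANCHORED, embedded))    # surrounded by other text: no match
print(first_match(UNANCHORED, embedded))  # anchors removed: number is found
```

Note that this digits-only pattern would still skip a value like “1Z0/03-25-061-24W5/0”, because the first block contains a letter.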


#9

@Faraz_Subhani Try the pattern below. It will work.

pattern - “[\w]*/[\d]{2}-[\d]{2}-[\d]{3}-[\w]{4}/[\d]{1}”

Please first list as many of the different numbers in your PDFs as you can, then try to write one general pattern that covers all of them.
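
One way to follow that advice is to keep the known samples in a list and check each candidate pattern against all of them. The sketch below is only an illustration in Python (not UiPath); the helper name `unmatched` is hypothetical, and the samples are the ones quoted in this thread.

```python
import re

# Samples quoted in this thread; extend the list with values from your own PDFs.
SAMPLES = [
    "200/14-27-961-19W5/0",
    "1Z0/03-25-061-24W5/0",
    "100/14-27-061-19W5/0",
]

# First suggestion from this thread: word characters allowed in the first block.
WORD_CHARS = r"\w{3}/\d{2}-\d{2}-\d{3}-\w{4}/\d"
# Digits-only variant tried later in the thread (anchors removed).
DIGITS_ONLY = r"[0-9]{3}/[0-9]{2}-[0-9]{2}-[0-9]{2,3}-[0-9]{2}W[0-9]/[0-9]"


def unmatched(pattern, samples):
    """Return the samples that the pattern does not match in full."""
    return [s for s in samples if re.fullmatch(pattern, s) is None]


print(unmatched(WORD_CHARS, SAMPLES))   # empty list: every sample is covered
print(unmatched(DIGITS_ONLY, SAMPLES))  # lists the sample containing a letter
```

An empty result means the pattern covers every sample collected so far; anything left in the list shows which variation the pattern still misses.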


#10

Hi,
Thanks for your response. I tried your pattern, but I don’t know why I am getting this error when I try to display the matched value in a message box.
image
I think it has something to do with the data types, as the matched value for the pattern is stored in an IEnumerable whereas I am trying to display it as a string.

I am attaching my workflow, please assist.
Pattern.xaml (8.8 KB)
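
The type mismatch described here can be sketched outside UiPath as well. The Python example below is only an analogy (the sample text is hypothetical): a regex search over a document yields a collection of match objects, and each one has to be turned into its matched text before it can be shown as a single string.

```python
import re

PATTERN = r"\w{3}/\d{2}-\d{2}-\d{3}-\w{4}/\d"
# Hypothetical text standing in for what was read from a PDF.
text = "Well 100/14-27-061-19W5/0 and well 1Z0/03-25-061-24W5/0."

matches = list(re.finditer(PATTERN, text))  # a collection of match objects
values = [m.group(0) for m in matches]      # the matched text of each one
message = ", ".join(values)                 # one string, ready to display

print(message)
```

In a UiPath workflow the equivalent step would presumably be looping over the matches and reading each item's value, rather than passing the whole collection to the message box.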


#11

@Faraz_Subhani try the code below; I have made some changes.

Pattern.xaml (10.0 KB)


#12

Superb!! That works perfectly.
Thank you for being patient and helping throughout :-)


#13

@Faraz_Subhani It's OK, no problem.


#14

Hi @Manjuts90

Thanks again for your solution on pattern finding. That is working well, but now I am facing another issue.
My bot reads one .pdf at a time and extracts the strings that match the pattern. For instance, if a PDF contains a matching string X four times, it extracts X four times (which is correct according to the logic). However, when there are duplicates I want to extract only one copy, i.e. the distinct values; if the PDF contains different values, it should extract X as well as Y.
I am not able to implement the distinct feature; either it doesn’t work or I am doing something wrong.
Could you please help me with it?
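
A minimal sketch of the de-duplication being asked for, written in Python rather than UiPath (the function name `distinct_matches` and the sample text are made up). The point it illustrates: de-duplicate the matched strings, not the match objects. Two match objects for the same text are still different objects, which is one likely reason a distinct step applied to the raw match collection keeps the duplicates.

```python
import re

PATTERN = r"\w{3}/\d{2}-\d{2}-\d{3}-\w{4}/\d"


def distinct_matches(text):
    """Return each matched string once, in order of first appearance."""
    found = re.findall(PATTERN, text)  # plain strings, duplicates included
    return list(dict.fromkeys(found))  # dict keys are unique and keep order


# Hypothetical PDF text: X appears three times, Y once.
sample = (
    "100/14-27-061-19W5/0 x 100/14-27-061-19W5/0 "
    "y 1Z0/03-25-061-24W5/0 z 100/14-27-061-19W5/0"
)
print(distinct_matches(sample))
```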


#15

@Manjuts90 Use Distinct in the For Each loop, as highlighted in the image.

Capture


Match and get distinct
#16

Hi,
I tried the above solution, but it still extracts the duplicates as well, not just the distinct values :-(


#17

Apart from using Distinct, is there any change we can make to the regex pattern itself so that it extracts only the unique values? I tried “[\w]*/[\d]{1,2}-[\d]{1,2}-[\d]{2,3}-[\w]{4}/[\d]{1,2}?<!\1[\s\S]\1”, where the bold parts are the ones I added, but even that doesn’t seem to work.


#18

Any suggestions, please? I am really stuck with this.


#19

@Faraz_Subhani Check the link below.


#20

Hi @Manjuts90, I already checked that link. As you can see in my previous comment, I tried suffixing *<?!\1[\s\S]\1) to my regex pattern, but it doesn’t seem to work.
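
For what it's worth, two things stand out in the suffix as quoted: a negative lookaround needs its opening parenthesis (`(?!...)` or `(?<!...)`), and `\1` only has a meaning if the pattern contains a capturing group for it to refer to. Below is a hedged sketch of a regex-only approach, shown in Python as a stand-in for the .NET engine (an assumption; the names `BASE` and `UNIQUE` and the sample text are mine). The ID is wrapped in a capturing group and followed by a negative lookahead that rejects any occurrence with an identical copy later in the text, so each value is returned once, at its last occurrence.

```python
import re

# Base pattern for one ID, taken from the first suggestion in this thread.
BASE = r"\w{3}/\d{2}-\d{2}-\d{3}-\w{4}/\d"
# Capture the ID, then require that the same text does NOT appear again later.
UNIQUE = "(" + BASE + r")(?![\s\S]*\1)"

# Hypothetical PDF text: the first ID is repeated at the end.
text = "A 100/14-27-061-19W5/0 B 1Z0/03-25-061-24W5/0 C 100/14-27-061-19W5/0"

# With one capturing group, findall returns the captured ID for each match.
unique_ids = re.findall(UNIQUE, text)
print(unique_ids)
```

The result is ordered by last occurrence, not first, and the lookahead rescans the rest of the text for every candidate, so on large documents de-duplicating the matched strings after the search is probably the simpler option.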