Finding a URL with Regex.Matches


#1

Hello,

I need to extract a URL from an email. I’m using the Matches function.
There are three links in the email, and I need to extract the middle one:

https://my.pitchbook.com/?asOfDate=2018-03-30&alert=true&pbr=16011274&tag=VIEW_ALL&tagPos=TOP

https://my.pitchbook.com/?asOfDate=2018-03-30&alert=true&searchId=16011274&showSearch=i&tag=MODIFY&tagPos=TOP

https://my.pitchbook.com/?tag=INLINE%3AALL&tagPos=INDIVIDUAL&changeType=DEAL&alert=true&asOfDate=2018-03-30&c=222835-24&searchId=16011274

This is what my regex looks like, but it’s not working.
“/^.*\bhttps\b.*\bMODIFY\b.*$/”

Any help?


#2

Hi @padamus7,

Use this regular expression:

/[A-Za-z]+://[A-Za-z0-9-]+.[A-Za-z0-9-:%&;?#/.=]+/g

Thanks
Ashwin S


#3

Thanks for the reply, but this finds all 3 links, and I only need to extract the 2nd one.


#4

Use varRegexMatch(1).ToString to pull out only the 2nd match. The match collection is zero-indexed, so index 1 is the second match.
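A minimal sketch of the index approach, using Python’s `re` module as a stand-in for .NET’s `Regex.Matches` (the UiPath activity itself isn’t shown). The `https?://\S+` pattern and the `SAMPLE` text (the three links from the first post) are assumptions for illustration; like the .NET match collection, the Python list is zero-indexed.

```python
import re

# Hypothetical input: the three links from the first post, one per paragraph.
SAMPLE = "\n\n".join([
    "https://my.pitchbook.com/?asOfDate=2018-03-30&alert=true&pbr=16011274&tag=VIEW_ALL&tagPos=TOP",
    "https://my.pitchbook.com/?asOfDate=2018-03-30&alert=true&searchId=16011274&showSearch=i&tag=MODIFY&tagPos=TOP",
    "https://my.pitchbook.com/?tag=INLINE%3AALL&tagPos=INDIVIDUAL&changeType=DEAL&alert=true&asOfDate=2018-03-30&c=222835-24&searchId=16011274",
])

# Collect every whole match (assumed simple URL pattern, not the one from reply #2).
matches = [m.group(0) for m in re.finditer(r"https?://\S+", SAMPLE)]

# Zero-based indexing: matches[1] is the second link, like varRegexMatch(1).
second_link = matches[1]
print(second_link)
```

This only stays correct while the emails keep the same number and order of links.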


#5

That regex still doesn’t find any URL at all.


#6

To be more specific, the regex does find the links in a regex editor like RegExr, but it doesn’t find them in my workflow.


#7

Make sure you’re using .NET regex; its syntax is slightly different from other flavors.
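One difference that often bites when copying from an online tester: the `/.../g` around the pattern in reply #2 is JavaScript literal syntax, not part of the pattern itself. .NET’s `Regex` (and Python’s `re`, used below as a stand-in) treats `/` as an ordinary character, so a pattern that still carries the slashes and the `g` looks for a literal `/` before the URL and a literal `/g` after it. A minimal sketch with the `(http)\S*` pattern; the two-line input is an assumption.

```python
import re

# Hypothetical input: two of the links from the first post on separate lines.
text = (
    "https://my.pitchbook.com/?asOfDate=2018-03-30&alert=true&pbr=16011274&tag=VIEW_ALL&tagPos=TOP\n"
    "https://my.pitchbook.com/?asOfDate=2018-03-30&alert=true&searchId=16011274&showSearch=i&tag=MODIFY&tagPos=TOP"
)

# Pasted with JavaScript-style delimiters: "/" and "/g" become literal characters.
with_delimiters = r"/(http)\S*/g"
# The bare pattern, which is what .NET (or Python) expects.
bare = r"(http)\S*"

# group(0) is the whole match rather than just the captured "http".
found_with = [m.group(0) for m in re.finditer(with_delimiters, text)]
found_bare = [m.group(0) for m in re.finditer(bare, text)]
print(found_with)  # expected: [] because the text never contains "/http"
print(found_bare)
```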

If your email will always contain 2+ URLs and you always want the second one, then I’d recommend grabbing all URLs with regex and then using the second match in your workflow.

However, if you always want the first link that contains the word MODIFY, then something similar to your initial expression would be fine.
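A sketch of the "first link containing MODIFY" approach, again with Python’s `re` standing in for .NET. Instead of the `^` and `$` anchors in the original expression, it puts `\S*` on either side of the keyword, which keeps the match inside a single URL because URLs contain no whitespace. The `https?://` prefix and the sample input are assumptions.

```python
import re

# Hypothetical input: the three links from the first post, one per paragraph.
email_body = "\n\n".join([
    "https://my.pitchbook.com/?asOfDate=2018-03-30&alert=true&pbr=16011274&tag=VIEW_ALL&tagPos=TOP",
    "https://my.pitchbook.com/?asOfDate=2018-03-30&alert=true&searchId=16011274&showSearch=i&tag=MODIFY&tagPos=TOP",
    "https://my.pitchbook.com/?tag=INLINE%3AALL&tagPos=INDIVIDUAL&changeType=DEAL&alert=true&asOfDate=2018-03-30&c=222835-24&searchId=16011274",
])

# \S* stays within one URL; \bMODIFY\b requires the whole word somewhere in it.
pattern = r"https?://\S*\bMODIFY\b\S*"

# search() returns only the first match, or None if there is none.
match = re.search(pattern, email_body)
modify_link = match.group(0) if match else None
print(modify_link)
```

Selecting by content like this keeps working if more links are added to the email, as long as only one of them carries MODIFY.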


#8

Hi @padamus7,

Try this pattern:
((http[s]?):\/)?\/?([^:\/\s]+)((\/\w+)*\/)\S*

Regards,
Arivu


#9

Using @arivu96’s pattern, I managed to extract this link: https://my.pitchbook.com/a/16011274.1430296.2018-03-30.gif%3

Not sure how UiPath found those numbers or the .gif.
Maybe it’s because I’m not using .NET regex?
How do I switch to it, @Dave?


#10

arivu96’s solution matches an optional http:// or https:// prefix followed by a host name and at least one slash, then grabs all the non-whitespace characters after that.
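Note that in arivu96’s pattern the leading `((http[s]?):\/)?` group is optional, so the pattern also matches host-and-path text with no http prefix at all. A quick Python check (Python’s `re` treats the constructs in this pattern the same way .NET does; the input sentence and file name are made up):

```python
import re

# arivu96's pattern from reply #8, unchanged.
pattern = r"((http[s]?):\/)?\/?([^:\/\s]+)((\/\w+)*\/)\S*"

# Made-up input containing a path-like link with no scheme.
text = "see my.pitchbook.com/a/logo.gif here"

# search() returns the first match; group(0) is the whole matched text.
m = re.search(pattern, text)
print(m.group(0))
print(m.group(1))  # the optional scheme group did not take part in this match
```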

The link you say was extracted isn’t even in the sample you provided above, is it?

If your links don’t have spaces in them, a simpler solution to grab all URLs could just be (http)\S*

This looks for the text http, then grabs all the non-whitespace characters after it. However, if the URL contains literal spaces instead of %20, it won’t work, and you should use arivu96’s solution instead, which seems to be doing some fancier stuff I don’t feel like parsing right now, haha.
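A quick Python check of the `(http)\S*` idea and its whitespace limitation (Python’s `re` stands in for .NET here; the inputs are made up). One Python-specific gotcha: `re.findall` returns only the captured group (`"http"`) when the pattern has a group, so the sketch uses `finditer` and `group(0)` to get whole matches, which is what `.ToString` on a .NET `Match` gives you.

```python
import re


def find_urls(text):
    # Whole matches of (http)\S*: "http" plus every following non-whitespace character.
    return [m.group(0) for m in re.finditer(r"(http)\S*", text)]


# An encoded space (%20) is ordinary non-whitespace, so the URL is kept whole.
encoded = find_urls("Report: https://example.com/my%20report.pdf today")

# A literal space ends \S*, so the match is cut short.
literal_space = find_urls("Report: https://example.com/my report.pdf today")

print(encoded)
print(literal_space)
```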


#11

Thank you, everyone, for helping out. It turns out there was another link that I missed, so I just had to change matchesfound(1) to matchesfound(3).

I ended up using the regex (http)\S* because the emails always have the same layout and the same number of links.
@Dave


#12

@Dave I decided to rework my workflow so it no longer finds all links and opens the 3rd one, as that isn’t reliable if the number of links changes.
I need to open a specific link containing the phrase ‘view_all’.
I came up with this regex, which works in regexr.com:
^.*\b(https|view_all)\b.*$
Unfortunately, in UiPath, it opens a bunch of tabs with single words such as ‘want’ or ‘to’, and then it finally opens the link, but with extra characters:
%3Chttps//my.pitchbook.com/?asOfDate=2018-04-19&alert=true&searchId=16129717&showSearch=i&tag=MODIFY&tagPos=TOP
If anyone could help, I would greatly appreciate it; I’ve been struggling with this for a good few days.
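Two things in that pattern are worth checking, sketched here with Python’s `re` in place of .NET’s. First, `(https|view_all)` is an alternation, so it matches either word on its own, not a URL that contains both. Second, matching is case-sensitive by default, and in the sample links from the first post the tag is uppercase (`tag=VIEW_ALL`), so `view_all` only finds it with a case-insensitive option (`RegexOptions.IgnoreCase` in .NET, `re.IGNORECASE` in Python). The URL-shaped pattern below is an assumption and has not been tested against the real emails.

```python
import re

# Hypothetical input: the three links from the first post, one per paragraph.
SAMPLE = "\n\n".join([
    "https://my.pitchbook.com/?asOfDate=2018-03-30&alert=true&pbr=16011274&tag=VIEW_ALL&tagPos=TOP",
    "https://my.pitchbook.com/?asOfDate=2018-03-30&alert=true&searchId=16011274&showSearch=i&tag=MODIFY&tagPos=TOP",
    "https://my.pitchbook.com/?tag=INLINE%3AALL&tagPos=INDIVIDUAL&changeType=DEAL&alert=true&asOfDate=2018-03-30&c=222835-24&searchId=16011274",
])

# Alternation: each result is a single word, whichever of the two was found.
words = re.findall(r"\b(https|view_all)\b", SAMPLE, re.IGNORECASE)

# Assumed alternative: one URL that contains view_all as a whole word.
url_pattern = r"https?://\S*\bview_all\b\S*"
case_sensitive = re.search(url_pattern, SAMPLE)                   # sample has VIEW_ALL, so no match
case_insensitive = re.search(url_pattern, SAMPLE, re.IGNORECASE)  # first link

print(words)
print(case_sensitive)
print(case_insensitive.group(0))
```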


#13

I’ve also tried this regex

^(?=.*?\bhttps\b)(?=.*?\bview_all\b).*$

but it produces the same result.
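A two-lookahead pattern such as `^(?=.*?\bhttps\b)(?=.*?\bview_all\b).*$` checks that one line contains both words, but its result depends on options. Without a multiline option, `^` and `$` refer to the start and end of the whole text and `.` does not cross line breaks, so on multi-line input it finds nothing; with multiline and case-insensitive matching (`RegexOptions.Multiline | RegexOptions.IgnoreCase` in .NET) it returns each whole line that contains both words. A Python sketch, assuming the first post’s links as input (Python’s `re` supports the same lookahead syntax):

```python
import re

# Hypothetical input: the three links from the first post, one per paragraph.
SAMPLE = "\n\n".join([
    "https://my.pitchbook.com/?asOfDate=2018-03-30&alert=true&pbr=16011274&tag=VIEW_ALL&tagPos=TOP",
    "https://my.pitchbook.com/?asOfDate=2018-03-30&alert=true&searchId=16011274&showSearch=i&tag=MODIFY&tagPos=TOP",
    "https://my.pitchbook.com/?tag=INLINE%3AALL&tagPos=INDIVIDUAL&changeType=DEAL&alert=true&asOfDate=2018-03-30&c=222835-24&searchId=16011274",
])

# Two lookaheads (both words somewhere on the line), then the rest of the line.
both_words = r"^(?=.*?\bhttps\b)(?=.*?\bview_all\b).*$"

# Without MULTILINE, ^ and $ only match at the ends of the whole text.
no_multiline = re.findall(both_words, SAMPLE, re.IGNORECASE)

# With MULTILINE, ^ and $ also match at each line break.
with_multiline = re.findall(both_words, SAMPLE, re.MULTILINE | re.IGNORECASE)

print(no_multiline)
print(with_multiline)
```

If the real email puts other text on the same line as the link, that text comes back too, so a pattern restricted to the URL itself may fit better.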