Finding a url with regex.matches

padamus7 · April 16, 2018, 2:09pm

Hello,

I need to extract a URL from an email, and I’m using the matches function.
There are three links in the email, and I need to extract the middle one:

https://my.pitchbook.com/?asOfDate=2018-03-30&alert=true&pbr=16011274&tag=VIEW_ALL&tagPos=TOP

https://my.pitchbook.com/?asOfDate=2018-03-30&alert=true&searchId=16011274&showSearch=i&tag=MODIFY&tagPos=TOP

https://my.pitchbook.com/?tag=INLINE%3AALL&tagPos=INDIVIDUAL&changeType=DEAL&alert=true&asOfDate=2018-03-30&c=222835-24&searchId=16011274

This is what my regex looks like, but it’s not working.
“/^.*\bhttps\b.*\bMODIFY\b.*$/”

Any help?
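A sketch of one likely failure mode. It assumes the intended pattern is ^.*\bhttps\b.*\bMODIFY\b.*$ and that the email body spans several lines. Python's re module stands in for the .NET Regex class here, since these constructs behave the same way in both. The body text is hypothetical apart from the three URLs above. Without a multiline option, ^ only matches at the very start of the body, and . stops at the first newline:

```python
import re

# Hypothetical email body: a line of prose, then the three URLs from the question.
body = (
    "Your PitchBook alert is ready.\n"
    "https://my.pitchbook.com/?asOfDate=2018-03-30&alert=true&pbr=16011274&tag=VIEW_ALL&tagPos=TOP\n"
    "https://my.pitchbook.com/?asOfDate=2018-03-30&alert=true&searchId=16011274&showSearch=i&tag=MODIFY&tagPos=TOP\n"
    "https://my.pitchbook.com/?tag=INLINE%3AALL&tagPos=INDIVIDUAL&changeType=DEAL&alert=true&asOfDate=2018-03-30&c=222835-24&searchId=16011274\n"
)

anchored = r"^.*\bhttps\b.*\bMODIFY\b.*$"

# Default mode: ^ is the start of the whole body, and "." cannot cross the
# newline after the first line, so the search fails.
print(re.search(anchored, body))  # None

# MULTILINE mode: ^ and $ match at every line boundary, so the MODIFY line is found.
modify_line = re.search(anchored, body, re.MULTILINE).group(0)
print(modify_line)
```

In .NET the corresponding option is RegexOptions.Multiline.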

AshwinS2 · April 16, 2018, 2:15pm

Hi @padamus7,

Use this regular expression:

/[A-Za-z]+://[A-Za-z0-9-]+.[A-Za-z0-9-:%&;?#/.=]+/g

Thanks
Ashwin S
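A quick check of the posted pattern in Python, used here only as a stand-in for .NET, with the /.../g delimiters removed. The final character class has no underscore, so the match may stop early on URLs such as the VIEW_ALL link:

```python
import re

# The posted pattern with the JavaScript-style /.../g delimiters removed.
ashwin = r"[A-Za-z]+://[A-Za-z0-9-]+.[A-Za-z0-9-:%&;?#/.=]+"

url = "https://my.pitchbook.com/?asOfDate=2018-03-30&alert=true&pbr=16011274&tag=VIEW_ALL&tagPos=TOP"

# "_" is not in the last character class, so the match ends just before it.
match = re.search(ashwin, url).group(0)
print(match)  # ...&tag=VIEW
```

Adding _ to that class would let it continue past VIEW_ALL.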

padamus7 · April 16, 2018, 2:17pm

Thanks for the reply, but this finds all three links, and I only need to extract the second one.

Dave · April 16, 2018, 2:19pm

Use varRegexMatch(1).ToString to pull out only the second match (the index is zero-based, so (0) is the first match).
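The same idea in Python, as a stand-in for the UiPath Matches output. The body below uses shortened, hypothetical versions of the three URLs. Picking the second of several matches means index 1, because match collections are zero-based:

```python
import re

# Hypothetical body with shortened versions of the three links.
body = """See all results: https://my.pitchbook.com/?tag=VIEW_ALL
Modify this search: https://my.pitchbook.com/?tag=MODIFY
First deal: https://my.pitchbook.com/?tag=INLINE%3AALL
"""

# Collect every match object, similar to the collection Regex.Matches returns.
matches = list(re.finditer(r"https?://\S+", body))

# Index 1 is the second match; index 0 would be the first.
second_url = matches[1].group(0)
print(second_url)  # https://my.pitchbook.com/?tag=MODIFY
```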

padamus7 · April 16, 2018, 2:30pm

That regex still doesn’t find any URL at all.

padamus7 · April 16, 2018, 2:37pm

To be more specific, the regex does find the links in a regex editor like RegExr, but it doesn’t find them in my workflow.

Dave · April 16, 2018, 2:57pm

Make sure you’re using .NET regex syntax; it is slightly different from other flavors.
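One concrete difference: RegExr displays patterns in JavaScript literal form, /pattern/flags. In .NET, and in Python (used here as a stand-in), the pattern string is passed without delimiters. If you paste /.../g, the slashes and the g become literal characters that must appear in the text. Below is a minimal illustration with a simplified, hypothetical URL pattern:

```python
import re

text = "Open https://my.pitchbook.com/?tag=MODIFY to edit the search."

# Copied with JavaScript-style delimiters: "/" and "g" are now literal text,
# so the regex requires a "/" before the scheme and "/g" after the URL.
with_delimiters = r"/https?://\S+/g"
print(re.search(with_delimiters, text))  # None

# The same pattern without the delimiters finds the URL.
without_delimiters = r"https?://\S+"
print(re.search(without_delimiters, text).group(0))
```

There is no g flag to carry over. Regex.Matches, like re.finditer, already returns every match.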

If your email will always contain 2+ URLs and you always want the second one, then I’d recommend grabbing all URLs with regex and then using the second match in your workflow.

However, if you will always want the first link that contains the word MODIFY, then something similar to your initial expression would be fine.
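A sketch comparing the two strategies, with Python standing in for .NET. The helper names nth_url and first_url_containing are hypothetical, and so is the extra logo link. If a link is added above the others, the positional approach picks the wrong URL, while the keyword approach still finds the MODIFY link:

```python
import re

URL = r"https?://\S+"

def nth_url(body, n):
    """Strategy 1 (hypothetical helper): take the n-th URL, zero-based."""
    return [m.group(0) for m in re.finditer(URL, body)][n]

def first_url_containing(body, keyword):
    """Strategy 2 (hypothetical helper): take the first URL containing keyword."""
    m = re.search(r"https?://\S*" + re.escape(keyword) + r"\S*", body)
    return m.group(0) if m else None

body = """https://my.pitchbook.com/?tag=VIEW_ALL
https://my.pitchbook.com/?tag=MODIFY
https://my.pitchbook.com/?tag=INLINE%3AALL
"""
# Hypothetical extra link (e.g. an image) added at the top of the email.
body_with_extra = "https://example.com/logo.gif\n" + body

print(nth_url(body, 1), nth_url(body_with_extra, 1))
print(first_url_containing(body, "MODIFY"), first_url_containing(body_with_extra, "MODIFY"))
```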

arivu96 · April 16, 2018, 2:59pm

Hi @padamus7,

Try this pattern:
((http[s]?):\/)?\/?([^:\/\s]+)((\/\w+)*\/)\S*

Regards,
Arivu

padamus7 · April 16, 2018, 3:13pm

Using @arivu96’s pattern, I managed to extract this link: PitchBook

I’m not sure how UiPath found those numbers or the .gif.
Maybe it’s because I’m not using .NET regex?
How do I switch to it, @Dave?
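One possible explanation, offered as a guess: in @arivu96’s pattern the scheme group ((http[s]?):\/)? is optional, so any token that contains a slash can match, not only URLs. A quick Python check follows. Python stands in for .NET, the pattern is the one posted above, and the sample text is hypothetical:

```python
import re

arivu = r"((http[s]?):\/)?\/?([^:\/\s]+)((\/\w+)*\/)\S*"

text = "Alert sent 2018/04/16 see https://my.pitchbook.com/?tag=MODIFY and images/logo.gif"

# group(0) is the whole match; group(2) is the scheme, or None when the
# optional scheme group did not participate.
found = [(m.group(0), m.group(2)) for m in re.finditer(arivu, text)]
for whole, scheme in found:
    print(whole, "| scheme:", scheme)
```

Only the middle result is a real link. The date and the image path match as well because nothing requires http at the start.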

Dave · April 16, 2018, 3:29pm

arivu96’s solution grabs everything starting with http, followed by all the non-whitespace characters after it.

The link you say was extracted doesn’t even appear in the sample you provided above, does it?

If your links don’t have spaces in them, a simpler solution to grab all URLs could just be: (http)\S*

This looks for the text http and then grabs all the non-whitespace characters that follow it. However, if a URL contains literal spaces instead of %20, it won’t work, and you should use arivu96’s solution instead, which seems to be doing some fancier stuff I don’t feel like parsing right now haha
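A sketch of (http)\S* in Python, as a stand-in for .NET. The second URL is hypothetical and contains a literal space, so the match stops at the space. One Python-specific caution: because the pattern has a capture group, re.findall returns only the group ("http"), so the whole match is read from finditer instead. That is closer to Match.Value in .NET:

```python
import re

pattern = r"(http)\S*"
body = "Results: https://my.pitchbook.com/?tag=MODIFY\nBroken: https://example.com/my file.pdf"

# group(0) is the whole match, like Match.Value in .NET.
urls = [m.group(0) for m in re.finditer(pattern, body)]
print(urls)  # the second URL is cut off at the space

# Python-specific: findall returns the captured group, not the whole match.
print(re.findall(pattern, body))  # ['http', 'http']
```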

padamus7 · April 16, 2018, 3:54pm

Thank you, everyone, for helping out. It turns out there was another link I had missed, so I just had to change matchesfound(1) to matchesfound(3).

I ended up using the regex (http)\S*, since the emails are always the same and always contain the same number of links.
@Dave

padamus7 · April 20, 2018, 7:27pm

@Dave I decided to change my workflow so that it no longer finds all the links and opens the 3rd one, as that isn’t very reliable if the number of links changes.
I need to open a specific link containing the phrase ‘view_all’.
I came up with this regex, which works in regexr.com:
^.*\b(https|view_all)\b.*$
Unfortunately, in UiPath, it opens a bunch of tabs with single words such as ‘want’ or ‘to’, and then it finally opens the link, but with extra characters:
%3Chttps//my.pitchbook.com/?asOfDate=2018-04-19&alert=true&searchId=16129717&showSearch=i&tag=MODIFY&tagPos=TOP
If anyone could help, I would greatly appreciate it. I’ve been struggling with this for a good few days.
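Two details in the samples may matter here, though this is a guess from the text above. First, the link actually contains VIEW_ALL in upper case, so a case-sensitive view_all never matches it. Second, (https|view_all) is an alternation, which matches either word on its own rather than requiring both. The sketch below matches a single URL containing the keyword. It uses the inline (?i) flag, which both .NET and Python's re support, and a hypothetical body where links sit inside angle brackets:

```python
import re

# Hypothetical body; the two URLs are the VIEW_ALL and MODIFY links from the question.
body = (
    "If you want to see all results, go to "
    "<https://my.pitchbook.com/?asOfDate=2018-03-30&alert=true&pbr=16011274&tag=VIEW_ALL&tagPos=TOP>\n"
    "To modify, use "
    "<https://my.pitchbook.com/?asOfDate=2018-03-30&alert=true&searchId=16011274&showSearch=i&tag=MODIFY&tagPos=TOP>\n"
)

# Case-sensitive: "view_all" never occurs in lower case, so nothing matches.
case_sensitive = r"https://\S*view_all\S*"
print(re.search(case_sensitive, body))  # None

# (?i) makes the match case-insensitive; [^\s<>"]* stops at whitespace,
# angle brackets and quotes, so the surrounding "<" and ">" are not captured.
pattern = r'(?i)https://[^\s<>"]*view_all[^\s<>"]*'
url = re.search(pattern, body).group(0)
print(url)
```

In .NET, RegexOptions.IgnoreCase is the equivalent of the inline (?i) flag.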

padamus7 · April 20, 2018, 8:04pm

I’ve also tried this regex

^(?=.*?\bhttps\b)(?=.*?\bview_all\b).*$

but it produces the same result.
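Assume the lookahead pattern is ^(?=.*?\bhttps\b)(?=.*?\bview_all\b).*$. It requires both words on the same line, but .*$ then returns the entire line, prose included, not just the URL. The Python sketch below stands in for .NET, uses hypothetical body text, and adds (?im) for case-insensitive, per-line matching:

```python
import re

# Hypothetical body: the link sits on a line that also contains prose.
body = (
    "If you want to see all results, go to "
    "https://my.pitchbook.com/?asOfDate=2018-03-30&alert=true&pbr=16011274&tag=VIEW_ALL&tagPos=TOP\n"
    "Unrelated footer line\n"
)

# (?im): case-insensitive plus multiline, so ^ and $ apply per line and
# view_all also matches VIEW_ALL. The lookaheads only check the line;
# .*$ then consumes all of it.
both_words = r"(?im)^(?=.*?\bhttps\b)(?=.*?\bview_all\b).*$"
line = re.search(both_words, body).group(0)
print(line)
```

To get only the link, a pattern that matches the URL itself, such as (?i)https://\S*view_all\S*, is usually simpler than filtering whole lines.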

Related topics

Topic | Replies | Views | Date
Extract URL from email body using Regex command | 11 | 1476 | January 11, 2023
Extracting a URL from email body | 1 | 1250 | August 4, 2020
Extracting required URL from mail body using Regex | 5 | 1686 | May 10, 2021
How to Seperate URL from the mail body? | 3 | 1004 | April 13, 2020
Hypertext link email | 4 | 800 | July 9, 2019