Use Matches to obtain strings of text within a paragraph

Hi,

I would like to pull all triathlon website links from the following using Matches:

href="/triathlon.com/rmstechnology/home">RMS Technology

  • <div class="j10yRb" role="presentation"

    For example: "/triathlon.com/rmstechnology/home"

    How would I do this? Should I use the advanced option?

@Katie_Vooght

Can you check whether one of the following options would be a better fit for the task:

  • using Find Children, filtering to all <a> elements (links), and retrieving the href attribute value
  • using XML processing, filtering to all <a> elements, and retrieving the href attribute value
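
To illustrate the second option outside UiPath, here is a minimal Python sketch that walks the markup with the standard library HTML parser instead of a regex. The class name HrefCollector, the helper collect_hrefs, and the sample markup are assumptions made up for this illustration; only the href and the div come from the question.

```python
from html.parser import HTMLParser


class HrefCollector(HTMLParser):
    """Collect the href value of every <a> element (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; tag names arrive lower-cased
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value is not None:
                    self.hrefs.append(value)


def collect_hrefs(html):
    parser = HrefCollector()
    parser.feed(html)
    parser.close()
    return parser.hrefs


# Assumed sample built around the snippet from the question
sample_html = (
    '<div class="j10yRb" role="presentation">'
    '<a href="/triathlon.com/rmstechnology/home">RMS Technology</a>'
    '<link href="https://fonts.googleapis.com/css?family=Roboto">'
    "</div>"
)

# Keep only the links that point at the triathlon site
triathlon_links = [h for h in collect_hrefs(sample_html) if h.startswith("/triathlon.com/")]
print(triathlon_links)
```

Because only <a> elements are inspected, the stylesheet link is never collected, so no extra filtering of font URLs is needed.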

About regex, a quick and dirty approach could be:
[image]

(?<=href=)".*" seems to start in the right place, but it then highlights all the text that follows, even text that is not needed. Is there a better way to indicate where the match should stop?

@Katie_Vooght
As it was the simplest possible regex pattern, yes, it also captures the surrounding quotation marks (""). I don't know your regex skills, so I aimed to keep it as simple as possible.

However, that is not a problem, as the quotation marks can easily be removed.

  • Use the Matches activity and configure the pattern, input, and output.
  • Afterwards, run the following within an Assign:
    left side: Urls (type String())
    right side: Matches.Select(Function(m) m.ToString.Replace(Chr(34).ToString, "").Trim).ToArray

You will then get a string array with all URLs.
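
As a rough way to check this outside UiPath, the Python sketch below runs the simple pattern and then strips the quotes, mirroring the Assign above. The sample line is made up, and the "[^"]*" variant is an assumption added here (it is not from the posts above); it is one common way to make the match stop at the closing quote instead of running to the last quote on the line.

```python
import re

# Assumed sample line with further quoted attributes after the href
sample = '<a class="link" href="/triathlon.com/rmstechnology/home" target="_blank">RMS Technology</a>'

# Greedy: .* runs on to the last double quote on the line
greedy = re.findall(r'(?<=href=)".*"', sample)

# Negated class: [^"]* cannot cross a double quote, so the match ends at the closing quote
bounded = re.findall(r'(?<=href=)"[^"]*"', sample)

# Same idea as the Assign: drop the quote characters and trim whitespace
urls = [m.replace('"', "").strip() for m in bounded]

print(greedy)
print(urls)
```

With the greedy pattern the single match also swallows target="_blank"; with the bounded pattern only the quoted URL is matched, and the list comprehension removes the quotes.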

Another approach would be to work with regex groups and to refer to the URL surrounded by the quotation marks.

Try this expression:
<a\s+(?:[^>]*?\s+)?href=(["'])(.*?)\1
From every match returned, you will want to get group 2 (group 1 only holds the opening quote character):
Match.Groups(2).Value
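
To see which group holds what, here is a small Python sketch using the same expression. The sample markup and the second link (/triathlon.com/other/page) are invented for illustration; the group numbering works the same way in .NET.

```python
import re

# Same expression as above: group 1 is the quote character, group 2 is the URL
HREF_IN_ANCHOR = re.compile(r'''<a\s+(?:[^>]*?\s+)?href=(["'])(.*?)\1''')

# Assumed sample: one double-quoted and one single-quoted href
sample = (
    '<a class="x" href="/triathlon.com/rmstechnology/home">RMS Technology</a>'
    "<a href='/triathlon.com/other/page'>Other</a>"
)

matches = list(HREF_IN_ANCHOR.finditer(sample))
quotes = [m.group(1) for m in matches]  # the opening quote of each href
urls = [m.group(2) for m in matches]    # the text between the quotes

print(quotes)
print(urls)
```

The backreference \1 makes the lazy (.*?) stop at the same kind of quote that opened the value, which is why both quoting styles are handled.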

How can I adapt <?href=(["'])(.*?)\1 to ensure it only picks up links containing /triathlon.com/rmstechnology? At the moment it is also picking up links such as:

href="https://fonts.googleapis.com/css?family=Google+Sans:400,500|Roboto:300,400,500,700|Source+Code+Pro:400,700&display=swap

@Katie_Vooght
maybe this helps:
(?<=href=")(\/triathlon\.com\/rmstechnology\/.*)(")


refer to group 1
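
A quick Python check of this idea, under a couple of assumptions: the sample markup is made up (with a shortened fonts URL), and .* is swapped for [^"]* so that group 1 ends at the first closing quote even when more quoted attributes follow on the same line.

```python
import re

# Variation of the pattern above: [^"]* instead of .* keeps the match inside one attribute
TRIATHLON_HREF = re.compile(r'(?<=href=")(/triathlon\.com/rmstechnology/[^"]*)(")')

# Assumed sample containing one unwanted and one wanted href
sample = (
    '<link href="https://fonts.googleapis.com/css?family=Google+Sans:400,500&display=swap" rel="stylesheet">'
    '<a href="/triathlon.com/rmstechnology/home" target="_blank">RMS Technology</a>'
)

# Group 1 is the URL without the surrounding quotes
triathlon_urls = [m.group(1) for m in TRIATHLON_HREF.finditer(sample)]
print(triathlon_urls)
```

The fonts link is skipped because its value does not start with /triathlon.com/rmstechnology/ directly after href=".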

If it does not work as expected, please share your clearly described requirements and some sample values with us.