Get URL from web page - a missing option in 2024.02 StudioX?

Hi Everyone. I’m looking for help extracting the URLs behind display text in web pages. I’m using StudioX v2024.02, and after hours of searching the UiPath Forum I can only find references to Data Extraction or Data Table Extraction activities that have wonderful check boxes to enable scraping of URLs. My version of those activities does not have that option. I’m hoping to connect with someone who can help me figure this out. Having someone show me a screenshot of their environment with the checkbox I’m missing does not help…unless they have a magic way to show me how to discover or enable it. Thank you.

@James_Hendergart

URLs are generally in the href attribute of the link element.

So use the Get Attribute activity: indicate the field or element you want to extract the URL from, and select href as the attribute. That should give you the URL.

Cheers

Nice to meet you, Anil, and thank you for this suggestion. I will try it later today when I have some development time and report back here. I appreciate your time to help very much.


Text doesn’t have a URL. Links do, pages do. What are you trying to do? Give us some details and the web page you’re working with if you can.

Hi Paul, thanks for your time to help. Please tell me the proper word to use to describe readable words in a web page that have a hyperlink embedded in them. I apologize for using the phrase “URL underneath display text.” I thought it was pretty clear. For example, an anchor tag for Google in HTML would render on a web page as the word Google with a hyperlink. I’m trying to extract the href value, not the string “Google.” I hope this helps.

Oops, my HTML was rendered above… I meant to display a typical anchor tag with an href.
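For readers following along, here is a minimal sketch, outside UiPath and in Python, of what "display text" versus "href" means for an anchor tag. The sample markup and the names `LinkParser` and `extract_links` are illustrative assumptions, not taken from the page discussed in this thread.

```python
from html.parser import HTMLParser


class LinkParser(HTMLParser):
    """Collects (display text, href) pairs for every <a> tag that has an href."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None   # href of the anchor currently being read, if any
        self._text = []     # text pieces seen inside that anchor

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append(("".join(self._text).strip(), self._href))
            self._href = None


def extract_links(html):
    """Return a list of (display text, href) tuples found in the markup."""
    parser = LinkParser()
    parser.feed(html)
    parser.close()
    return parser.links


# Hypothetical anchor tag: "Google" is the display text, the href is the URL.
sample = '<a href="https://www.google.com">Google</a>'
print(extract_links(sample))
```

The Get Attribute activity with href selected returns the second item of each pair; the first item is what the page visibly shows.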

Hi Anil. I’m making some progress. I’ve confirmed that the Strict Selector editor gives me access to more code inside my target page, and by opening UI Explorer from there, I have located the URL property and value I wish to extract in the Property Explorer pane. Time to attend some meetings. I will try to select that URL value as my target…I think I’m very close!


Those are links. They’ll have an HREF property you can get with Get Attribute, as Anil mentioned.

Why don’t you just click the link? Have you done the free training on the academy.uipath.com web site?

Yes and yes. I am able to click the link and grab the URL from a new tab, but I have 80 links to grab, and the process takes 6 activities per link, so 6 x 80 = 480 activities. I’m looking for a way to simply extract the URL, just like I’m already doing for the display text and another value in the same table using one activity (Extract Table Data), but my version of UiPath doesn’t support that easy way to do things.


You should upgrade, it doesn’t cost anything.

The Extract features can extract the URL along with the link text. Then you’ll have a datatable you can loop through, navigate to the URL and do what you need. This is a good strategy for your use case.
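The strategy described above (extract the link text and URL together into a table, then loop through the rows) can be sketched outside UiPath in Python. This is only an illustration under stated assumptions: the table markup, the example.com URLs, and the function name `extract_rows` are hypothetical, and `xml.etree.ElementTree` only works here because the sample is well-formed, which real web pages often are not.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed table: two rows share the same display text.
SAMPLE_TABLE = """
<table>
  <tr><td><a href="https://example.com/report/1">Report</a></td><td>Open</td></tr>
  <tr><td><a href="https://example.com/report/2">Report</a></td><td>Closed</td></tr>
  <tr><td>No link here</td><td>Open</td></tr>
</table>
"""


def extract_rows(table_markup):
    """Return one {'text', 'url'} dict per table row that contains a link."""
    root = ET.fromstring(table_markup.strip())
    rows = []
    for tr in root.iter("tr"):
        link = tr.find(".//a")  # first anchor anywhere inside the row
        if link is None:
            continue            # skip rows without a link
        rows.append({
            "text": (link.text or "").strip(),
            "url": link.get("href"),
        })
    return rows


# Loop through the extracted rows, as one would loop through a datatable.
for row in extract_rows(SAMPLE_TABLE):
    print(row["text"], "->", row["url"])
```

One pass yields every URL, so the per-link cost of clicking, reading the address from a new tab, and closing it disappears.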

Thanks, Paul. I’ve already checked with my IT department, and they say I am not allowed to have a license for Studio (LoL). Good news: I’ve been able to successfully extract the innerhtml, which contains about 3 properties, including the URL. Wonderful. This is the closest I’ve gotten thus far. I really appreciate your help and Anil’s. I think I’m going to get there. I’ll come back to this thread to mark it solved as soon as I can, including summarizing the things that I learned so that my experience can help others. Now I have to attend some other meetings.


FYI if you sign up for a cloud account and community edition Studio, you’ll have the latest version.

I’ve come back to provide an update and to thank both Anil and Paul. My development environment and my corporate constraints have led me down an interesting path. Both of the answers I received from Anil and Paul were accurate and could be marked as solutions, but my exact situation merits a bit more explanation for anyone else in the community in a situation similar to mine, so I’ll elaborate on those details here and then mark my question as solved.

Before I start, I’m going to define some terms I will use. First, I use target or href to mean the URL behind hyperlinked text on my web page (this URL is visible when hovering the mouse over the hyperlinked text). Second, I use text or display text to mean the hyperlinked text itself.

The core of my question was solved using the Get Attribute activity, but until today I was running into some other issues. First, I could not get the href attribute at all; I don’t know why it wasn’t available when configuring the activity. So I settled for the innerhtml attribute, which extracted multiple HTML properties at once. I configured UiPath to write this chunk to a cell in Excel and used an Excel formula to extract just the URL. This technique was about 80% effective. The cause was not UiPath: my Excel formulas were not properly accounting for variable input. My project requires extracting the URLs from 80 links, so a 20% failure rate is unacceptable.

I turned off my system and came back on a new day. I think restarting my PC and UiPath StudioX helped, because the href attribute was suddenly available in the Get Attribute activity! I no longer need to perform the secondary extraction using Excel formulas. This is a big success.

My next challenge is accurately targeting the 80 links because their length changes dynamically across runs. Sometimes the display text is the same for multiple hyperlink UI elements, and UiPath gets confused, not being able to verify the target. I’m solving this by tweaking the configuration settings for targets and anchors. I may repost to this thread as I figure out how to do this reliably.

For anyone having issues distinguishing web UI elements whose display text is identical but whose screen location and underlying target URL are different, you may find it helpful to select the smallest possible UI element as your target. In my case, the web page offers 2-3 options for each element because the page is structured with tables (so the word, the table cell, and the table row are all detected as UI elements by UiPath).
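For anyone still stuck on the innerhtml route, a rough sketch of pulling the URL out of such a chunk in a way that tolerates variable input might look like the following in Python. It is not what I ran. The fragments, the example.com URLs, and the name `href_from_inner_html` are made up for illustration, and a regular expression is only a stopgap compared with a real HTML parser. Results are keyed by row position rather than display text, since the display text can repeat.

```python
import re

# Matches href="..." or href='...' in any letter case, wherever it sits in the
# tag. The lookbehind avoids matching attributes such as data-href.
HREF_RE = re.compile(r"""(?<![\w-])href\s*=\s*(["'])(.*?)\1""", re.IGNORECASE)


def href_from_inner_html(fragment):
    """Return the first href value in an HTML fragment, or None if absent."""
    match = HREF_RE.search(fragment)
    return match.group(2) if match else None


# Hypothetical innerhtml chunks: same display text, different attribute order,
# quoting, and letter case. The last one has no link at all.
fragments = [
    '<a class="lnk" href="https://example.com/a" title="Item">Item</a>',
    "<a title='Item' HREF='https://example.com/b'>Item</a>",
    "<span>Item</span>",
]

# Key each URL by row number so identical display text cannot collide.
urls_by_row = {
    row: href_from_inner_html(chunk)
    for row, chunk in enumerate(fragments, start=1)
}

# Rows where nothing was found, so failures are visible instead of silent.
missing = [row for row, url in urls_by_row.items() if url is None]

print(urls_by_row)
print("rows without a URL:", missing)
```

Listing the rows that produced no URL makes a partial failure obvious before the results are used.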

Thank you, Everyone!
