Extracting URLs from website hyperlinks

scraping

#1

I'm looking to extract information from a forum, much like the UiPath forum. I can get a table with topics and metadata, but I would also like to follow each topic's hyperlink, retrieve the contents of that topic, and paste them into a worksheet.

Is this possible?


#2

Hi,

With Extract Data you can also capture the hyperlink of each item. You can then use the Navigate To activity on each extracted URL.
A similar example can be found here: https://www.uipath.com/examples/web-scraping-structured-data-get-news.
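
For anyone who wants to see the same extract-then-visit pattern outside UiPath, here is a rough Python sketch. It is not what the UiPath activities do internally; the function names (`fetch_page`, `collect_contents`, `to_csv`) and the row keys (`topic`, `href`) are made up for illustration, and CSV output stands in for the worksheet.

```python
import csv
import io
from urllib.parse import urljoin
from urllib.request import urlopen


def fetch_page(url):
    """Download one page and return its body as text (default fetcher)."""
    with urlopen(url) as response:
        return response.read().decode("utf-8", errors="replace")


def collect_contents(base_url, rows, fetch=fetch_page):
    """Visit the link of every extracted row and attach the page contents.

    rows is assumed to be a list of dicts with hypothetical keys
    'topic' and 'href', i.e. the output of an earlier extraction step.
    """
    results = []
    for row in rows:
        # Forum links are often relative, so resolve them against the base URL.
        full_url = urljoin(base_url, row["href"])
        results.append(
            {"topic": row["topic"], "url": full_url, "content": fetch(full_url)}
        )
    return results


def to_csv(results):
    """Serialise the collected rows as CSV text, one line per topic."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=["topic", "url", "content"])
    writer.writeheader()
    writer.writerows(results)
    return buffer.getvalue()
```

Passing `fetch` as a parameter keeps the loop easy to try out with a stub before pointing it at a real site.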


#3

I am using the UiPath forum as an example. I can extract the Topics, Categories, Replies, Views and Activity columns, but I can't get the URLs (the href values) from the source code. In the Mashable example you provided, the extractor is defined in XML; see below.

I would like to do the same, but my XML skills are not great. Taking the UiPath forum as an example, how would you rewrite this?


#4

If your URLs sit in a table or another tabular structure, you can use the Extract Web Data wizard to generate all the required activities (including the XML metadata).

If you want all the URLs from a web page, or all the URLs inside a certain container, you should use the “Find Children” activity; you’ll find a snippet for this in the UiPath Studio Library pane under Snippets / Loops / For Each Child. You’ll need to add Get Attribute to read the href of each anchor element.
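
To make the idea concrete (find the children of a container, then read the href attribute of each anchor), here is a minimal Python sketch using only the standard library. It is an analogy for Find Children plus Get Attribute, not UiPath code; the names `ContainerLinkCollector` and `hrefs_in_container` are hypothetical, and the container is assumed to be identified by its `id` attribute.

```python
from html.parser import HTMLParser

# Elements that never have a closing tag, so they must not change the depth.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img",
             "input", "link", "meta", "source", "track", "wbr"}


class ContainerLinkCollector(HTMLParser):
    """Collect the href of every <a> nested inside one container element."""

    def __init__(self, container_id):
        super().__init__()
        self.container_id = container_id
        self.depth = 0  # 0 means we are currently outside the container
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if self.depth > 0:
            if tag not in VOID_TAGS:
                self.depth += 1
        elif attrs.get("id") == self.container_id:
            self.depth = 1  # entered the container
        # Equivalent of Get Attribute "href" on each anchor child.
        if self.depth > 0 and tag == "a" and attrs.get("href"):
            self.hrefs.append(attrs["href"])

    def handle_endtag(self, tag):
        if self.depth > 0 and tag not in VOID_TAGS:
            self.depth -= 1


def hrefs_in_container(html, container_id):
    """Return the hrefs of all anchors inside the element with the given id."""
    collector = ContainerLinkCollector(container_id)
    collector.feed(html)
    return collector.hrefs
```

Anchors outside the chosen container (navigation bars, footers) are skipped, which is the same reason for pointing Find Children at a specific target rather than at the whole page.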


#5

There is no Snippets option in my version of UiPath. Could you send me an example of the sequence you are describing? Again, I am using the UiPath forum to write this flow, so an example sequence built against https://forum.uipath.com/ would be helpful.

Thanks


#6

I have attached the snippet workflow; you’ll need to update the target of the Find Children activity.

snippet.xaml (8.2 KB)