Take a URL with specific text in it

Hello guys, I have a huge list of URLs, but I only need the ones that contain one of several specific words. The list of words is not big. Is it even possible to do this?

Hi @Povilas_Jonikas,

You can use .Contains to see whether it contains specific strings:

URL.Contains("this string") Or URL.Contains("that string")

Or you can use Regex to match patterns of words:
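A minimal sketch of the regex approach, written in Python rather than as a UiPath expression. The keywords and URLs are made up for illustration. Each keyword is escaped and the results are joined into one alternation pattern, so a URL matches when it contains any of the keywords:

```python
import re

# Hypothetical keywords and URLs, for illustration only.
keywords = ["this-string", "that-string"]
urls = [
    "https://example.com/shop/this-string/item-1",
    "https://example.com/about",
    "https://example.com/THAT-STRING/item-2",
]

# Escape each keyword so regex metacharacters are matched literally,
# then join the keywords into a single "a|b" alternation.
pattern = re.compile("|".join(re.escape(k) for k in keywords), re.IGNORECASE)

# Keep only the URLs in which the pattern occurs anywhere.
matching_urls = [u for u in urls if pattern.search(u)]
print(matching_urls)
```

The case-insensitive flag is a choice made for this sketch; drop it if the keywords must match exactly as written.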

Hello @Povilas_Jonikas

Does that mean you need to filter the list with some particular data set?

If yes, you can use the Filter Collection activity.

Thanks

Thanks for the response. I miscommunicated there. Basically, I have opened my old site’s sitemap.xml. There are a ton of URLs in it, but I only need the ones with specific words in the middle of them. I would love to write a robot that takes the URLs containing those words from the list and writes them to Excel.

Thanks a lot, I’ll look into it and let you know if this is what I’m looking for :)

  • Read the file with the Read Text File activity
  • Deserialize the XML
  • Process the XML by taking all links and filtering the relevant ones
  • Generate a DataTable (there are many options: Add Data Row, Generate Data Table, LINQ)
  • Write it to Excel
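The steps above can be sketched outside UiPath as follows. This is a rough Python illustration, not the workflow itself: the sample XML, the URLs, and the helper names `filter_sitemap_urls` and `urls_to_csv` are assumptions, and a CSV string stands in for the Excel output.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Tiny stand-in for the real sitemap.xml (hypothetical URLs).
SITEMAP_XML = """<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/cars/volvo/v40</loc></url>
  <url><loc>https://example.com/cars/audi/a4</loc></url>
  <url><loc>https://example.com/cars/mercedes-benz/c-class</loc></url>
</urlset>"""

KEYWORDS = ["mercedes-benz", "volvo"]


def filter_sitemap_urls(xml_text, keywords):
    """Return every <loc> URL that contains at least one keyword."""
    root = ET.fromstring(xml_text.encode("utf-8"))
    # "{*}loc" matches <loc> in any namespace (needs Python 3.8+).
    locs = [el.text.strip() for el in root.findall(".//{*}loc") if el.text]
    return [url for url in locs if any(k in url for k in keywords)]


def urls_to_csv(urls):
    """Serialize the URLs as a one-column CSV that Excel can open."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["Url"])
    writer.writerows([u] for u in urls)
    return buffer.getvalue()


filtered = filter_sitemap_urls(SITEMAP_XML, KEYWORDS)
print(urls_to_csv(filtered))
```

For a real file, the XML string would come from reading sitemap.xml from disk, and the CSV text would be written to a file instead of printed.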

Maybe you can share the sitemap.xml and a sample keyword for the filtering with us.

Thanks, Peter, for the response. I’ll try to explain in more detail what I want to create.
“Sitemap”: there are a lot of similar sitemap pages, which contain the old URLs that I need. Basically, I would love to create a robot that goes through the different sitemap pages and checks whether the URLs in each one contain the keywords I need. If a URL does, it is copied and written to Excel. Example:
The robot opens the first sitemap page and checks whether the URLs contain any of these keywords:

  1. Błotnik-i-części-montazowe
  2. Glowka-ramy
  3. Klips-mocujacy

If they do, it takes only the URLs that contain a keyword, writes them to Excel, and moves on to the next sitemap page to repeat the process.

As the provided link contains some other data, we will describe the essentials based on these samples. You can easily adapt them to your own data.

Variables:
[image]
Prepare the DataTable.

Set up the keyword list, read the XML string, parse the XML into an XDocument, and define the default namespace:
[image]

Filter the loc elements by keyword, loop over the result, and add each URL to the DataTable.

Finally, write the DataTable to Excel, if needed.

arrKeywords = New String() {"mercedes-benz", "volvo"}
xnsDefault = xDoc.Root.Name.Namespace
xeFilteredUrls = xDoc.Root.Descendants(xnsDefault + "loc").Where(Function(x) arrKeywords.Any(Function(k) x.Value.Contains(k))).ToArray()
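To show what the three expressions above do, here is a rough Python counterpart. It is only a sketch: the miniature sitemap and its URLs are invented, and the variable names `ns_default`, `loc_tag`, and `filtered_urls` merely echo the ones in the workflow. A standard sitemap declares a default namespace, so a search for a plain "loc" element finds nothing. That is why the namespace is read from the root element first.

```python
import xml.etree.ElementTree as ET

# Hypothetical miniature sitemap; the real file is far larger.
xml_text = (
    '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">'
    "<url><loc>https://example.com/mercedes-benz/w124</loc></url>"
    "<url><loc>https://example.com/bmw/e30</loc></url>"
    "<url><loc>https://example.com/volvo/240</loc></url>"
    "</urlset>"
)

arr_keywords = ["mercedes-benz", "volvo"]
root = ET.fromstring(xml_text)

# Counterpart of xDoc.Root.Name.Namespace: ElementTree stores a namespaced
# tag as "{namespace-uri}localname", so the URI is cut out of the root tag.
ns_default = root.tag[1:].split("}", 1)[0] if root.tag.startswith("{") else ""

# Counterpart of xnsDefault + "loc": the fully qualified element name.
loc_tag = f"{{{ns_default}}}loc" if ns_default else "loc"

# Counterpart of Descendants(...).Where(... Any(... Contains ...)):
# keep each loc value that contains at least one keyword.
filtered_urls = [
    el.text
    for el in root.iter(loc_tag)
    if el.text and any(k in el.text for k in arr_keywords)
]
print(filtered_urls)
```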

Find starter help here (Legacy)
SitemapXML_ToDataTable.xaml (11.1 KB)
SitemapKategoriaMarkaModelTyp-8.xml (7.8 MB)

You are actually a beast, Peter, thanks a lot! I’ll test it out, but there’s one little problem left: what is this activity?
[image]
I think I don’t have the package for this activity installed, so I can’t run it. Thanks :)

Manage dependencies
[image]

We assume that the question was answered. Please check and close the topic by:
Forum FAQ - How to mark a post as a solution - News / Tutorials - UiPath Community Forum

Yes, I’m sorry for the late response. I got sick and couldn’t get online. Thanks again, Peter, you’re a beast! :)
