How to create dynamic XML text in ExtractMetadata from ExtractData activity

Hello!

I am learning UiPath using an e-commerce website.
I am now trying to automatically open each product category page (there are about 300 categories, so I can’t check the links manually), then scrape some data from the product lists and save it.
Workflow is here: Emag scraping (v5 - solving bug).xaml (36.3 KB)

I ran the Data Scraping wizard on a few randomly chosen product category pages and saw that:

  1. On many pages (such as https://www.emag.ro/aparate_aer_conditionat/c?tree_ref=312 or https://www.emag.ro/laptopuri/c?tree_ref=2172), Extract Structured Data usually has entries like these in the ExtractMetadata XML:
    <webctrl tag='div' class='card' idx='1'/>
    and
    <webctrl tag='a' class='thumbnail-wrapper js-product-url' idx='1'/>

  2. But on some pages (e.g. https://www.emag.ro/bluze-dama/c?tree_ref=1706), the XML has additional class names in those places:
    <webctrl tag='div' class='card card-fashion' idx='1'/>
    and
    <webctrl tag='a' class='thumbnail-wrapper js-product-url ratio-2by3' idx='1'/>

PROBLEM: I can’t use one generic Extract Structured Data activity to scrape all category pages, because on those “special” pages the browser just keeps loading the next pages of the product list until I stop UiPath manually.
How can I write that XML so that it works in all cases?

I also tried reading about wildcards in XML, in .NET, and in Visual Basic, but haven’t found anything useful yet (beginner here).
Pages like these haven’t helped either:

Well,

Data scraping does not work because the pattern is not the same across all the results. You will have to filter the results based on their attributes.
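
To illustrate the mismatch, here is a small Python sketch (not UiPath code) that compares the class values quoted in the question against an exact value and against a wildcard pattern. UiPath selectors do accept `*` and `?` wildcards in attribute values, but whether the ExtractMetadata XML honors them in the same way is an assumption you would need to test on one page first. The helper name `matches` is made up for this example.

```python
from fnmatch import fnmatchcase

# Class attribute values quoted in the question.
plain_classes = ["card", "thumbnail-wrapper js-product-url"]
fashion_classes = ["card card-fashion", "thumbnail-wrapper js-product-url ratio-2by3"]

# Wildcard patterns: the fixed part of each class value followed by "*".
patterns = ["card*", "thumbnail-wrapper js-product-url*"]


def matches(pattern, value):
    """Return True if value fits the wildcard pattern (* = any run of characters)."""
    return fnmatchcase(value, pattern)


for value in plain_classes + fashion_classes:
    exact = value in plain_classes  # what a fixed class='card' style entry accepts
    wild = any(matches(p, value) for p in patterns)
    print(f"{value!r}: exact={exact}, wildcard={wild}")
```

Note that a pattern such as `card*` is loose: it would also accept any other class value that merely starts with "card", so it is worth checking what else it picks up on the page.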

To get the title, I used the ‘Get Attribute’ activity inside the Find Children activity, with ‘title’ as the attribute, to filter the page source.

I have attached a XAML file that shows how to use it. You can play around with the attributes in the same way to get other values, such as vouchers, offers, URLs, etc.

STEPS:

  1. Inspect the element (F12) and click on the object you want to extract text from.
  2. Expand the divs and see which attribute holds the text/URL you want to scrape. Refer to the image below:
  3. Circled in blue, you can see ‘href’ and ‘title’ (the file I have uploaded extracts only the title, as an example).
  4. Keep adding each value to a collection, then copy the collection to a data table.
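
The same idea (find the elements, read an attribute from each, collect the values) can be sketched outside UiPath in plain Python. This is only an illustration of the logic, not what the attached workflow does internally. The HTML in `SAMPLE_HTML`, the product titles, and the class name `ProductLinkParser` are invented for the example; only the class attribute values come from the question.

```python
from html.parser import HTMLParser


class ProductLinkParser(HTMLParser):
    """Collect (title, href) from <a> tags that carry the js-product-url class."""

    def __init__(self):
        super().__init__()
        self.rows = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attr = dict(attrs)
        # Split the class attribute so extra class names do not break the match.
        classes = (attr.get("class") or "").split()
        if "js-product-url" in classes:
            self.rows.append((attr.get("title"), attr.get("href")))


# Hypothetical markup covering both the "normal" and the "fashion" layout.
SAMPLE_HTML = """
<div class="card"><a class="thumbnail-wrapper js-product-url" href="/p/1" title="Laptop A"></a></div>
<div class="card card-fashion"><a class="thumbnail-wrapper js-product-url ratio-2by3" href="/p/2" title="Blouse B"></a></div>
"""

parser = ProductLinkParser()
parser.feed(SAMPLE_HTML)
for title, href in parser.rows:
    print(title, href)
```

Matching on a single class name instead of the full class string is what lets one filter cover both page layouts.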

Example xaml HERE (6.9 KB)

Open the URL in IE and run the bot. When the results load, you will see all the titles appear in a message box.
Cheers…!