I am learning UiPath using an ecommerce website.
I am now trying to automatically open each product category page (I can’t check the links manually, since there are about 300 categories), then scrape some data from the product lists and save it.
Workflow is here: Emag scraping (v5 - solving bug).xaml (36.3 KB)
I ran the Data Scraping wizard on a few randomly chosen product category pages and noticed the following.
On many pages (for example https://www.emag.ro/aparate_aer_conditionat/c?tree_ref=312 or https://www.emag.ro/laptopuri/c?tree_ref=2172), the ExtractMetadata XML of the Extract Structured Data activity contains entries like these:
<webctrl tag='div' class='card' idx='1'/>
<webctrl tag='a' class='thumbnail-wrapper js-product-url' idx='1'/>
But on some pages (e.g. https://www.emag.ro/bluze-dama/c?tree_ref=1706), the same entries carry additional class names:
<webctrl tag='div' class='card card-fashion' idx='1'/>
<webctrl tag='a' class='thumbnail-wrapper js-product-url ratio-2by3' idx='1'/>
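To make the mismatch concrete, here is a minimal sketch in Python (not UiPath code). It compares the class values quoted above against an exact match and against a `*` wildcard pattern. The patterns `card*` and `thumbnail-wrapper js-product-url*` are only an assumption about what a more generic entry could look like; whether ExtractMetadata accepts them in this form is not confirmed here.

```python
from fnmatch import fnmatchcase

# Class attribute values copied from the ExtractMetadata snippets in this post.
OBSERVED = {
    "div": ["card", "card card-fashion"],
    "a": [
        "thumbnail-wrapper js-product-url",
        "thumbnail-wrapper js-product-url ratio-2by3",
    ],
}

# Hypothetical wildcard patterns (an assumption, not taken from any workflow).
PATTERNS = {
    "div": "card*",
    "a": "thumbnail-wrapper js-product-url*",
}


def exact_match(expected: str, value: str) -> bool:
    """Mimic a fixed class value: the whole string must be identical."""
    return expected == value


def wildcard_match(pattern: str, value: str) -> bool:
    """Match with shell-style wildcards, where * stands for any characters."""
    return fnmatchcase(value, pattern)


for tag, values in OBSERVED.items():
    plain = values[0]  # the class value seen on the "normal" pages
    for value in values:
        print(
            f"{tag} class={value!r}: "
            f"exact={exact_match(plain, value)}, "
            f"wildcard={wildcard_match(PATTERNS[tag], value)}"
        )
```

The exact comparison only accepts the shorter class value, while the wildcard pattern accepts both variants. Note that a pattern like `card*` would also match unrelated class names that merely start with `card`.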
PROBLEM: I can’t use one generic Extract Structured Data activity to scrape all category pages, because on those “special” pages the browser just keeps loading the next pages of the product list until I stop UiPath manually.
How can I write that XML so it works in all cases?
I also tried reading about wildcards in XML, .NET, and Visual Basic, but haven’t found anything useful yet (beginner here).
And pages such as these haven’t helped either: