I am learning RPA using UiPath, and these days I’m trying to get some data from an online store, but I’ve run into a problem.
So far, I can read a category page like https://www.emag.ro/laptopuri/c, scrape the product names and the links to each product page, and put them into an Excel file.
Then I can open each product page in a browser.
PROBLEM: I don’t know exactly how to set the selector(s) in this case.
Right now the selectors are as shown in the image below, but each time this only gets the title of the product list page, not the title of each product page.
No, I still haven’t found a way to make it work, even after a few days of trying.
And I studied quite a few tutorials, but they only seem to show very simple cases where the page structure doesn’t change much (maybe just part of the page title), so the selectors are easy to get right automatically.
But on the pages I’m trying to scrape (mentioned in my first message), it just doesn’t work the way I thought it would.
I tried indicating an anchor element, and describing the page structure taken from the HTML (at least the way I understood I should write it in the Selector Editor)… but no luck so far.
For instance, it gives an error “Cannot find the UI element corresponding to this selector: webctrl tag=‘h1’ parentclass=‘col-xs-12 col-sm-9 col-md-10’”
And I really feel bad about this, because everything else in UiPath seems very nice and easy to understand.
So I was hoping someone with more experience could show me a working model that, for instance, reads details from the product pages of the first 2 products in that list, so I have something to study and learn from (because later on I’ll want to scrape other text and pictures from each product page as well).
Again, my beginner’s guess is that the problem is how to build the selectors for the various texts and images on the page (I can already open and close a browser window for each product page).
If you could build a small working project for me to learn from, it would be so great!
(or at least recommend someone else who could do it, to help me understand this stuff)
Oh, I guess I didn’t explain it very clearly: I am interested in getting data from each product page (not from the product list page).
Yes, as you’ve already guessed, I can scrape data from the product list pages: product name, product URL, product image URL, and I can also download the product images from that list. I can do that even if the product list spans multiple pages.
And I can open each product page in a new browser window, and later close that window.
Now I want to understand how to get the following from each product page:
– the product name (it should be possible to get any data from any page, right?);
– the product images (they are in some sort of image carousel);
– the product description;
– the product detailed characteristics.
Currently, my intended workflow is as shown in the image below.
I tried using the Attach Browser activity with the selector <html app='chrome.exe' title='* - eMAG.ro' /> or with <html app='chrome.exe' />
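A wildcard title like '* - eMAG.ro' matches every eMAG tab, the product list page included, so when several windows are open it is ambiguous which one gets attached. The Python sketch below is only an analogy, not UiPath code: it uses the standard-library fnmatch.fnmatchcase, whose * and ? behave like UiPath’s selector wildcards, and it assumes the first matching window wins, which is a simplification of whatever rule UiPath actually applies. The window titles are hypothetical.

```python
from fnmatch import fnmatchcase
from typing import List, Optional


def first_matching_window(pattern: str, window_titles: List[str]) -> Optional[str]:
    """Return the first window title matching a wildcard pattern, or None.

    fnmatchcase treats '*' as "any run of characters" and '?' as "exactly one
    character" (case-sensitive), similar to UiPath selector wildcards.
    Assumption: the first match wins; this is a simplified model only.
    """
    for title in window_titles:
        if fnmatchcase(title, pattern):
            return title
    return None


# Hypothetical window titles, in the order the windows were opened.
windows = [
    "Laptopuri - eMAG.ro",          # the product list page
    "Laptop Example 15 - eMAG.ro",  # a product page opened later
]

print(first_matching_window("* - eMAG.ro", windows))         # Laptopuri - eMAG.ro
print(first_matching_window("Laptop Example 15*", windows))  # Laptop Example 15 - eMAG.ro
```

In this model the generic pattern lands on the list page, which is consistent with the symptom described next; a more specific title pattern removes the ambiguity.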
I try to get the page title from each product page using the Get Full Text activity inside the Attach Browser activity (I also tried the Get Text activity).
It gives the error “Cannot find the UI element corresponding to this selector” if I use the partial selector <webctrl tag='H1' class='page-title' /> or the partial selector <webctrl tag='H1' parentclass='page-header has-subtitle-info' />
With the partial selector <webctrl tag='H1' /> it only grabs the title of the product list page, not the title of each individual product page.
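Both symptoms fit one explanation: the selectors were being evaluated against the list page, not the product page. The Python sketch below is a deliberately simplified model of attribute matching (exact string equality per attribute, tag names compared case-insensitively), built on the standard-library html.parser; it is not UiPath’s selector engine, which also supports wildcards and many more attributes. The two HTML snippets are hypothetical stand-ins, not eMAG’s real markup.

```python
from html.parser import HTMLParser
from typing import Dict, List

# Elements that never get a closing tag, so they must not stay "open".
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class ElementCollector(HTMLParser):
    """Records each element's tag, class, parent's class and direct text."""

    def __init__(self) -> None:
        super().__init__()
        self.elements: List[Dict[str, str]] = []
        self._open: List[int] = []  # indexes of currently open elements

    def handle_starttag(self, tag, attrs):
        parent_class = self.elements[self._open[-1]]["class"] if self._open else ""
        self.elements.append({
            "tag": tag,  # html.parser already lowercases tag names
            "class": dict(attrs).get("class") or "",
            "parentclass": parent_class,
            "text": "",
        })
        if tag not in VOID_TAGS:
            self._open.append(len(self.elements) - 1)

    def handle_endtag(self, tag):
        if self._open and self.elements[self._open[-1]]["tag"] == tag:
            self._open.pop()

    def handle_data(self, data):
        if self._open:
            self.elements[self._open[-1]]["text"] += data


def _matches(element: Dict[str, str], selector: Dict[str, str]) -> bool:
    for key, wanted in selector.items():
        actual = element.get(key, "")
        if key == "tag":  # 'H1' in a selector should match <h1>
            actual, wanted = actual.lower(), wanted.lower()
        if actual != wanted:
            return False
    return True


def find(page_html: str, selector: Dict[str, str]) -> List[str]:
    """Text of every element whose attributes all equal the selector's."""
    collector = ElementCollector()
    collector.feed(page_html)
    collector.close()
    return [e["text"].strip() for e in collector.elements if _matches(e, selector)]


# Hypothetical markup for the two kinds of page.
LIST_PAGE = '<div class="listing-header"><h1>Laptopuri</h1></div>'
PRODUCT_PAGE = ('<div class="page-header has-subtitle-info">'
                '<h1 class="page-title">Laptop Example 15</h1></div>')

print(find(LIST_PAGE, {"tag": "H1", "class": "page-title"}))  # [] ("Cannot find the UI element")
print(find(LIST_PAGE, {"tag": "H1"}))                         # ['Laptopuri']
print(find(PRODUCT_PAGE, {"tag": "H1", "class": "page-title"}))  # ['Laptop Example 15']
print(find(PRODUCT_PAGE, {"tag": "H1", "parentclass": "page-header has-subtitle-info"}))  # ['Laptop Example 15']
```

In this model the specific selectors are fine; they simply find nothing when run against the wrong page, while the tag-only selector happily matches the list page’s own h1.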
So I’m guessing it’s all about those selectors, but I still don’t know how to set them up.
I took the Get Full Text activity out of the Attach Browser activity where the Web Scraping wizard had automatically placed it, and I put it (with the partial selector <webctrl tag='H1' />) into my own Attach Browser activity (with no selector declared). I had previously placed that Attach Browser manually inside my own Open Browser activity, which receives each product page URL as a changing parameter.
So now the page title (the ‘h1’ tag) is scraped correctly.
(It seems to me that the original Attach Browser kept a fixed context referring to the webpage used at declaration time, a context that didn’t change when I later modified the selectors. Only by moving the Get Full Text out of the original Attach Browser could my custom partial selectors adapt to each new webpage opened in the browser.)
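The difference between a scope fixed once and a scope opened per URL can be modelled in a few lines. This is a conceptual Python sketch under stated assumptions, not UiPath code: BrowserScope, FAKE_PAGES and the placeholder URLs are hypothetical stand-ins for the browser window, the website, and the product URLs read from Excel, and it also shows taking only the first 2 products from the list.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical stand-in for the website: each URL maps to the text of its <h1>.
FAKE_PAGES = {
    "list-page-url": "Laptopuri",
    "product-1-url": "Product 1 title",
    "product-2-url": "Product 2 title",
    "product-3-url": "Product 3 title",
}


@dataclass
class BrowserScope:
    """Plays the role of the browser window an Open/Attach Browser scope works on."""
    url: str

    def get_h1_text(self) -> str:
        # Like Get Full Text with <webctrl tag='H1' />: reads the h1 of
        # whatever page *this* scope is attached to.
        return FAKE_PAGES[self.url]


def scrape_with_fixed_scope(product_urls: List[str]) -> List[str]:
    # The scope was captured once, on the list page, and never follows the loop.
    fixed_scope = BrowserScope("list-page-url")
    return [fixed_scope.get_h1_text() for _url in product_urls]


def scrape_with_scope_per_url(product_urls: List[str]) -> List[str]:
    titles = []
    for url in product_urls:
        scope = BrowserScope(url)           # like Open Browser receiving this iteration's URL
        titles.append(scope.get_h1_text())  # scraped inside that same scope
    return titles


product_urls = ["product-1-url", "product-2-url", "product-3-url"]
first_two = product_urls[:2]  # only the first 2 products from the list

print(scrape_with_fixed_scope(first_two))    # ['Laptopuri', 'Laptopuri']
print(scrape_with_scope_per_url(first_two))  # ['Product 1 title', 'Product 2 title']
```

The selector is identical in both functions; only the scope it is evaluated in changes, which matches the observation that moving the activity fixed the problem.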
Now, let’s see how I can extract the rest of the data!
Thanks so much, @carmen!
Once again, a truly help…y helping hand!
And you know, that’s exactly what I just started to do, so your ideas fill me with even more confidence.
Tomorrow is another day (as they once said in a famous movie); but I’ll study your file (I haven’t opened it yet, it’ll be a surprise) and get on with solving the rest of this stuff with much more enthusiasm than that movie character :))
I’ll let you know what else I learn (or don’t understand), so that it might be of help to others.
And since I can now scrape the textual data from each product details page, I would like to try to get the entire Description section and save it in an MS Word document as it is displayed in the browser (including images, alignment, text formatting, etc.).
Can this be done (without manually creating a new HTML table in Word and copying the text or image from each table cell in the browser, as some other threads on this forum suggested)?
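One possible route, offered as an assumption rather than a confirmed recipe: if the Description element’s HTML can be read from the page (for example with UiPath’s Get Attribute activity, if the element exposes an HTML attribute such as outerhtml; check the activity documentation for your version), it can be wrapped in a minimal standalone HTML file, which Word can generally open, although layout fidelity varies and images will likely only show if their src URLs are absolute and reachable. The Python sketch below only does the wrapping and saving; the function names and the sample fragment are hypothetical.

```python
import html
import tempfile
from pathlib import Path


def build_html_document(description_html: str, title: str) -> str:
    """Wrap a scraped HTML fragment in a minimal, standalone HTML page."""
    return (
        "<!DOCTYPE html>\n"
        "<html>\n<head>\n"
        '<meta charset="utf-8">\n'
        f"<title>{html.escape(title)}</title>\n"
        "</head>\n<body>\n"
        f"{description_html}\n"  # the fragment is inserted as-is, not escaped
        "</body>\n</html>\n"
    )


def save_description(description_html: str, title: str, path: Path) -> Path:
    """Write the wrapped fragment to path as UTF-8 and return the path."""
    path.write_text(build_html_document(description_html, title), encoding="utf-8")
    return path


# Hypothetical fragment, standing in for the scraped Description section.
fragment = '<div class="product-description"><p><b>Example</b> description</p></div>'
out_dir = Path(tempfile.mkdtemp())
saved = save_description(fragment, "Laptop Example 15 & more", out_dir / "description.html")
print(saved.read_text(encoding="utf-8"))
```

The page title is escaped because it is plain text, while the fragment is kept as markup so its formatting survives.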
Can I insert text and tables anywhere I need in a Word document, or only by using Insert DataTable (which inserts tables one after another, with no space between them, in the order I placed the activities in the workflow) and Append Text (which in my workflow always inserts text after those inserted data tables)?