[SOLVED] How to collect many links for all subcategories from a webpage

Hello!

I need to collect all links for the product sub-categories from this page: Vezi toate categoriile de produse eMAG - eMAG.ro ("See all eMAG product categories")

There is some structure on that page, but I haven’t yet found a way to scrape just the links for each separate product sub-category.
In the screenshot below, I marked in red the blocks of links that I need to collect.
Basically, it’s all the blocks of links from all product sub-categories (written in a small black font), but without the links to the product categories (written in a bigger blue font).

Thanks for any help!

This doesn’t help, unfortunately:

Well,

Can you try this and get back to me if you are still facing issues?

Data Scraping ListView - #8 by Raghavendraprasad

Regards :slight_smile:

Thanks for the suggestion, @Raghavendraprasad!

It’s quite a bit outside my beginner level, but I’ll try to use some ideas from that message and see if I can get to something useful.

I really hoped there would be some more beginner-friendly solution to this problem, though :slight_smile:

Well,

Here is a link to that .xaml, where I have shown how to use the Find Children activity. Hope this helps.

Regards :slight_smile:

Thanks, @Raghavendraprasad!

Unfortunately, it gives a "selector is not valid" error.
Please see my workflow attached.

How can I make this work for my case?
Thank you for your help!

Source: Find Children 'DIV  departments-page'
Message: The selector is not valid
Exception Type: UiPath.Core.InvalidSelectorException
An ExceptionDetail, likely created by IncludeExceptionDetailInFaults=true, whose value is:
UiPath.Core.InvalidSelectorException: The selector is not valid ----> System.InvalidCastException: Unable to cast object of type 'System.Xml.XmlText' to type 'System.Xml.XmlElement'.
   at UiPath.Core.Selector.FromXmlString(String xml)
   at UiPath.Core.Selector..ctor(String theSelector)
   --- End of inner ExceptionDetail stack trace ---
   at UiPath.Core.Activities.TaskAsyncCodeActivity`1.EndExecute(AsyncCodeActivityContext context, IAsyncResult result)
   at System.Activities.AsyncCodeActivity.System.Activities.IAsyncCodeActivity.FinishExecution(AsyncCodeActivityContext context, IAsyncResult result)
   at System.Activities.AsyncCodeActivity.CompleteAsyncCodeActivityData.CompleteAsyncCodeActivityWorkItem.Execute(ActivityExecutor executor, BookmarkManager bookmarkManager)

ScrapingAllSubcategoriesLinks.xaml (8.0 KB)

Well,

Giving me the selector will not help, because I will have to look at the page source. If your information is not confidential, I can help pick which DIV you can iterate through; otherwise it will be an uphill task to solve this without the context of the page source.

In case you can’t share the info, play around and find the main DIV that houses the sub-DIVs you want to extract, and loop through it until you get the values.
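
For illustration only, here is a minimal Python sketch of that idea outside UiPath: find one container DIV and loop over the links inside it. The class name `departments-page` is taken from the activity name in the error message above; whether it is the right container for this page is an assumption, and the sample HTML and the `collect_links` helper are made up.

```python
from html.parser import HTMLParser


class ContainerLinkCollector(HTMLParser):
    """Collects href values of <a> tags located inside one container <div>."""

    def __init__(self, container_class):
        super().__init__()
        self.container_class = container_class
        self.div_depth = 0  # greater than 0 while we are inside the container
        self.links = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "div":
            classes = (attrs.get("class") or "").split()
            if self.div_depth > 0:
                self.div_depth += 1          # nested div inside the container
            elif self.container_class in classes:
                self.div_depth = 1           # entering the container itself
        elif tag == "a" and self.div_depth > 0 and attrs.get("href"):
            self.links.append(attrs["href"])

    def handle_endtag(self, tag):
        if tag == "div" and self.div_depth > 0:
            self.div_depth -= 1


def collect_links(html, container_class):
    """Hypothetical helper: return the links found inside the given container."""
    collector = ContainerLinkCollector(container_class)
    collector.feed(html)
    return collector.links


# Made-up sample markup, not the real page source.
sample_html = (
    '<a href="/home">Home</a>'
    '<div class="departments-page">'
    '<div><a href="/a">A</a></div>'
    '<a href="/b">B</a>'
    '</div>'
    '<a href="/footer">Footer</a>'
)
print(collect_links(sample_html, "departments-page"))
```

Only the links inside the container are kept; the ones before and after it are skipped, which is roughly what looping over the children of a single DIV does.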

Wish you the best,

Regards

No problem, @Raghavendraprasad!

The URL is public, it’s an online shop, and the page I’m looking for is this: https://www.emag.ro/all-departments?ref=hdr_mm_14

Thank you!

Okay then.

I have attached a workflow that should help you get those values. You will have to do some basic string operations to put those values into an array, and eventually into an Excel file or a data table.

Apply the same approach to all the other sections and it will work.
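
As a rough sketch of what such string operations could look like, here is a small Python example (not part of the attached workflow). It assumes each scraped value is a string shaped like `<a href="...">text</a>`; the sample values and the helper names `parse_anchor` and `to_csv` are hypothetical.

```python
import csv
import io


def parse_anchor(outer_html):
    """Return (text, href) from one '<a href="...">text</a>' string, or None."""
    marker = 'href="'
    start = outer_html.find(marker)
    if start == -1:
        return None
    start += len(marker)
    end = outer_html.find('"', start)
    if end == -1:
        return None
    href = outer_html[start:end]
    # The link text sits between the end of the opening tag and '</a>'.
    text_start = outer_html.find(">", end) + 1
    text_end = outer_html.find("</a>", text_start)
    if text_start == 0 or text_end == -1:
        return None
    return outer_html[text_start:text_end].strip(), href


def to_csv(rows):
    """Write (name, url) rows into CSV text, which Excel can open."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["name", "url"])
    writer.writerows(rows)
    return buffer.getvalue()


# Made-up sample values standing in for what the workflow writes out.
scraped = [
    '<a href="/laptopuri/c">Laptopuri</a>',
    '<a href="/telefoane-mobile/c">Telefoane mobile</a>',
    '<span>no link here</span>',
]
rows = [row for row in map(parse_anchor, scraped) if row is not None]
print(to_csv(rows))
```

Values without a link are dropped, and the remaining rows end up in a simple two-column table.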

Hope this solves it, buddy. Mark the solution which you feel is best and close the question :slight_smile:

Regards

ScrapingAllSubcategoriesLinks.xaml (8.9 KB)

PS: I haven’t given you end-to-end code, so you will have to build the part where you launch the website etc. From the landing page, you run the bot from the tray, and since I have used a Write Line activity, it will output all results in the output window.

Thank you for your effort, @Raghavendraprasad!

I placed your Find Children and For Each activities into a new Open Browser activity (I use Chrome), but it produced an error:
Cannot find the UI element corresponding to this selector

But I admit I don’t understand what this part of your selector does:
html htmlwindowname='dump'

…so I don’t know how to fix it :frowning:

Which means the problem is not solved yet.

Well,

I built the workflow in IE. So please switch to Internet Explorer and get back to me :slight_smile:

I had already tried using IE, but it produced the same error.

(I’m sorry I forgot to mention that in my previous message :frowning: )

Well then,

It worked for me without any issues. Are you changing something before running it?

The landing page must be the same as I indicated before. I launched the very URL that you gave me in IE.

Can you check again?

OK, I created it again from scratch right now and it worked, @Raghavendraprasad!

At first it complained again that the selector was not found.
But the second time it worked :slight_smile:

So I indicated the ‘outerHtml’ attribute, and I will collect the links using a regexp.
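
A minimal sketch of that regexp step, written in Python for illustration (in UiPath it would be a different activity or expression). The sample markup and link paths are made up, and the pattern is a simple assumption that will not cover every way a link can be written in HTML.

```python
import re
from urllib.parse import urljoin

BASE_URL = "https://www.emag.ro/"  # site root, from the URL given in this thread

# Matches href="..." or href='...' inside an opening <a> tag.
HREF_RE = re.compile(r'<a\b[^>]*?\bhref\s*=\s*["\']([^"\']+)["\']', re.IGNORECASE)


def extract_links(outer_html, base_url=BASE_URL):
    """Return absolute links found in the HTML, without duplicates, in page order."""
    seen = set()
    links = []
    for href in HREF_RE.findall(outer_html):
        absolute = urljoin(base_url, href)  # turns '/x/c' into a full URL
        if absolute not in seen:
            seen.add(absolute)
            links.append(absolute)
    return links


# Made-up sample standing in for the scraped 'outerHtml' value.
sample = """
<div class="departments-page">
  <a href="/laptopuri/c">Laptopuri</a>
  <a class="x" href='/tablete/c'>Tablete</a>
  <a href="/laptopuri/c">Laptopuri again</a>
</div>
"""
print(extract_links(sample))
```

Regular expressions are fragile on real-world HTML, so this is only reasonable when the markup of the scraped block is known and fairly regular.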

Is there a simpler / faster way to grab those links?

Great…! @mejohnny

Glad that my suggestions worked bud.

Yeah, there is always a better, faster, or more optimal way. JavaScript works way faster, I believe, but Find Children is also quite fast, and since what you are doing doesn’t seem to be repetitive, I think we can stick with the current solution for now.

Mark a solution and close this topic :slight_smile:

Regards
