Data scraping with selectors

Hi guys. I'm trying to learn how selectors work so that my data scraping is more accurate, but I've run into a problem and I don't know what to do.
https://allegro.pl/uzytkownik/jgd-parts (the URL that I need to scrape)

I need to get the code that is next to the “Numer katalogowy części” text, but it is not always in the same place. I'm trying to use Extract Table Data because I need to scrape all the pages (23 in total).



This is what the UiPath file looks like:
image

Thanks in advance guys!

Hello @Povilas_Jonikas

Kindly refer to this thread; it may help you.


Hi @Gokul_Jayakumar, and thanks for the response, but I've already watched these videos. My main problem is that the data scraping activity can't find the pattern, because there are 4 almost identical elements there. I'm trying to write a selector that would only take the info I need, but I'm having trouble with that.

@Povilas_Jonikas, it may be a bug.
I can get the pattern.

Try this:

  1. Uninstall the UiPath extension in Chrome
  2. Uninstall the Chrome extension in the UiPath tool
  3. Install the extension again from the UiPath tool; Chrome will close and the extension will be installed fresh
  4. Try the data scraping process again

You can’t get just that one bit of text, because it’s not its own separate object. You get the entire text it’s in - which is probably a SPAN or DIV - and then extract just the part you want using RegEx.
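A minimal sketch of that RegEx step, written in Python for illustration. It assumes the scraped text contains the label “Numer katalogowy części”, optionally followed by a colon, and that the code itself contains no spaces. The sample strings are made up and are not taken from the page.

```python
import re
from typing import Optional

# Assumption: label, optional colon, then a code without whitespace.
PART_NUMBER_RE = re.compile(r"Numer katalogowy części\s*:?\s*(\S+)")


def extract_part_number(text: str) -> Optional[str]:
    """Return the code that follows the label, or None if the label is absent."""
    match = PART_NUMBER_RE.search(text)
    return match.group(1) if match else None


# Hypothetical scraped texts: the label sits in a different position each time.
samples = [
    "Stan Używany Numer katalogowy części 1K0959653C Producent części VW",
    "Numer katalogowy części: A2048200164 Stan Używany",
    "Stan Nowy Producent części Bosch",
]

for sample in samples:
    print(extract_part_number(sample))
```

The capture group `(\S+)` is what isolates the code, so the position of the label inside the text no longer matters. In UiPath, the same pattern could be passed to the .NET `System.Text.RegularExpressions.Regex.Match` method and the value read from the first group.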


Thanks for the response guys, I appreciate that a lot.

You see, guys, I'm trying to extract it like this, but the main problem is that I'm getting an error saying that the elements are almost the same. Is there any way around this?

Hi Paul and thanks for the response. It looks a bit like this:


Is it possible to get some insight into how RegEx works?

Hi @Povilas_Jonikas

Do you need to get this value?

Regards
Gokul


Sorry for the late response. Yes, I would love to get only this result, but it just doesn't work for me.

Hi @Povilas_Jonikas

How about the XAML file?

DataScrapping_Allegro.xaml (10.2 KB)

image

Output

Numer.xlsx (14.1 KB)

Regards
Gokul

@Povilas_Jonikas
Did you uninstall and reinstall the UiPath extension in Chrome?

Is it possible to get only the text that is next to the phrase “Numer katalogowy części”?

We can do this using Data Scraping, @Povilas_Jonikas.

Can you share more details with screenshots?

There is a page that I need to get info from, with 60 listings per page. I've attached a screenshot of the info I need to scrape for every listing. I have no problem taking the price for every listing, but the code next to the phrase “Numer katalogowy części” is complicated for me. It's not always in the same place: some listings have it at the end, some at the front or in the middle. I would love to take the price, the price with delivery, the URL of the listing and the code next to “Numer katalogowy części”, but I don't know how.
For now I have it like this:


Where it says info, I would love for it to be only the code next to that phrase.

UPDATE: the web page that I need to get info from: Przedmioty użytkownika PHUSJOK - Allegro

Have you checked this workflow, @Povilas_Jonikas?

A first quick test (mini R&D) gave this result:

Column2 is what you are looking for, right?

Yes, Gokul, but there is a problem. I have an example in the attached screenshot. The red line shows what I would love to get from the page. The blue line shows what the data scraper takes automatically.
What I mean is that the thing I want is not always in the same place, and that's why I don't get the info I want. I'm trying to figure out whether it is possible to get the text that is always next to the phrase “Numer katalogowy części”.

Yes, but I see you have the same problem. Your pattern takes the info next to “Producent części”, which I don't need. I only need the text that is next to “Numer katalogowy części”.

When I did the table extraction, I got it like this. It automatically takes the text that is below it, looks for a pattern and only finds this.
Is it possible to take just the text that is next to the words “Numer katalogowy części”?

OK, got it. Thanks for the illustration.

The website swaps the order of the structures from listing to listing.
In that case, the dt/dd elements limit how we can configure the column selectors.

We have at least the following options:

  • retrieve the entire line and split it into its parts in a post-processing cleansing run
  • combine the data extraction approach with a Find Children / Get XX approach and do the cleansing afterwards
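The first option (retrieve the entire line, then split it in a cleansing run) could look roughly like the Python sketch below. It assumes each scraped line is a run of label/value pairs in arbitrary order and that the set of labels is known in advance. The two labels are the ones mentioned in this thread; the sample line and the names `KNOWN_LABELS` and `split_parameters` are hypothetical.

```python
import re
from typing import Dict

# Assumption: the labels that can appear in a line are known up front.
KNOWN_LABELS = ["Numer katalogowy części", "Producent części"]

# Longest labels first, so a longer label is preferred over a shorter one.
LABEL_RE = re.compile(
    "(" + "|".join(re.escape(label)
                   for label in sorted(KNOWN_LABELS, key=len, reverse=True)) + ")"
)


def split_parameters(line: str) -> Dict[str, str]:
    """Split one scraped line into {label: value}, whatever order the pairs are in."""
    # With a capturing group, re.split returns:
    # [text before first label, label1, value1, label2, value2, ...]
    parts = LABEL_RE.split(line)
    result = {}
    for label, value in zip(parts[1::2], parts[2::2]):
        # Drop separators and padding around the value.
        result[label] = value.strip(" :,\t")
    return result


# Hypothetical line in which the wanted field comes last.
row = split_parameters("Producent części: VW Numer katalogowy części: 1K0959653C")
print(row.get("Numer katalogowy części"))
```

Because the result is keyed by label, the cleansing step picks “Numer katalogowy części” by name rather than by column position, which is the part that the swapped dt/dd order breaks.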