Data scraping with selectors

Povilas_Jonikas · August 24, 2022, 12:08pm

Hi guys. I'm trying to learn how selectors work so that my data scraping is more accurate, but I've run into a problem and I don't know what to do.
https://allegro.pl/uzytkownik/jgd-parts (the URL that I need to scrape)

I need to get the code that is next to the “Numer katalogowy części” text, but it is not always in the same place. I'm trying to use Extract Table Data because I need to scrape all the pages (23 in total).

This is what the UiPath file looks like:

Thanks in advance guys!

Gokul_Jayakumar · August 24, 2022, 12:35pm

Hello @Povilas_Jonikas

Kindly refer to this thread; it may help you.

Povilas_Jonikas · August 24, 2022, 1:26pm

Hi @Gokul_Jayakumar, and thanks for the response, but I've already watched these videos. My main problem is that the data scraping activity can't find the pattern, because there are four almost identical elements there. I'm trying to write a selector that takes only the info I need, but I'm having trouble with that.

Gokul_Jayakumar · August 24, 2022, 2:07pm

@Povilas_Jonikas, it may be a bug.
I can get the pattern.

Try this

Uninstall the UiPath extension in Chrome.
Uninstall the Chrome extension in the UiPath tool.
Install the extension again from the UiPath tool; Chrome will close and the extension will be installed fresh.
Try the data scraping process again.

postwick · August 24, 2022, 3:04pm

You can’t get just that one bit of text, because it’s not its own separate object. You get the entire text it’s in - which is probably a SPAN or DIV - and then extract just the part you want using RegEx.
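A minimal sketch of that idea, written in Python only for illustration. The sample strings are made up, because the real scraped text is not shown in this thread, and the pattern assumes the code is a single token with no spaces in it. The same pattern should carry over to the .NET regular expressions available in UiPath, but that is untested here.

```python
import re

# The label as it appears on the listing page.
LABEL = "Numer katalogowy części"

# Label, optional colon and whitespace, then capture the next token
# (assumption: the code contains no spaces, commas or semicolons).
PATTERN = re.compile(re.escape(LABEL) + r"\s*:?\s*([^\s,;]+)")


def extract_part_number(text):
    """Return the token that follows the label, or None if the label is absent."""
    match = PATTERN.search(text)
    return match.group(1) if match else None


# Hypothetical scraped strings; the label sits in a different place in each.
samples = [
    "Stan: Używany Numer katalogowy części: 1K0959653C Producent części: VW",
    "Numer katalogowy części 0986424797 Stan Nowy",
    "Producent części: Bosch",
]

for sample in samples:
    print(extract_part_number(sample))
```

Because the pattern searches for the label itself, it does not matter whether the label is at the start, middle or end of the scraped text.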

Povilas_Jonikas · August 25, 2022, 6:02am

Thanks for the response guys, I appreciate that a lot.

I'm trying to extract it like this, but the main problem is that I'm getting an error saying the elements are almost the same. Is there any way to get around this?

Povilas_Jonikas · August 25, 2022, 6:03am

Hi Paul and thanks for the response. It looks a bit like this:

Is it possible to get some insight into how RegEx works?

Gokul001 · August 25, 2022, 6:05am

Hi @Povilas_Jonikas

Do you need to get this value?

Regards
Gokul

Povilas_Jonikas · August 25, 2022, 12:16pm

Sorry for the late response. Yes, I would love to get only this result, but it just doesn't work for me.

Gokul001 · August 25, 2022, 12:24pm

Hi @Povilas_Jonikas

How about the XAML file?

DataScrapping_Allegro.xaml (10.2 KB)

Output

Numer.xlsx (14.1 KB)

Regards
Gokul

Gokul_Jayakumar · August 25, 2022, 1:08pm

@Povilas_Jonikas
Did you uninstall and reinstall the UiPath extension in Chrome?

Povilas_Jonikas · September 21, 2022, 11:16am

Is it possible to get only the text that is next to the phrase “Numer katalogowy części”?

Gokul001 · September 21, 2022, 11:18am

We can do this using Data Scraping, @Povilas_Jonikas.

Can you share more details with screenshots?

Povilas_Jonikas · September 21, 2022, 11:31am

There is this page that I need to get info from. There are 60 listings per page. I've attached a screenshot of the info I need to scrape for every listing. I have no problem taking the price for every listing, but the code next to the phrase “Numer katalogowy części” is complicated for me. It's not always in the same place: some listings have it at the end, some at the front or in the middle. I would love to take the price, the price with delivery, the URL of the listing and the code next to “Numer katalogowy części”, but I don't know how.
For now I have it like this:

Where it says info, I would love for it to be only the code next to the phrase.

UPDATE: the web page that I need to get info from: Przedmioty użytkownika PHUSJOK - Allegro

Gokul001 · September 21, 2022, 11:36am

Have you checked this workflow, @Povilas_Jonikas?

ppr · September 21, 2022, 11:42am

A first quick R&D attempt gave this result:

Column2 is what you are looking for, right?

Povilas_Jonikas · September 21, 2022, 11:42am

Yes Gokul, but there is a problem. I have an example in the attached screenshot. The red line shows what I would love to get from the page. The blue line shows what the data scraper takes automatically.
What I mean is that the thing I want is not always in the same place, and that's why I don't get the info I want. I'm trying to figure out whether it is possible to get the text that is always next to the phrase “Numer katalogowy części”.

Povilas_Jonikas · September 21, 2022, 11:47am

Yes, but I see you have the same problem. Your pattern takes the info next to “Producent czesci”, which I don't need. I only need the text that is next to “Numer katalogowy części”.

Povilas_Jonikas · September 21, 2022, 11:51am

When I did the table extraction, I got it like this. It automatically takes the text that is below it, looks for a pattern and only finds this.
Is it possible to take just the text that is next to the words “Numer katalogowy części”?

ppr · September 21, 2022, 11:59am

OK, got it. Thanks for the illustration.

The website swaps the structure between listings.
In that case the dt/dd elements limit how we can configure the column selectors.

We have at least the following options:

Retrieve the entire line and split the parts in a post-processing cleansing run.
Combine the data extraction approach with a Find Children / Get XX approach and do a cleansing afterwards.
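
The first option could look roughly like the sketch below, shown in Python purely as an illustration of the post-processing step. The list of labels and the sample lines are assumptions, since the exact scraped text is not shown in this thread; the idea is to split the whole line on the known labels and then pick the one value that is needed.

```python
import re

# Assumed parameter labels; extend this list to match what the listings show.
KNOWN_LABELS = ["Stan", "Producent części", "Numer katalogowy części"]


def split_parameters(line, labels=KNOWN_LABELS):
    """Split one scraped line into a {label: value} dict, whatever the order."""
    # Longest labels first, so a longer label is tried before a shorter one.
    ordered = sorted(labels, key=len, reverse=True)
    alternation = "|".join(re.escape(label) for label in ordered)
    # A value runs from its label up to the next known label or the end of the line.
    pattern = re.compile(
        rf"\b({alternation})\b\s*:?\s*(.*?)(?=\s*\b(?:{alternation})\b|$)"
    )
    return {label: value.strip() for label, value in pattern.findall(line)}


# Hypothetical lines with the parameters in a different order.
rows = [
    "Stan Nowy Producent części Bosch Numer katalogowy części 0986424797",
    "Numer katalogowy części: 1K0 959 653 C Stan: Używany",
]

for row in rows:
    print(split_parameters(row).get("Numer katalogowy części"))
```

Unlike a single-token match, this keeps codes that contain spaces, at the cost of having to list every label that can appear in the line.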
