Article title web scraping - problem

senek · January 29, 2022, 10:53am

Hi,

I am facing some problems with web scraping. I am new to UiPath, so any help is appreciated. I want to scrape article titles from a website.
Basic web scraping is not enough: it gives me only part of the title plus some text underneath it (I don’t need that text in the title column; I would love to put it in a second column).
I managed to get the title I want using the “Get Attribute” activity, but I want to automate that process. Is there a way to loop it?

The page looks like this:

Can you help me? I want to export a list of titles and the text underneath them to Excel.
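The loop being asked about (for every article, read the title and the text under it, then write both as one row) can be sketched outside UiPath in plain Python. This is only an illustration of the idea, assuming a hypothetical, simplified page where each `<article>` holds an `<h3>` title and a `<p>` excerpt; `scrape_articles` and `to_csv` are hypothetical names, and CSV stands in for the Excel export.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical, simplified listing markup; the real site's structure may differ.
# ElementTree needs well-formed markup, so real HTML would need an HTML parser.
SAMPLE_PAGE = """<div>
  <article><h3>First article title</h3><p>Excerpt of the first article.</p></article>
  <article><h3>Second article title</h3><p>Excerpt of the second article.</p></article>
</div>"""


def scrape_articles(markup):
    """Loop over every <article> and collect a (title, excerpt) pair from each."""
    root = ET.fromstring(markup)
    rows = []
    for article in root.iter("article"):
        title = article.findtext("h3", default="").strip()
        excerpt = article.findtext("p", default="").strip()
        rows.append((title, excerpt))
    return rows


def to_csv(rows):
    """Render the pairs as a two-column CSV, a stand-in for the Excel export."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(["Title", "Excerpt"])
    writer.writerows(rows)
    return buffer.getvalue()


rows = scrape_articles(SAMPLE_PAGE)
print(to_csv(rows))
```

Each article becomes one row with the title in the first column and the excerpt in the second, which is the two-column layout asked for above.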

Nithinkrishna · January 29, 2022, 11:29am

Hey @senek

Kindly use the Data Scraping wizard; it should handle this seamlessly.

Thanks
#nK

senek · January 29, 2022, 12:40pm

Hey, it doesn’t work as it should. This is the result:

Whole thing instead of a title

Nithinkrishna · January 29, 2022, 3:09pm

Hey @senek

Did you indicate only the title?

Also, if it’s a public site, please share the link so I can check…

Thanks
#nK

senek · January 29, 2022, 4:07pm

Yes, I indicated only the title, but I guess the website is a little bit tricky. Sure, here you go: https://rpa.hybrydoweit.pl/

ppr · January 29, 2022, 4:20pm

We can do it by manually editing the extract configuration:

<extract>
	<row exact="1">
		<webctrl tag="div"/>
		<webctrl tag="article" idx="1"/>
	</row>
	<column exact="1" name="Column1" attr="text">
		<webctrl tag="div"/>
		<webctrl tag="h3" idx="1"/>
	</column>
	<column exact="1" name="Column2" attr="href">
		<webctrl tag="a" idx="1"/>
	</column>
</extract>
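To make the row/column structure of this configuration concrete, here is a rough Python model of how such a definition could be applied to a page. It is not UiPath's implementation. It assumes that the last row `webctrl` names the repeating element and the one before it names that element's parent, treats `idx` as the 1-based position among matching descendants, reads `attr="text"` as the element's text content (any other `attr` as an HTML attribute), and ignores `exact`. The sample markup and the names `descend` and `apply_extract` are hypothetical.

```python
import xml.etree.ElementTree as ET

# The extract definition from the post above.
EXTRACT_DEF = """<extract>
  <row exact="1">
    <webctrl tag="div"/>
    <webctrl tag="article" idx="1"/>
  </row>
  <column exact="1" name="Column1" attr="text">
    <webctrl tag="div"/>
    <webctrl tag="h3" idx="1"/>
  </column>
  <column exact="1" name="Column2" attr="href">
    <webctrl tag="a" idx="1"/>
  </column>
</extract>"""

# Hypothetical page: a div of articles, each with a link and a div holding an h3.
SAMPLE_PAGE = """<div>
  <article><a href="/post-1"><img alt="Post one"/></a><div><h3>Post one</h3></div></article>
  <article><a href="/post-2"><img alt="Post two"/></a><div><h3>Post two</h3></div></article>
</div>"""


def descend(node, steps):
    """Follow webctrl steps below node; idx picks the n-th match (assumed 1-based, default 1)."""
    for step in steps:
        index = int(step.get("idx", "1")) - 1
        matches = [el for el in node.iter(step.get("tag")) if el is not node]
        if index >= len(matches):
            return None
        node = matches[index]
    return node


def apply_extract(extract_xml, page_xml):
    spec = ET.fromstring(extract_xml)
    page = ET.fromstring(page_xml)
    row_steps = spec.find("row").findall("webctrl")
    parent_tag, row_tag = row_steps[-2].get("tag"), row_steps[-1].get("tag")
    # Every <article> directly inside a <div> becomes one row.
    rows = [child for parent in page.iter(parent_tag) for child in parent if child.tag == row_tag]
    table = []
    for row in rows:
        record = {}
        for column in spec.findall("column"):
            target = descend(row, column.findall("webctrl"))
            attr = column.get("attr")
            if target is None:
                value = None
            elif attr == "text":
                value = "".join(target.itertext()).strip()
            else:
                value = target.get(attr)
            record[column.get("name")] = value
        table.append(record)
    return table


table = apply_extract(EXTRACT_DEF, SAMPLE_PAGE)
for record in table:
    print(record)
```

The point of the split is that the row definition finds the repeating blocks once, and each column path is then resolved relative to a single block, so the title and link of one article always land in the same row.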

senek · January 29, 2022, 4:50pm

Thank you! It’s amazing. Can you tell me how you worked it out? Is there a way to learn it (haha, I bet there is)?

Also, I copied it and it does not work:

ppr · January 31, 2022, 9:33am

@senek
A little experience helps, but in general we can proceed straightforwardly like this:

Start with the Data Scraping wizard.
When we cannot make the selectors more detailed, e.g. when we only get whole blocks like this:


Then we check the structure of the web page.

We see that it is clearly divided into separate sections:

Now we do the following:

1. Indicate the article blocks in the wizard.
2. Indicate the article blocks again in the wizard for a second column.

With the second, correlated element, the wizard generates the row part of the extract definition for us:

<extract>
	<row exact="1">
		<webctrl tag="div"/>
		<webctrl tag="article" idx="1"/>
	</row>
</extract>

We then refer back to the structure of the website and post-edit the extract configuration XML manually.
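The manual post-editing step amounts to appending `<column>` elements after the generated `<row>`. Purely as an illustration (in Studio this is done by hand in the extract configuration), the sketch below uses Python's standard `xml.etree.ElementTree` to rebuild the first configuration from this thread out of the row-only definition; `add_column` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

# Row-only definition, as generated by the wizard in the step above.
ROW_ONLY = """<extract>
  <row exact="1">
    <webctrl tag="div"/>
    <webctrl tag="article" idx="1"/>
  </row>
</extract>"""


def add_column(extract, name, attr, steps):
    """Append a <column>; steps are (tag, idx) pairs, and idx=None omits the idx attribute."""
    column = ET.SubElement(extract, "column", {"exact": "1", "name": name, "attr": attr})
    for tag, idx in steps:
        webctrl = ET.SubElement(column, "webctrl", {"tag": tag})
        if idx is not None:
            webctrl.set("idx", str(idx))
    return column


extract = ET.fromstring(ROW_ONLY)
# Column1: text of the first h3 inside a div of each article.
add_column(extract, "Column1", "text", [("div", None), ("h3", 1)])
# Column2: href of the first link in each article.
add_column(extract, "Column2", "href", [("a", 1)])
print(ET.tostring(extract, encoding="unicode"))
```

Seen this way, the row definition stays untouched and each column is just a short relative path plus the attribute to read.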

In this alternate example, we take the title from the image (its alt attribute):

<extract>
	<row exact="1">
		<webctrl tag="div"/>
		<webctrl tag="article" idx="1"/>
	</row>
	<column exact="1" name="Column1" attr="alt">
		<webctrl tag="img" idx="1"/>
	</column>
	<column exact="1" name="Column2" attr="href">
		<webctrl tag="a" idx="1"/>
	</column>
</extract>

This gives us the full title instead of the shortened text that ends with … for long titles.
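Why the alt attribute helps can be shown with a small Python sketch on hypothetical markup: the visible heading is cut off with "…", while the image alt text carries the full title. This only holds when the site fills the alt text with the title, as it apparently does here; `title_candidates` and `best_title` are hypothetical names.

```python
import xml.etree.ElementTree as ET

# Hypothetical article markup: the visible heading is cut off with "…",
# while the image alt text still holds the full title.
SAMPLE_ARTICLE = """<article>
  <a href="/long-post"><img alt="A very long article title that the theme shortens on the listing page"/></a>
  <div><h3>A very long article title that the theme…</h3></div>
</article>"""


def title_candidates(article_xml):
    """Return the heading text and the image alt text of one article."""
    article = ET.fromstring(article_xml)
    heading = article.find(".//h3")
    image = article.find(".//img")
    return {
        "text": "".join(heading.itertext()).strip() if heading is not None else "",
        "alt": (image.get("alt") or "").strip() if image is not None else "",
    }


def best_title(candidates):
    """Prefer the untruncated alt text; fall back to the heading text."""
    return candidates["alt"] or candidates["text"]


candidates = title_candidates(SAMPLE_ARTICLE)
print(candidates["text"])
print(best_title(candidates))
```

The fallback keeps the heading text for articles without an image, so a missing alt never leaves the title column empty.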

Once we have finished and confirmed the wizard, we cross-check the selector of the Extract Structured Data activity and verify that it targets the list of all articles correctly and matches what we configured in the row extract definition.
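The cross-check above is done on the selector in Studio. A complementary after-the-fact check, sketched here in Python under the assumption that the extracted table is available as a list of dicts, compares the number of extracted rows with the number of articles on the page and flags empty cells. The page markup, the extracted data, and `check_extraction` are all hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical page with three articles.
SAMPLE_PAGE = """<div>
  <article><a href="/post-1">Post one</a></article>
  <article><a href="/post-2">Post two</a></article>
  <article><a href="/post-3">Post three</a></article>
</div>"""

# Hypothetical extraction result: the third article was missed and one title is blank.
extracted = [
    {"Column1": "Post one", "Column2": "/post-1"},
    {"Column1": "", "Column2": "/post-2"},
]


def check_extraction(rows, expected_count, required=("Column1", "Column2")):
    """Return a list of problems; an empty list means the table looks complete."""
    problems = []
    if len(rows) != expected_count:
        problems.append(f"expected {expected_count} rows, got {len(rows)}")
    for number, row in enumerate(rows, start=1):
        for name in required:
            if not row.get(name):
                problems.append(f"row {number}: {name} is empty")
    return problems


expected = sum(1 for _ in ET.fromstring(SAMPLE_PAGE).iter("article"))
problems = check_extraction(extracted, expected)
for problem in problems:
    print(problem)
```

A row-count mismatch usually points at the row selector, while empty cells point at a column path that does not match every article.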

Also have a look here:

Carl_Robillos · February 7, 2022, 4:05am

Hello there. I’m new to UiPath as well. I don’t have that in my menu. Has UiPath changed it, or is it not available in the Community Edition (CE)?

Nithinkrishna · February 7, 2022, 4:57am

Hey @Carl_Robillos

It is the Table Extraction item in the toolbar (see the screenshot above).

Thanks
#nK

Carl_Robillos · February 7, 2022, 5:56am

Oh thanks!

