[HowTo] Data Scraping - Advanced Configuration - Text Field, Image Source, Url, CSS Classname, Hover text

ppr · July 18, 2020, 9:58pm

This HowTo shows how Data Scraping can be configured to also retrieve non-standard information from a web table. After indicating the different data columns with the wizard, the extract data definition is edited afterwards and each column is changed to the relevant attribute, e.g. value (text field), src (image source), class (CSS class name), title (hover text), href (URL).

Introduction

The following web table is used for data scraping, and the non-text information should be retrieved as well.

grafik

We are interested in the following details:

ID
Name
Task
Circle type
Hover text of the circle
Prio info
Url

Preparation / Analysis

It is always recommended to do a quick check with the browser's web developer tools (F12) and / or UiExplorer. The table looks like this:
grafik

The quick look shows us:

it is organized in a tabular structure based on a table element (instead of a div-based table representation)
the different information sources are marked in yellow and identified
the first row contains the headers, which use TH tags

So it looks good, let's do the retrieval.

Data Scraping configuration

First Column (ID)

Start with data scraping

Select Element dialog - click Next
Click on the first ID value
The following dialog is displayed:
Click No (Nein) - we want fine-grained control over the retrieval configuration
Select Second Element dialog - click Next
Click on the second ID value
The following dialog is shown:
No URL extraction is required, the column name is set later

The following preview is shown:
grafik

Second Column (Name)

In the preview dialog, click Extract Correlated Data
As with the first column, indicate the first element - the first name
Indicate the second element - the second name
The result is:

Regardless of whether the selectors are correct or invalid, the empty column values are correct

An empty result is returned because the name value is not text in the data cell. The name is the value of a text field (refer to the screenshot above).
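The wizard itself needs no code, but the effect can be reproduced outside UiPath. The following is only a minimal sketch in Python, assuming a hypothetical, well-formed one-row snippet (the markup and the name "Alice" are made up, not taken from the page above). It shows that the cell has no text of its own while the input's value attribute holds the name:

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed stand-in for one table row (assumption, not the real page).
ROW = """
<tr>
  <td>1</td>
  <td><input type="text" value="Alice" /></td>
</tr>
"""

row = ET.fromstring(ROW.strip())
name_cell = row.findall("td")[1]  # second cell, i.e. td idx='2'

# Reading the cell text yields nothing, because the cell only contains an input element.
cell_text = "".join(name_cell.itertext()).strip()

# The name is stored in the value attribute of the input element.
input_value = name_cell.find("input").get("value")

print(repr(cell_text))    # ''
print(repr(input_value))  # 'Alice'
```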

Let's adapt the extraction with the following steps:

Click Edit Data Definition
Validate that the extraction definition is selecting an input element
Check that the second table cell is selected: td idx='2'
Change the attribute from text to value:

And validate the newly generated preview:
grafik

Additional columns

Repeat the steps from the first column and add the other columns by indicating, for each column, the first element value and then the second element value
Click on Edit Data Definition and modify it as follows:

grafik

Result:
grafik
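
The idea behind the edited definition (one target element and one attribute per column) can be sketched outside UiPath as well. This is not the extract metadata XML from the screenshot. It is a rough Python illustration, assuming a hypothetical well-formed table in which the cell positions, child tags, sample values and the class name `sample-circle-up` are all placeholders:

```python
import xml.etree.ElementTree as ET

# Hypothetical sample table; layout, values and class names are assumptions.
TABLE = """
<table>
  <tr><th>ID</th><th>Name</th><th>Task</th><th>Type</th><th>Prio</th><th>Link</th></tr>
  <tr>
    <td>1</td>
    <td><input type="text" value="Alice" /></td>
    <td>Review</td>
    <td><img src="img/green.png" title="On track" /></td>
    <td><span class="sample-circle-up"></span></td>
    <td><a href="https://example.com/1">Open</a></td>
  </tr>
</table>
"""

# (column name, 1-based cell index, child tag or None for the cell itself, attribute)
COLUMNS = [
    ("ID", 1, None, "text"),
    ("Name", 2, "input", "value"),
    ("Task", 3, None, "text"),
    ("CircleType", 4, "img", "src"),
    ("HoverText", 4, "img", "title"),
    ("PrioInfo", 5, "span", "class"),
    ("Url", 6, "a", "href"),
]


def extract_rows(table_xml):
    """Return one dict per data row, reading the configured attribute per column."""
    table = ET.fromstring(table_xml.strip())
    records = []
    for tr in table.findall("tr"):
        cells = tr.findall("td")
        if not cells:  # the header row only has th cells
            continue
        record = {}
        for name, idx, child, attr in COLUMNS:
            cell = cells[idx - 1]
            node = cell if child is None else cell.find(child)
            if node is None:
                record[name] = ""
            elif attr == "text":
                record[name] = "".join(node.itertext()).strip()
            else:
                record[name] = node.get(attr, "")
        records.append(record)
    return records


for record in extract_rows(TABLE):
    print(record)
```

Note that CircleType and HoverText read two different attributes from the same element, which is the same trick as adding the column twice in the wizard and changing only the attribute.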

Final Result

grafik
The datatable with the extracted values. The PrioInfo values are the different CSS classes. In a later conversion step this info can also be mapped, e.g. …circle-up = HIGH etc.
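
Such a conversion could look roughly like the sketch below (Python, only for illustration). Only the "circle-up = HIGH" pair comes from this post. The other suffixes, the labels MEDIUM / LOW / UNKNOWN and the helper name `to_priority` are assumptions, and the class is matched by its ending because the full class name is not spelled out here:

```python
# Mapping from the end of a class name to a priority label.
# Only "circle-up" -> "HIGH" is given in the post; the other entries are assumptions.
PRIORITY_BY_SUFFIX = {
    "circle-up": "HIGH",
    "circle-right": "MEDIUM",
    "circle-down": "LOW",
}


def to_priority(css_class, default="UNKNOWN"):
    """Map a scraped class attribute (possibly several classes) to a priority label."""
    for token in css_class.split():
        for suffix, label in PRIORITY_BY_SUFFIX.items():
            if token.endswith(suffix):
                return label
    return default


# "sample-" is a placeholder prefix, not the real class name.
print(to_priority("icon sample-circle-up"))
print(to_priority(""))
```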

Tips

After each edit of the extract data definition, copy the resulting / modified extract metadata XML to the clipboard
First add / select all the different columns and edit the extract data definition at the end.
- Reason: once the extract data definition has been modified, adding a new column resets the modifications. That's why the partial results are also copied to the clipboard
In case of suspicious preview results after heavy editing rounds, stop the wizard and restart it

Downloads

HowTo_TableFieldClassImgLink.zip (175.5 KB)

Questions

For questions on your retrieval case, open a new topic and get individual support.

loginerror · August 1, 2020, 11:56am

Cool article! I moved it to our FAQ category.

supermanPunch · October 4, 2020, 7:20am

Hi @ppr, will the steps be the same even if the table is represented in a div table format?

ppr · October 5, 2020, 9:42pm

@supermanPunch
we have also done it in some projects where the data was organized in rows and columns represented e.g. by divs.

The very important part is to define a reliable row iterator selector and consistent selectors to the correlated data within the extract data definition.
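
To illustrate the principle outside UiPath, here is a minimal Python sketch, assuming a hypothetical div-based grid (the class names `grid`, `row` and `cell` and the sample values are made up). One expression iterates over the rows, and every column is then addressed relative to the current row:

```python
import xml.etree.ElementTree as ET

# Hypothetical div-based "table"; structure and class names are assumptions.
PAGE = """
<div class="grid">
  <div class="row"><div class="cell">1</div><div class="cell"><input value="Alice" /></div></div>
  <div class="row"><div class="cell">2</div><div class="cell"><input value="Bob" /></div></div>
</div>
"""

root = ET.fromstring(PAGE.strip())

rows = []
# Row iterator: matches every row container and nothing else.
for row in root.findall("./div[@class='row']"):
    # Column lookups are relative to the current row, so they stay consistent per row.
    cells = row.findall("./div[@class='cell']")
    rows.append({
        "ID": (cells[0].text or "").strip(),
        "Name": cells[1].find("input").get("value", ""),
    })

print(rows)
```

If the row expression also matched other containers, or the cell lookups were not relative to the row, the columns would no longer line up, which is the same risk as with an unreliable row selector in the extract data definition.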

Victor_C_C · October 6, 2020, 7:50am

Awesome! Thanks a lot!

afeno · October 30, 2020, 7:20am

Great article! Thank you.
Is there a way to extract “everything” that is in the block instead of extracting the Text, Class, Value, etc.?
In my case, each element contains “structured data” inside (4 elements). But sometimes 1 element is missing, so I would like to extract everything (everything = source code) so I can parse it manually.
Any idea how can I do that?
Thanks!

Aleem_Khan · February 2, 2021, 5:53pm

Nice

tsverthoff · February 2, 2022, 4:37pm

@ppr - what if the UI Element is not part of a table?

Please see: Get list of running web apps using browser task manager

ppr · February 2, 2022, 5:22pm

@tsverthoff
this HowTo describes the approach applied to web pages / web applications
the topic you referenced is a different case

Better to keep the discussion about your linked topic there itself.

tsverthoff · February 2, 2022, 7:08pm

I agree. That is why I specifically posted that question to a new topic, rather than posting the content of the topic here.

agathiyanv · August 10, 2022, 9:25am

How can data scraping be looped in a web page where the table spans multiple pages and is inside the pallet?

In this case, once the inner pages (1-18) are done, it should move to the next outer page (1 of 3) and then do the data scraping.

ppr · August 10, 2022, 4:28pm

@agathiyanv

if not already done, just open a topic for your case and we will pick it up from there

Tech_Arima · April 18, 2023, 12:46pm

This has been shown clearly in the below video, please watch and understand the steps in detail.

ppr · April 18, 2023, 1:03pm

we can see well in the video what the tip/hint is about
