[HowTo] Data Scraping - Advanced Configuration - Text Field, Image Source, Url, CSS Classname, Hover text

This HowTo introduces on how Data scraping can be configured to retrieve also on non standard information from a web table. After indicating the different data columns with the wizard the extract data definition was post edited and changed to the relevant attributes e.g. value (Text field), src ( Image Source), class (CSS Class Name), tite (Hover Text), href (Url).

Introduction

Following web table is to use for data scraping and also the non text information should be retrieved.

grafik

We are interested on following details:

  • ID
  • Name
  • Task
  • Cercle Type
  • Hover text of cercle
  • Prio info
  • Url

Preperation / Analysis

It always recommended to do a quick check on Browsers web tools (F12) and / or UiEplorer. The table looks like this:
grafik

The quick look shows us

  • it is organized in tabular structure based on a table (instead of a div table representation)
  • the different information sources are yellow marked and identified
  • first row with the headers are used within TH tags

So it looks good, lets do the retrieval

Data Scraping configuration

First Column (ID)

Start with data scraping

  • Select Element Dialog - click next
  • click on the first ID Value
  • following dialog is displayed:
    grafik
  • Click No (Nein) - we want to fine control the retrieval configuration
  • Select Second Element Dialog - click next
  • click on the second ID Value
  • Following Dialog is shown:
  • grafik
  • No url extraction is required, the column name is set later

Following Preview is shown:
grafik

Second Column (Name)

  • Click on the preview dialog extract roccrelated data
    grafik
  • similar to the first column the first element is indicated - first name
  • indicating the second element - second name
  • result is:
  • grafik

Regadles if the selectors are correct or invalid, the empty column values are correct

An empty result is received as the name value is not text in the data cell. The name info is a value in a text field (refer to screenshot above)

Lets adopt the extraction by the following steps:

  • Click Edit Data Definition
  • grafik
  • Validate the extraction result that it is selecting an input
  • Check that the second table call is selected: td idx=‘2’
  • change attribute from text to value:
    grafik

And validated the new generated preview:
grafik

Additonal columns

  • repeat the steps from first column and add the other columns by right indicating the column first element value, second element value
  • Click on Edit Data Definition and modify as following:

Result:
grafik

Final Result

grafik
The datatable with the extracted values. The PrioInfo values are the different css classes. In a conversion run also this info can be mapped e.g. to …circle-up = HIGH etc.

Tips

  • After each editing the extract data definition copy the result / modified extract metadata XML into the clipboard
  • Do at first the additions / selection of the different columns and edit the extract data definition on the end.
    • Reason: after modifying the extract data definition and adding the a new column the modifications are reset. Thats why also the part results are copied to the clipboard
  • in case of suspicious preview results after heavy editing rounds stop the wizard and restart it again

Downloads

HowTo_TableFieldClassImgLink.zip (175.5 KB)

Questions

For questions on your retrieval case open a new topic and get individual support

10 Likes

Cool article! I moved it to our FAQ category.

1 Like