Introduction
Following web table is to use for data scraping and also the non text information should be retrieved.
We are interested on following details:
- ID
- Name
- Task
- Cercle Type
- Hover text of cercle
- Prio info
- Url
Preperation / Analysis
It always recommended to do a quick check on Browsers web tools (F12) and / or UiEplorer. The table looks like this:
The quick look shows us
- it is organized in tabular structure based on a table (instead of a div table representation)
- the different information sources are yellow marked and identified
- first row with the headers are used within TH tags
So it looks good, lets do the retrieval
Data Scraping configuration
First Column (ID)
Start with data scraping
- Select Element Dialog - click next
- click on the first ID Value
- following dialog is displayed:
- Click No (Nein) - we want to fine control the retrieval configuration
- Select Second Element Dialog - click next
- click on the second ID Value
- Following Dialog is shown:
- No url extraction is required, the column name is set later
Following Preview is shown:
Second Column (Name)
- Click on the preview dialog extract roccrelated data
- similar to the first column the first element is indicated - first name
- indicating the second element - second name
- result is:
Regadles if the selectors are correct or invalid, the empty column values are correct
An empty result is received as the name value is not text in the data cell. The name info is a value in a text field (refer to screenshot above)
Lets adopt the extraction by the following steps:
- Click Edit Data Definition
- Validate the extraction result that it is selecting an input
- Check that the second table call is selected: td idx=‘2’
- change attribute from text to value:
And validated the new generated preview:
Additonal columns
- repeat the steps from first column and add the other columns by right indicating the column first element value, second element value
- Click on Edit Data Definition and modify as following:
Result:
Final Result
The datatable with the extracted values. The PrioInfo values are the different css classes. In a conversion run also this info can be mapped e.g. to …circle-up = HIGH etc.
Tips
- After each editing the extract data definition copy the result / modified extract metadata XML into the clipboard
- Do at first the additions / selection of the different columns and edit the extract data definition on the end.
- Reason: after modifying the extract data definition and adding the a new column the modifications are reset. Thats why also the part results are copied to the clipboard
- in case of suspicious preview results after heavy editing rounds stop the wizard and restart it again
Downloads
HowTo_TableFieldClassImgLink.zip (175.5 KB)
Questions
For questions on your retrieval case open a new topic and get individual support