pboleszc
(Pawel)
October 2, 2020, 6:19am
Hello!
I have a problem scraping data from a table whose columns have identical selectors in the HTML code.
Example:
<extract>
<row exact="1">
<webctrl tag="tr"/>
</row>
<column exact="1" name="Column1" attr="text">
<webctrl tag="div" class="specific_div_name"/>
</column>
<column exact="1" name="Column2" attr="text">
<webctrl tag="td" class="specific_td_name"/>
<webctrl tag="a"/>
</column>
<column exact="1" name="Column3" attr="text">
<webctrl tag="td"/>
<webctrl tag="b"/>
</column>
<column exact="1" name="Column4" attr="text">
<webctrl tag="td" class="specific_td_name_2"/>
</column>
<column exact="1" name="Column5" attr="text">
<webctrl tag="td" style="white-space: nowrap;"/>
<webctrl tag="b"/>
</column>
</extract>
In the situation above I get the same values for columns 3 and 5, although the source has different values. The problem is that these columns don't have specific IDs in the website's HTML code.
Is there any way to distinguish them?
Thanks in advance
ppr
(Peter Preuss)
October 2, 2020, 7:45am
@pboleszc
Let's assume Column3 is the third column in the web table. Try forcing the retrieval by adding an index to the column:
<column exact="1" name="Column3" attr="text">
<webctrl tag="td" idx="3"/>
<webctrl tag="b"/>
</column>
Do the same for the other columns as well.
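Applied to every column of the original extract, the suggestion could look roughly like the sketch below. The idx values are assumptions: they suppose that Column2 to Column5 sit in the 2nd to 5th td cell of each row, which has to be checked against the real table. The class and style attributes are dropped on the indexed cells because, as far as I understand UiPath selectors, idx is counted only among elements that also match the other attributes, so mixing a class filter with a plain column position can point at the wrong cell. Column1 keeps its specific div class. If the activity does not accept the XML comment, simply delete it.

```xml
<extract>
  <row exact="1">
    <webctrl tag="tr"/>
  </row>
  <column exact="1" name="Column1" attr="text">
    <webctrl tag="div" class="specific_div_name"/>
  </column>
  <!-- Hypothetical positions: Column2..Column5 assumed to be the 2nd..5th td of each row -->
  <column exact="1" name="Column2" attr="text">
    <webctrl tag="td" idx="2"/>
    <webctrl tag="a"/>
  </column>
  <column exact="1" name="Column3" attr="text">
    <webctrl tag="td" idx="3"/>
    <webctrl tag="b"/>
  </column>
  <column exact="1" name="Column4" attr="text">
    <webctrl tag="td" idx="4"/>
  </column>
  <column exact="1" name="Column5" attr="text">
    <webctrl tag="td" idx="5"/>
    <webctrl tag="b"/>
  </column>
</extract>
```

Without idx, the Column3 and Column5 selectors both reduce to "a td containing a b", so they can resolve to the same cell in every row; the index is what tells them apart.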
pboleszc
(Pawel)
October 2, 2020, 12:57pm
Thanks for trying, but unfortunately it doesn't help. Adding "idx" causes the column not to be read (as there isn't any "idx" attribute in the HTML source).
ppr
(Peter Preuss)
October 2, 2020, 1:01pm
The idx attribute is not needed in the HTML source; it is managed internally by UiPath.
Unfortunately we cannot inspect the web element structure, but maybe the column can be simplified as follows:
<column exact="1" name="Column3" attr="text">
<webctrl tag="td" idx="3"/>
</column>
Can you post a screenshot of this table's structure?
pboleszc
(Pawel)
October 2, 2020, 1:14pm
Here you are: a screenshot plus part of the table's HTML code.
ppr
(Peter Preuss)
October 2, 2020, 1:24pm
@pboleszc
That helped as a first step, but not everything was inspectable.
Give the following a try (just for analysis purposes):
<extract>
<row exact="1">
<webctrl tag="tr"/>
</row>
<column exact="1" name="Column1" attr="text">
<webctrl tag="td" idx="3"/>
</column>
<column exact="1" name="Column2" attr="text">
<webctrl tag="td" idx="4"/>
</column>
<column exact="1" name="Column3" attr="text">
<webctrl tag="td" idx="5"/>
</column>
<column exact="1" name="Column4" attr="text">
<webctrl tag="td" idx="6"/>
</column>
</extract>
and check whether the third through sixth columns are extracted (this requires the selector of the extracted data table itself to be valid).
For extracting images, fields, etc., have a look here:
This HowTo shows how data scraping can be configured to also retrieve non-standard information from a web table. After indicating the different data columns with the wizard, the extract data definition was post-edited and changed to the relevant attributes, e.g. value (Text field), src (Image Source), class (CSS Class Name), title (Hover Text), href (Url).
Introduction
The following web table is used for data scraping, and the non-text information should also be retrieved.
We…
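As a sketch of that idea for this thread's table, the link in Column2 could be read twice: once as text and once through the href attribute of its a element. The column name Column2Url and the position idx="2" are hypothetical and only illustrate the pattern; the other attributes listed in the HowTo (value, src, class, title) would be used the same way.

```xml
<extract>
  <row exact="1">
    <webctrl tag="tr"/>
  </row>
  <!-- Visible link text of the a element in the (assumed) 2nd cell -->
  <column exact="1" name="Column2" attr="text">
    <webctrl tag="td" idx="2"/>
    <webctrl tag="a"/>
  </column>
  <!-- Hypothetical extra column: the link target (href) of the same a element -->
  <column exact="1" name="Column2Url" attr="href">
    <webctrl tag="td" idx="2"/>
    <webctrl tag="a"/>
  </column>
</extract>
```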
In case it fails again:
Redo it by extracting the entire data table (image, field, etc. values will be missing).
Redo it by reconfiguring the column definitions, but with simplified selectors (no classes, idx on the columns, etc.).
pboleszc
(Pawel)
October 2, 2020, 4:46pm
Thanks a lot! It works when adding idx to all column tags. That's exactly what I was looking for!
system
(system)
Closed
October 5, 2020, 4:47pm
This topic was automatically closed 3 days after the last reply. New replies are no longer allowed.