Loop through ExtractDataTable and print

Hi,
New to UiPath. I’m using the Data Scraping tool to scrape data from a website that has multiple li tags. The output DataTable variable contains the contents of all li tags merged into one. How can I separate them and get each li’s content? Also, is it possible to get only the tags that contain some specified text?

Thanks
Binoy

Could you kindly specify the site you are trying to scrape data from, if it is not confidential? Also, please share some screenshots for better understanding.

@binoyav
Welcome to the forum

With a For Each Row activity you can iterate over each row and do some processing with the li value.

As the data is stored in a DataTable, you can use the Filter Data Table activity to get the LIs that contain a specified text.

For more complex scenarios, other options are available as well.

Sorry, it’s an intranet website and it’s confidential. Please check the screenshot. Each section is in an li tag and I want to loop through each one and get its details. If there are any other good methods to get the data, please let me know.

@binoyav
Can you share some details on the element structure with us? E.g. open the developer tools (often F12) in the browser and extract a snippet from the relevant LI series. Thanks

<li class="h-card h-card_md h-card_comments" style="">
<div class="sn-card-component sn-card-component_first sn-card-component_meta sn-card-component_meta_sibling"><span class="sn-card-component-createdby">Binoy</span></div>
<div class="sn-card-component sn-card-component_first sn-card-component_meta"><span class="sn-card-component-time"><span>Field changes</span><span class="sn-card-component_accent-bar_bullet">•</span>
    <div class="date-calendar">2020-03-25 01:00:32</div>
    <div class="datex date-timeago" title="7d ago" timeago="2020-03-25 08:00:32" data-original-title="7d ago" null="8d ago">8d ago</div>
    </span>
</div>
<div class="sn-card-component sn-card-component_records">
    <div class="sn-widget">
        <ul class="sn-widget-list sn-widget-list-table">
            <li><span class="sn-widget-list-table-cell">Incident state</span><span class="sn-widget-list-table-cell"><span>Work In Progress</span><span class="sn-widget-list-table-italic">was</span><span>Pending</span></span>
            </li>
            <li><span class="sn-widget-list-table-cell">Pending Target Date</span><span class="sn-widget-list-table-cell"><span>[Empty]</span><span class="sn-widget-list-table-italic">was</span><span>2020-03-25 01:00:31</span></span>
            </li>
            <li><span class="sn-widget-list-table-cell">Substate</span><span class="sn-widget-list-table-cell"><span>[Empty]</span><span class="sn-widget-list-table-italic">was</span><span>Customer</span></span>
            </li>
        </ul>
    </div>
</div>
<div class="sn-card-component_accent-bar " style=""></div>

Multiple li tags like this are shown on the page, and I want to extract the heading (e.g. Field changes) and the date, as well as the inner ul → li contents given in the code (Incident state, Target date, etc.)

@binoyav
Mapping the snippet to the screenshot does not make all the details fully clear, so my answer will be more general. For retrieval, the following options can be checked:

  • Data Scraping with fine-tuned column definitions (especially the attribute setting)
  • Find Children

Combining both approaches also allows you to address complex scenarios, and the following technique can help:

  • read the data in with Data Scraping
  • iterate over the extracted data rows and use the row index for dynamic selectors (e.g. you can also decide whether to use a selector or not based on the row's column values…)

Another technique is to grab the outerhtml and parse it afterwards into the different elements. As you said your LIs are merged, this has the potential to solve it.
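To illustrate the outerhtml idea outside of UiPath, here is a minimal Python sketch using only the standard library. It is not the workflow from this thread; the class name `TopLevelLiSplitter`, the function `split_cards` and the shortened sample markup are all assumptions made for the example. It splits a grabbed HTML string into one text entry per outermost li and then keeps only the entries containing a given text.

```python
from html.parser import HTMLParser


class TopLevelLiSplitter(HTMLParser):
    """Collects the text of each outermost <li>; nested <li> text stays inside it."""

    def __init__(self):
        super().__init__()
        self.depth = 0   # number of <li> elements currently open
        self.items = []  # one list of text fragments per outermost <li>

    def handle_starttag(self, tag, attrs):
        if tag == "li":
            if self.depth == 0:
                self.items.append([])  # a new outermost <li> starts here
            self.depth += 1

    def handle_endtag(self, tag):
        if tag == "li" and self.depth > 0:
            self.depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if self.depth > 0 and text:
            self.items[-1].append(text)


def split_cards(outer_html):
    """Return one space-joined string per outermost <li> in outer_html."""
    parser = TopLevelLiSplitter()
    parser.feed(outer_html)
    parser.close()
    return [" ".join(parts) for parts in parser.items]


# Hypothetical, shortened stand-in for the outerhtml of the card list.
sample = (
    "<ul>"
    '<li class="h-card"><span>Binoy</span><span>Field changes</span>'
    "<ul><li><span>Substate</span><span>[Empty]</span>"
    "<span>was</span><span>Customer</span></li></ul>"
    "</li>"
    '<li class="h-card"><span>Binoy</span><span>Comments</span></li>'
    "</ul>"
)

cards = split_cards(sample)
# Keep only the cards that contain a specified text.
field_changes = [card for card in cards if "Field changes" in card]
print(cards)
print(field_changes)
```

Tracking the li nesting depth is what keeps the inner ul → li entries attached to their parent card instead of becoming separate rows.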

Thanks for the help. Some tutorial links would be useful. Is it possible to get each li in its own DataTable row? Right now everything comes together.

@veerishu
You can try to save the page as static HTML (this often results in a package or zip) with the browser's save page… function
and change sensitive data into dummy values by editing the HTML. If this works, we can refer to it and support you offline.

html.zip (56.1 KB)

I'm attaching the HTML file herewith. I removed all other sections and kept only the li tags. Please check.

OK, fine. I will have a look later, after work, and let's see how much we can scrape from this.


@binoyav
[Screenshot]

The fields marked in yellow can be extracted with Data Scraping.
The comment LI lines (red lines) can be extracted as a whole but not split into lines; the number of LI items is not fixed.

Data Extraction Flow

  • Extract the yellow fields with Data Scraping
  • Add a column for the comments info to the DataTable
  • Iterate over the rows of the extractedData DataTable
  • Use the iteration index in a dynamic selector and get the outer XHTML attribute value from the comment UL

The retrieved XHTML can be handled as XML, and we can parse it further into the different lines with the common APIs (xDocument, Elements()…)

  • Mark the parsed and split information with | for elements and # for LI separation
  • Add the joined string to the DataTable

With a split on # and a replace of |, we can then access the different items.
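The marker logic can be sketched in Python as follows. This is only an illustration of the idea, not the code in the demo XAML; the function names `join_comment_list` and `split_comment_list` and the shortened sample are assumptions. The UL is parsed as XML, the cells of each LI are joined with |, the LIs are joined with #, and the string is split again later.

```python
import xml.etree.ElementTree as ET


def join_comment_list(ul_xhtml):
    """Flatten a <ul>: '|' between the cells of an <li>, '#' between <li> items."""
    root = ET.fromstring(ul_xhtml)
    lines = []
    for li in root.findall("li"):
        # One cell per direct child of the <li>; nested text is concatenated as-is.
        cells = ["".join(child.itertext()).strip() for child in li]
        lines.append("|".join(cells))
    return "#".join(lines)


def split_comment_list(joined):
    """Reverse the marker logic: one list of cells per <li>."""
    return [line.split("|") for line in joined.split("#")]


# Shortened sample based on the snippet posted earlier in the thread.
ul_xhtml = (
    '<ul class="sn-widget-list sn-widget-list-table">'
    '<li><span class="sn-widget-list-table-cell">Incident state</span>'
    '<span class="sn-widget-list-table-cell"><span>Work In Progress</span>'
    '<span class="sn-widget-list-table-italic">was</span><span>Pending</span></span></li>'
    '<li><span class="sn-widget-list-table-cell">Substate</span>'
    '<span class="sn-widget-list-table-cell"><span>[Empty]</span>'
    '<span class="sn-widget-list-table-italic">was</span><span>Customer</span></span></li>'
    "</ul>"
)

joined = join_comment_list(ul_xhtml)
rows = split_comment_list(joined)
print(joined)
print(rows)
```

In this sketch the text of the nested spans in the second cell is concatenated without any separator, and the approach assumes the scraped values themselves never contain | or #.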

[Screenshot]

Kindly note: this was a technical R&D exercise to prove whether we have a chance of getting the data, and we do. The parsing needs some enhancements for deeper nested elements (currently not separated by |).

Unfortunately, Data Scraping does not allow scraping attributes like innerhtml or outerhtml. Otherwise we could have avoided the Get Attribute. Maybe @loginerror can give us feedback on this.

Find the demo XAML here:
binoyav.xaml (16.2 KB)

The demo requires that the HTML you provided above is opened in IE. It worked perfectly, and I was able to implement it against a system that I do not have access to.


I don’t think it does. Maybe @gheorghestan will know more.

Thanks a lot. This helped me get the data. But there is an issue with the inner li data: the spaces are being removed in the output. Please check the Pending Target Date and Substate values.

@veerishu
Please share a screenshot of this.

output.txt (572 Bytes)
Please find the output attached.
Lines are even breaking in between.

@binoyav

As noted earlier: the parsing needs some enhancements for deeper nested elements (currently not separated by |).

PendingwasWork is such a case
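One possible enhancement for the nested case, sketched in Python rather than in the workflow itself: join every text fragment found below a cell with a space instead of concatenating them directly. The helper name `cell_text` is an assumption made for this example; the sample cell is taken from the snippet posted earlier in the thread.

```python
import xml.etree.ElementTree as ET


def cell_text(element, sep=" "):
    """Join every text fragment under element with sep, skipping empty fragments."""
    parts = (fragment.strip() for fragment in element.itertext())
    return sep.join(part for part in parts if part)


# The "Incident state" value cell from the posted snippet.
cell = ET.fromstring(
    '<span class="sn-widget-list-table-cell">'
    "<span>Work In Progress</span>"
    '<span class="sn-widget-list-table-italic">was</span>'
    "<span>Pending</span>"
    "</span>"
)

print(cell_text(cell))  # prints "Work In Progress was Pending"
```

The same idea should carry over to the XML APIs mentioned above: collect the text of the nested elements individually and join them with a separator instead of reading the combined value of the parent element.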
