Hi,
I'm new to UiPath. I'm using the Data Scraping tool to scrape data from a website that has multiple li tags. The output DataTable variable contains the contents of all li tags merged into one. How can I separate them and get the content of each li? Also, is it possible to get only the tags that contain some specified text?
Could you kindly specify the site you are trying to scrape data from, if it is not confidential? Also, please share some screenshots for better understanding.
Sorry, it's an intranet website and it's confidential. Please check the screenshot. Each section is in an li tag, and I want to loop through each one and get the details. If there are any other good methods to get the data, please let me know.
@binoyav
Can you share some details on the element structure with us? E.g. open the developer tools (often F12) in the browser and extract a snippet from the relevant LI series. Thanks
Multiple li tags like this are shown on the page, and I want to extract the heading (e.g. Field Changes) and the date, as well as the inner ul → li contents given in the code (Incident state, Target date, etc.).
@binoyav
From the snippet mapped to the screenshot, the details are not fully clear, so my answer will be more general. For retrieval, the following options can be checked:
Data Scraping with fine-tuned column definitions (especially the attribute setting)
Find Children
Combining both approaches also allows you to address complex scenarios, and the following technique can help:
read in the data with Data Scraping
iterate over the extracted data rows and use the row index for dynamic selectors (e.g. you can also decide whether to use a row based on its column values…)
Another technique is to grab the outerhtml and parse it later into the different elements. As you said your LIs are merged, this has the potential to solve it.
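To illustrate the outerhtml idea: a minimal sketch in Python (not a UiPath workflow), assuming the outer HTML of the list has already been retrieved as a string and that every li is explicitly closed. The class name, the function name and the sample markup are hypothetical; the real page structure was not shared.

```python
from html.parser import HTMLParser


class TopLevelLiCollector(HTMLParser):
    """Collect the text of each top-level <li>; nested <li> text stays with its parent."""

    def __init__(self):
        super().__init__()
        self.depth = 0    # current <li> nesting depth
        self.items = []   # one list of text chunks per top-level <li>

    def handle_starttag(self, tag, attrs):
        if tag == "li":
            self.depth += 1
            if self.depth == 1:
                self.items.append([])  # a new top-level item starts here

    def handle_endtag(self, tag):
        if tag == "li" and self.depth > 0:
            self.depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if self.depth >= 1 and text:
            self.items[-1].append(text)


def split_list_items(outer_html):
    """Return one string per top-level <li> found in outer_html."""
    parser = TopLevelLiCollector()
    parser.feed(outer_html)
    parser.close()
    return [" ".join(chunks) for chunks in parser.items]


# Hypothetical markup standing in for the confidential page.
sample = (
    "<ul>"
    "<li><span>Field Changes</span><ul><li>Incident state: Active</li>"
    "<li>Target date: 2020-01-01</li></ul></li>"
    "<li><span>Comment</span><ul><li>Substate: Pending</li></ul></li>"
    "</ul>"
)
rows = split_list_items(sample)

# Keep only the items that contain a specified text.
field_changes = [row for row in rows if "Field Changes" in row]
```

Each entry in rows corresponds to one top-level li, so it could become one DataTable row. The last line shows one way to keep only the items that contain a given text.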
Thanks for the help. Some tutorial links would be useful. Is it possible to get each li in a separate DataTable row? Right now everything comes out together.
@veerishu
You can try to save the page as static HTML (this often results in a package or zip) with the browser's Save Page… function
and change sensitive data into dummy values by editing the HTML. If this works, we can refer to it and support you offline.
The yellow-marked fields can be extracted with Data Scraping.
The Comment LI lines (red lines) can be extracted as a whole but not split by line; the number of LI items is not fixed.
Data Extraction Flow
Extract the yellow fields with Data Scraping
Add a column for the comments info to the DataTable
Iterate over the rows of the extractedData DataTable
Use the iteration index in a dynamic selector and get the outer XHTML attribute value from the Comment UL
The retrieved XHTML can be handled as XML, and we can parse it further into the different lines with the common APIs (XDocument, Elements()…)
Mark the parsed and split information with | between elements and # for LI separation
Add the joined string to the DataTable
With a split on # and a replace of |, we can access the different items
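The parse, join and split steps of this flow can be sketched as follows. This is only an illustration in Python: xml.etree.ElementTree stands in for the .NET XDocument API used in the workflow, and it assumes the retrieved XHTML is well-formed XML without a namespace. The function names and the sample markup and values are hypothetical.

```python
import xml.etree.ElementTree as ET

ELEMENT_SEP = "|"  # between the elements inside one LI
ITEM_SEP = "#"     # between LI items


def join_comment_items(outer_xhtml):
    """Parse the UL's outer XHTML and join its LI contents into one string."""
    root = ET.fromstring(outer_xhtml)
    items = []
    for li in root.findall("li"):  # direct LI children of the UL only
        # strip() trims only leading/trailing whitespace, so spaces inside a value are kept
        parts = [text.strip() for text in li.itertext() if text.strip()]
        items.append(ELEMENT_SEP.join(parts))
    return ITEM_SEP.join(items)


def split_comment_items(joined):
    """Reverse the join: one list of element texts per LI."""
    if not joined:
        return []
    return [item.split(ELEMENT_SEP) for item in joined.split(ITEM_SEP)]


# Hypothetical markup and values standing in for the Comment UL.
sample = (
    "<ul>"
    "<li><span>Pending Target Date</span><span>01 Jan 2020</span></li>"
    "<li><span>Substate</span><span>Awaiting Customer</span></li>"
    "</ul>"
)
joined = join_comment_items(sample)   # the string stored in the DataTable column
items = split_comment_items(joined)   # the per-LI values read back later
```

Storing a single joined string keeps the DataTable at one row per scraped record even though the number of LI items varies. The approach relies on | and # never occurring in the scraped text itself.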
Kindly note: this was technical R&D to prove whether we have a chance to get the data, and we do. The parsing needs some enhancements for deeper nested elements (currently not separated by |).
Unfortunately, Data Scraping does not allow scraping attributes like innerhtml and outerhtml; otherwise we could have avoided the Get Attribute. Maybe @loginerror can give us feedback on this.
The demo requires that the HTML you provided above is opened in IE. It worked perfectly, though: I was able to implement it against a system that I do not have access to.
Thanks a lot. This helped me get the data. But there is an issue with the inner li data: the spaces are being removed in the output. Please check the Pending Target Date and Substate values.