Tip: How to use XPath Query Language for HTML Site Automation

XPath is a powerful query language for finding nodes in HTML documents. You can find more information about XPath at Wikipedia, and W3Schools has a great XPath tutorial. To use XPath for HTML site automation, you need the additional library HtmlAgilityPack. It is available on nuget.org; add it to your project via the package manager.

To explore the possibilities I use a very simple example: a tiny site containing a table with four rows, where the first column contains different buttons.

All I want to do is click any button in the first table. To achieve this I wrote a small stub that detects all the buttons via XPath, in this case with the expression //table[1]//child::button. Beforehand I defined an empty data table with the columns InnerHtml, XPath and CSSSelector to store the selectors. All UI elements that are found are stored in this table.

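//-Begin----------------------------------------------------------------

// Requires a reference to HtmlAgilityPack and an import of the
// HtmlAgilityPack namespace, otherwise HtmlWeb cannot be resolved.

// Prepare the data table that stores the selectors
tblSelectors.Clear();
tblSelectors.Columns.Add("InnerHtml", typeof(string));
tblSelectors.Columns.Add("XPath", typeof(string));
tblSelectors.Columns.Add("CSSSelector", typeof(string));

// Load the page to analyze
string url = @"http://localhost/";
HtmlWeb web = new HtmlWeb();
HtmlDocument htmlDoc = web.Load(url);

// Select all buttons inside the first table
string XPath = @"//table[1]//child::button";
HtmlNodeCollection Buttons = htmlDoc.DocumentNode.SelectNodes(XPath);

// SelectNodes returns null if nothing matches
if (Buttons != null) {
  foreach (HtmlNode Button in Buttons) {
    // Turn the absolute XPath of the node into a path with > separators
    string CSSSelector = Button.XPath.Remove(0, 1).Replace("/", ">");
    object[] objRow = { Button.InnerHtml, Button.XPath, CSSSelector };
    tblSelectors.Rows.Add(objRow);
  }
}

//-End------------------------------------------------------------------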
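
To try the expression without a running site, here is a minimal sketch that parses an inline HTML string instead of loading http://localhost/. The sample markup, the class name XPathScopeDemo and the helper Select are assumptions made for illustration; they are not part of the original stub. It also shows a detail of XPath that is easy to overlook: //table[1] addresses every table that is the first table child of its parent, while (//table)[1] addresses only the first table in the whole document.

```csharp
using System;
using HtmlAgilityPack;

class XPathScopeDemo
{
    // Hypothetical helper: returns the inner HTML of every node matched by
    // xpath, or an empty array when nothing matches (SelectNodes returns
    // null in that case).
    static string[] Select(HtmlDocument doc, string xpath)
    {
        HtmlNodeCollection nodes = doc.DocumentNode.SelectNodes(xpath);
        if (nodes == null) return new string[0];
        string[] result = new string[nodes.Count];
        for (int i = 0; i < nodes.Count; i++) result[i] = nodes[i].InnerHtml;
        return result;
    }

    static void Main()
    {
        // Assumed sample page: two tables, the second one nested in a div
        string html = @"<html><body>
<table><tr><td><button>A</button></td></tr>
<tr><td><button>B</button></td></tr></table>
<div><table><tr><td><button>C</button></td></tr></table></div>
</body></html>";

        HtmlDocument doc = new HtmlDocument();
        doc.LoadHtml(html);

        // Buttons in every table that is the first table child of its parent
        Console.WriteLine(string.Join(",", Select(doc, "//table[1]//child::button")));

        // Buttons in the first table of the document only
        Console.WriteLine(string.Join(",", Select(doc, "(//table)[1]//child::button")));
    }
}
```

By XPath 1.0 rules the first expression should also pick up button C in the nested table, whereas the second one should be limited to A and B. On a page with a single table, as in my example site, both expressions should address the same buttons.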

In the next step I use the CSSSelector to execute the activity, in this case click the button.

The CSSSelector is also a variable which I use in the Selector Editor.
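
To make visible what the conversion in the loop produces, here is a small sketch that uses only string operations. The sample path and the names SelectorPathDemo, ToIndexedPath and ToStandardCss are assumptions for illustration. The second function is a hypothetical variant, not part of my stub: it rewrites the positional index tag[n] to tag:nth-of-type(n), which is the standard CSS notation, in case the consuming selector engine expects that form.

```csharp
using System;
using System.Text.RegularExpressions;

class SelectorPathDemo
{
    // Same conversion as in the loop of the stub: drop the leading slash
    // and replace the XPath step separator with the child combinator.
    static string ToIndexedPath(string xpath)
    {
        return xpath.Remove(0, 1).Replace("/", ">");
    }

    // Hypothetical variant for engines that expect standard CSS:
    // the XPath position tag[n] corresponds to tag:nth-of-type(n).
    static string ToStandardCss(string xpath)
    {
        return Regex.Replace(ToIndexedPath(xpath), @"\[(\d+)\]", ":nth-of-type($1)");
    }

    static void Main()
    {
        // Assumed example of an absolute node path
        string xpath = "/html[1]/body[1]/table[1]/tr[2]/td[1]/button[1]";

        // prints html[1]>body[1]>table[1]>tr[2]>td[1]>button[1]
        Console.WriteLine(ToIndexedPath(xpath));

        // prints the same path with :nth-of-type(n) instead of [n]
        Console.WriteLine(ToStandardCss(xpath));
    }
}
```

Which of the two forms a given tool accepts is something to check in that tool; the sketch only shows the string transformation.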

image

In the output window we see the different CSS selectors used by the click activity.

image

This simple example shows how a group of UI elements can be identified via XPath. Admittedly, the additional programming and data exchange take effort and increase complexity. But well-formulated XPath expressions can improve stability across releases and make automation processes more robust. I have come to know and appreciate this especially when automating web applications with Selenium. UIAutomationNext is certainly a big step forward, e.g. in minimizing the need for XPath, but there may still be cases where XPath is the better choice. This example is intended to provide a perspective on this.

Addendum 11.09.2022: Tried successfully with UiPath Studio 22.8 in Windows compatibility mode (x64).

Stefan! I really appreciate the effort and quality you put into your posts. On a stressful workday it “forces” me to stop, take a look at the things you present, and think.

I hope these articles/guides can find a place in a “Greatest Hits” section in this forum.

Thanks for sharing the knowledge @StefanSchnell

I also feel that using XPath in web automation sometimes makes our automation processes more reliable, as it identifies elements on the basis of unique attributes.

Hi, I am doing screen scraping from this e-paper ( https://epaper.dawn.com/ ). When I select the data to extract, I run into a problem: there are so many containers on each page. Could you please let me know how I can extract the data from this e-paper using HTML in code, get all of it, and write it to a text file?

Thanks in advance.

Hey, I am getting the error below, even though I installed the HtmlAgilityPack package in UiPath. Could you please help me out?

error CS0246: The type or namespace name 'HtmlWeb' could not be found (are you missing a using directive or an assembly reference?)