Cannot retrieve HTML from Amazon's product pages

I am trying to retrieve the details of Amazon products as HTML. I have used the same HTML-retrieval code many times, and it has always worked. For Amazon product pages, though, it does not: the result is garbled text that I cannot parse.
I tried displaying the result with both a Message Box activity and a Write Text File activity, and neither shows readable text.
Here is the code I have:

Variables:
client (System.Net.WebClient)
result (String): stores the HTML retrieved from the website

Assign activity: client = New WebClient()
Assign activity: client.Encoding = System.Text.Encoding.UTF8
Assign activity: result = client.DownloadString("<URL of an Amazon product page>")

Message Box activity: result  (displays garbled text)

Write Text File activity: a file path, with result as the text input  (the file also contains garbled text)

I don’t know why this happens. All I want is the HTML text of the page at that URL.
Normally I don’t even need the second Assign activity (the one that sets the encoding to UTF-8). I added it because the download was garbled without it, and answers on a popular coding forum suggested the UTF-8 setting as the fix for garbled text. In my case, it didn’t help.
I am at my wits’ end. Can anyone help me?
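The UTF-8 advice targets one specific failure: bytes decoded with a different charset than the one the server used. A minimal Python sketch (the sample string is made up, not taken from Amazon) shows what such a mismatch looks like. Only the non-ASCII characters are mangled; ASCII text such as HTML tag names stays readable. So if even the tag names come out unreadable, a charset mismatch alone is probably not the whole story.

```python
# Illustrative sample text with non-ASCII characters (hypothetical, not from Amazon).
original = "<p>Café crème</p>"

# The server sends bytes; assume here they were encoded as UTF-8.
raw = original.encode("utf-8")

# Decoding with the wrong charset garbles only the non-ASCII characters ("mojibake").
wrong = raw.decode("iso-8859-1")
right = raw.decode("utf-8")

print(wrong)  # <p>CafÃ© crÃ¨me</p>
print(right)  # <p>Café crème</p>
```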

Well, I guess you are a .NET developer, but isn’t the whole point of UiPath to do things without much coding? Did you try using Open Browser and getting the text the usual way?

Hello @tomato25, try setting the Encoding property of the Write Text File activity to utf-8. By default, Write Text File uses the system ANSI code page. Hope it works!
Please refer to the list of all supported encodings.
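Worth keeping in mind: the Encoding property of Write Text File only controls how the already-downloaded string is saved to disk. If the string was mis-decoded during the download, saving it in any encoding faithfully preserves the garbled characters. A minimal Python sketch of that round trip, using a temporary file and a made-up sample string:

```python
import os
import tempfile

# A string that was already garbled during download (UTF-8 bytes decoded as ISO-8859-1).
garbled = "Café".encode("utf-8").decode("iso-8859-1")  # 'CafÃ©'

results = {}
for encoding in ("utf-8", "iso-8859-1"):
    # Write the string with an explicit encoding, then read it back with the same one.
    fd, path = tempfile.mkstemp(suffix=".txt")
    os.close(fd)
    try:
        with open(path, "w", encoding=encoding) as f:
            f.write(garbled)
        with open(path, encoding=encoding) as f:
            results[encoding] = f.read()
    finally:
        os.remove(path)

# Both round trips return the same garbled text: the file encoding cannot repair it.
for encoding, text in results.items():
    print(encoding, text)
```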

True, but for some steps code works much better and faster and, most importantly, is more stable than selector-dependent UiPath activities. If I only had to retrieve HTML from a few URLs, I would just use the Open Browser activity, but this project requires retrieving HTML from 30+ Amazon product pages.

That feels like too many, and it would look clumsy to have the robot open and close a browser 30+ times. I thought C#/VB code would make the process faster, since it doesn’t need to open a browser at all.

I tried this, but the resulting text file contains garbled text.

I also tried non-Amazon URLs, and they work fine.
It seems to be something about the way Amazon’s website is built… Is there any way to retrieve the HTML of Amazon pages?

Could you please share your Amazon URL?

Main.xaml (7.6 KB)
Please take a look at how UiPath could do it. Or, if you really want to do it with code only, it may be even faster to do it in Visual Studio…

Well, this is only one part of the entire project, and other parts require actual UiPath activities.

Any Amazon product page shows the problem.
I tried my original code (and, as you recommended, also set utf-8 in the Write Text File Encoding property), but the result is still garbled.

Could you please try iso-8859-1 instead of utf-8 in the Write Text File Encoding property?

I got this with “iso-8859-1”

This is actually much faster than I thought…
I don’t need the entire HTML; I only need a few descriptions from each product. I thought retrieving the HTML would work well, but I could just retrieve the page text itself instead.

Your workflow returns the visible page text rather than the HTML, but that seems to include the information I need.
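For the “page text instead of HTML” approach, here is a rough Python sketch of pulling the visible text out of an HTML string with the standard-library html.parser, skipping script and style blocks. It only approximates what a browser’s text extraction returns (it ignores CSS visibility, for example), and the names VisibleTextExtractor and visible_text are hypothetical.

```python
from html.parser import HTMLParser


class VisibleTextExtractor(HTMLParser):
    """Collects text nodes, skipping anything inside <script> or <style>."""

    def __init__(self):
        super().__init__()  # convert_charrefs=True by default, so &amp; becomes &
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text and not self._skip_depth:
            self.parts.append(text)


def visible_text(html: str) -> str:
    """Return the non-script, non-style text of an HTML string, joined by spaces."""
    parser = VisibleTextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.parts)


sample = "<p>Tom &amp; Jerry</p><script>var x = 1;</script><p>Price: 10</p>"
print(visible_text(sample))  # Tom & Jerry Price: 10
```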

Forgot to mention this… You have to change client.Encoding as well. The Amazon page uses the charset iso-8859-1.
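The charset a page uses is normally declared in the Content-Type response header (and often in a meta charset tag as well). A minimal Python sketch reads it with the standard library’s email.message.Message.get_content_charset(); the header values below are illustrative examples, not captured from Amazon, and charset_from_content_type is a hypothetical helper name. The headers of a urllib response are an http.client.HTTPMessage, a subclass of that class, so a live response supports the same call.

```python
from email.message import Message


def charset_from_content_type(content_type: str, default: str = "utf-8") -> str:
    """Return the charset declared in a Content-Type header value, or a default."""
    msg = Message()
    msg["Content-Type"] = content_type
    # get_content_charset() returns the charset parameter lowercased, or the fallback.
    return msg.get_content_charset(default)


# Example header values (illustrative; check the actual response headers of the page).
print(charset_from_content_type("text/html; charset=ISO-8859-1"))  # iso-8859-1
print(charset_from_content_type("text/html"))  # utf-8
```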

Just adjust the selector; it will be fast, I’m sure 😉 30 pages is not that much…

How can I set this encoding to iso-8859-1? I don’t see the option here.

You could try System.Text.Encoding.GetEncoding("ISO-8859-1"), for example in the Assign activity: client.Encoding = System.Text.Encoding.GetEncoding("ISO-8859-1"). Since we now know that Amazon uses the iso-8859-1 charset, we could also try converting the text to utf-8, because utf-8 websites are working for you.
The linked page covers converting between charsets, in case you need to do that in future.
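Converting ISO-8859-1 to UTF-8 means decoding the bytes as ISO-8859-1 text and re-encoding that text as UTF-8 (in .NET, the static Encoding.Convert(srcEncoding, dstEncoding, bytes) method does the same). A minimal Python sketch with a made-up sample follows; latin1_to_utf8 is a hypothetical helper name. One caveat: every byte value is valid ISO-8859-1, so this decode never raises an error. If the bytes are not really ISO-8859-1, the conversion silently produces wrong text instead of failing.

```python
def latin1_to_utf8(data: bytes) -> bytes:
    """Re-encode ISO-8859-1 bytes as UTF-8 (decode to text first, then encode)."""
    return data.decode("iso-8859-1").encode("utf-8")


sample = "Café".encode("iso-8859-1")  # b'Caf\xe9'
converted = latin1_to_utf8(sample)
print(converted)  # b'Caf\xc3\xa9'
print(converted.decode("utf-8"))  # Café
```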

Thank you for all the posts, answers, and links!
Unfortunately, iso-8859-1 still doesn’t work.
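One possible cause that this thread never confirms: some servers return the HTML gzip-compressed, which is signalled by a Content-Encoding: gzip response header. WebClient does not decompress responses by default, so the bytes it decodes would be compressed data, and no choice of Encoding could make them readable. That would also explain why both utf-8 and iso-8859-1 fail. In .NET, automatic decompression can be enabled through the AutomaticDecompression property (on HttpWebRequest or HttpClientHandler). The sketch below illustrates the idea in Python on an in-memory sample rather than a real Amazon response; decode_body is a hypothetical helper name.

```python
import gzip
from typing import Optional


def decode_body(body: bytes, content_encoding: Optional[str], charset: str) -> str:
    """Decompress a gzip-encoded HTTP body if needed, then decode it with the given charset."""
    if content_encoding and content_encoding.strip().lower() == "gzip":
        body = gzip.decompress(body)
    return body.decode(charset)


# Simulated response body (hypothetical sample HTML, not fetched from Amazon).
html = "<html><body>Café</body></html>"
compressed = gzip.compress(html.encode("iso-8859-1"))

# Decoding the compressed bytes directly yields gibberish that starts with the gzip magic bytes.
print(repr(compressed.decode("iso-8859-1")[:2]))  # '\x1f\x8b'

# Decompressing first recovers the original page.
print(decode_body(compressed, "gzip", "iso-8859-1"))  # <html><body>Café</body></html>
```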