Get HTML code of a website

studio

#1

Hi,

Is there a way to store all the HTML code of a website in a text variable?

For example, storing all the HTML source code of the web page www.msn.com in a string variable “Web_string”.


#2

htmlString.xaml (5.5 KB)


#3

How would we do it if it was a secure website?


#4

It will work with both. Check it out with https://www.google.co.in/.

Another way is to use HttpWebRequest and StreamReader to get the HTML of the provided URL, but the WebClient class simplifies the work.
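For readers who want to see the idea outside of a workflow file, here is a minimal sketch in Python rather than .NET. It is not the contents of htmlString.xaml; the function name `fetch_html`, the default timeout, and the User-Agent header are all assumptions made for illustration. It does the same job as WebClient.DownloadString: request a URL (http or https) and return the body as one string.

```python
import urllib.request


def fetch_html(url: str, timeout: float = 30.0) -> str:
    """Download the resource at `url` and return its body as a string."""
    # Assumption: a browser-like User-Agent, since some sites reject the default one.
    request = urllib.request.Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with urllib.request.urlopen(request, timeout=timeout) as response:
        raw = response.read()
        # Use the charset the server declares, falling back to UTF-8.
        charset = response.headers.get_content_charset() or "utf-8"
    # Replace undecodable bytes instead of failing on a mislabelled page.
    return raw.decode(charset, errors="replace")
```

Under these assumptions, something like `web_string = fetch_html("https://www.msn.com")` would leave the page source in a single string variable, which is what the original question asked for.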

Regards…!!
Aksh


#5

Hi Aksh.

Is there any way that I can extract the same thing via a DOM explorer or something similar?
Thanks.


#6

Sorry, I did not follow. DOM libraries are mostly available for server-side scripting languages, and if you are already getting the same result with WebClient, why not use it?


#7

This solution is very cool, and I will definitely be keeping it in mind.

I find, however, that it can’t deal with some URLs, for example Microsoft support docs such as “https://support.microsoft.com/en-us/help/4025340”. It doesn’t crash or anything; the RPA simply vanishes into the ozone attempting to execute WebClient.DownloadString on those URLs.

At least in our shop. I suspect it has something to do with security certificates (to be able to access those URLs via IE, though not FF or Chrome, we have to add the domain to Trusted Sites) or with the fact that those particular URLs are redirects. For example, URL “https://support.microsoft.com/en-us/help/4025340” will redirect to “https://support.microsoft.com/en-us/help/4025340/windows-7-sp1-windows-server-2008-r2-sp1-update-kb4025340”.
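One way to narrow down a hang like this is to put a timeout on the request and record where it ends up. The sketch below is Python, not the UiPath workflow, and it does not claim to identify the actual cause here; the function name `fetch_with_diagnostics`, the 15-second default, and the result keys are assumptions for illustration. It reports the final URL after any redirects, or a short reason when the download fails.

```python
import socket
import urllib.error
import urllib.request


def fetch_with_diagnostics(url: str, timeout: float = 15.0) -> dict:
    """Try to download `url`; report the final URL or the reason for failure."""
    result = {"requested": url, "final_url": None, "html": None, "error": None}
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            # geturl() differs from the requested URL when a redirect was followed.
            result["final_url"] = response.geturl()
            charset = response.headers.get_content_charset() or "utf-8"
            result["html"] = response.read().decode(charset, errors="replace")
    except urllib.error.HTTPError as exc:
        # The server answered, but with an error status.
        result["error"] = f"HTTP {exc.code}"
    except urllib.error.URLError as exc:
        # Covers DNS failures, refused connections and certificate errors.
        result["error"] = f"connection or certificate problem: {exc.reason}"
    except (socket.timeout, TimeoutError):
        # The server accepted the connection but stopped sending data.
        result["error"] = "timed out"
    except ValueError as exc:
        # Raised for strings that are not URLs at all, e.g. a missing scheme.
        result["error"] = f"invalid URL: {exc}"
    return result
```

If `final_url` comes back different from `requested`, the redirect was followed and is probably not the problem; a certificate failure would instead show up in `error`.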

So it’s not solving my particular problem this afternoon. Definitely a technique to keep in mind though.

If anyone has any insight into what goes on here, we could sure use some enlightenment!


#8

Have you tried replacing “WriteLine” with “Write Text File”? I think WriteLine has a character limit, which is probably holding up the process.


#9

The issue’s not with outputting the retrieved HTML string; it’s with getting the string in the first place. The ‘failure’ occurs at the webClient.DownloadString(“www.support.microsoft.com”) step.


#10

Try this way

htmlString.xaml (7.9 KB)


#11

Hey everyone,

@vvaidya I am trying to extend this and search the source code for a string (variable or URL) and then write the results to a csv file.

For example, I want to be able to read the source code at https://www.samsung.com, check whether it contains “https://twitter.com/”, and then write the value to a CSV file.
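The search-and-export step described here could look roughly like this in Python, assuming the page source is already in a string. This is a sketch rather than an extension of htmlString.xaml; the function names, the CSV column names, and the regular expression are all assumptions for illustration. It collects each distinct URL that starts with a given prefix and writes one row per match.

```python
import csv
import re


def find_urls_with_prefix(html: str, prefix: str) -> list:
    """Return each distinct URL in `html` that starts with `prefix`, in order."""
    # Match the prefix plus every following character that is not a quote,
    # whitespace, or angle bracket, since those end a URL inside HTML.
    pattern = re.compile(re.escape(prefix) + r"[^\s\"'<>]*")
    found = []
    for match in pattern.findall(html):
        if match not in found:
            found.append(match)
    return found


def write_matches_csv(out, source_url: str, matches: list) -> None:
    """Write a header row and one (source_url, match) row per match to `out`."""
    writer = csv.writer(out)
    writer.writerow(["source_url", "match"])
    for match in matches:
        writer.writerow([source_url, match])
```

To write to disk, open the target with `open("twitter_links.csv", "w", newline="")` (the file name is hypothetical) and pass the file object as `out`. An empty list from `find_urls_with_prefix` means the page does not contain the prefix.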

Thanks,


#12

How do I supply username and password to a secure site using your htmlString example?