Automating web page handling without VMs, using only DOM/HTML interpretation

I'm new to UiPath and I'm trying to find out whether it can handle full DOM/HTML interpretation without a VM. Allocating and deallocating VMs is painful, expensive, and slow.
I would like to ask for pointers to the documentation covering this functionality.

Besides this, I suppose that smaller containers could be used, and GDI window primitives could be interpreted and virtually rendered/handled in a similar way. Is there any effort along this path?



I wonder if anyone has a tip or reference to share.

What do you mean without a VM? UiPath is UI automation. You have to have the automation running in a Windows UI. You don’t have to allocate/deallocate VMs to do this. You just need one. Then if you have multiple automations you want to run at the same time you may need additional VMs in the future. You don’t allocate a VM, run an automation, then deallocate the VM.

You could, of course, use a physical computer (i.e. a laptop, desktop, or server) to run the automation jobs. But most people use VMs for scalability.

Think of it this way: a website in a browser should be far easier to interpret/deconstruct than a fully featured desktop app. It's not a simple task, but it's very doable to build an interpreter (maybe even an internal virtual screen manager with OCR) that could act the same way as a browser, managing the website's HTML/CSS DOM structure for interaction.
Maybe this describes better what this technique is all about (extrapolating it).
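To make the "slim interpreter" idea concrete, here's a hypothetical sketch (not a UiPath feature) using only Python's standard-library `html.parser`: it builds a lightweight element tree from HTML text, with no rendering engine involved.

```python
from html.parser import HTMLParser

class Node:
    """A minimal DOM node: tag, attributes, children, and collected text."""
    def __init__(self, tag, attrs):
        self.tag = tag
        self.attrs = dict(attrs)
        self.children = []
        self.text = ""

class MiniDOM(HTMLParser):
    """Builds a lightweight element tree from HTML — parsing only, no layout."""
    VOID = {"img", "br", "hr", "meta", "link", "input"}

    def __init__(self):
        super().__init__()
        self.root = Node("document", [])
        self.stack = [self.root]

    def handle_starttag(self, tag, attrs):
        node = Node(tag, attrs)
        self.stack[-1].children.append(node)
        if tag not in self.VOID:  # void elements never get an end tag
            self.stack.append(node)

    def handle_endtag(self, tag):
        if len(self.stack) > 1 and self.stack[-1].tag == tag:
            self.stack.pop()

    def handle_data(self, data):
        self.stack[-1].text += data

def find_all(node, tag):
    """Depth-first search for all elements with the given tag name."""
    hits = [node] if node.tag == tag else []
    for child in node.children:
        hits.extend(find_all(child, tag))
    return hits

dom = MiniDOM()
dom.feed("<html><body><p>Hello</p><a href='/next'>Next</a></body></html>")
links = find_all(dom.root, "a")
print(links[0].attrs["href"])  # → /next
```

A real interpreter would also need JavaScript execution and CSS cascade handling, which is where most of a browser's weight comes from — but for static pages, a tree like this is enough to locate and read elements.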

I don't need a VM or even an emulated machine for a browser. A much simpler UiPath server could handle far more automations than spawning VMs.

Btw, this rabbit hole could go further: interpreting Windows GDI(+)/Direct2D and even adding Windows API analysis/handling, preserving screen independence/isolation.
Wine is all about this approach, though no automation/scripting/interpreting feature is embedded in it — but it could be.


UiPath works directly with the web page objects in the browser, can scrape data, etc. It is very strong in web page automation.

It simply needs to be running on a Windows desktop to function. Because Windows is what gives it the UI objects.

I think maybe you don't have a proper mental image of the purpose of UI automation. It's to mimic human interaction (i.e. clicking and typing) to perform repetitive tasks the same way humans do, exactly because there is no API, connector, etc. that can solve your business case.

I guess web crawling with XPath may be useful in my case.
I'd hoped that UiPath could take the next step in automation, providing a fully fledged DOM interpreter for web pages, just like Selenium.
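For reference, the XPath-style querying mentioned above can be done without any browser at all. A hedged, stdlib-only sketch with `xml.etree.ElementTree` (which implements a limited XPath subset; the markup here is invented, and real-world HTML would first need tidying into well-formed XML, e.g. with a library like lxml):

```python
import xml.etree.ElementTree as ET

# A well-formed, XHTML-like snippet; ElementTree requires valid XML.
page = """
<html>
  <body>
    <div class="prices">
      <span class="item">Widget</span>
      <span class="price">9.99</span>
    </div>
    <a href="/cart">Cart</a>
  </body>
</html>
"""

root = ET.fromstring(page)
# ElementTree supports a subset of XPath: paths, .//, [@attr='value'], etc.
price = root.find(".//span[@class='price']").text
links = [a.get("href") for a in root.findall(".//a")]
print(price, links)  # → 9.99 ['/cart']
```

This is data extraction only — it can't click buttons or run JavaScript, which is the gap a browser-based robot (or Selenium) fills.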

Spawning a myriad of VMs is so expensive.

Just to add something. I think the closest to what you are saying would be the headless browser support:

This is not to say that you don’t need a machine to run the automation, because you do need to execute your code somewhere (if you want to use UiPath robots).

You don’t have to. You can run multiple automations on one VM.

What's the idea there — RDP? I'm failing to see desktop independence between sessions. Doable, but I'm afraid it may become messy.


Thanks Maciej. Very appreciated

Yes, RDP is how UiPath works. Orchestrator uses RDP to connect to the computer where you want the automation to run. You can have multiple accounts logged in at the same time running different automations, or even multiple jobs of the same automation.

Any hint on whether RDP sessions share the same code and OS memory (playing nicely with multithreading)? I'm afraid this escalates into redundant memory allocation pretty badly.
Too bad I can't find a secure shared-memory code virtualizer/hypervisor.
Memory Sharing.
Easiest way to enable more than 2 concurrent RDP sessions on Windows Server 2016

Hi Alexandre,

Earlier in the thread, you mentioned running a robot in smaller containers for page scraping. UiPath offers that functionality with our Linux robots in cross-platform projects.

In this scenario, there is still a deployed (virtual) machine, but the infrastructure allows you to deploy as many robots simultaneously as you have runtime licenses on Docker containers, up to the resource limits of the underlying machine.

There are significant functional limitations, such as only being able to use Chrome browsers with a subset of UIAutomation activities present.

If the goal is to reduce the management overhead of VMs, we offer other options that provide the full range of automation capabilities: Automation Cloud robots, where UiPath manages all the robot infrastructure, and elastic robot orchestration, where the robots remain in your cloud but UiPath manages them on your behalf, letting you choose how much of the robot orchestration process to delegate to us.

In any case, the robot requires a platform on which it can carry out the automation, even if it is leveraging a user’s active session.

I hope this provides insight on the available options!

Hi Aden

Thanks for presenting the options. I have to recheck with our automation team, but I believe we could go further, looking at the next mile of tuning and offering alternatives to our automation team without relying on a fully fledged browser.

The objective is a cheaper VM infrastructure. A fully fledged browser instantiated in a container wastes far more memory handling an interface than a slim HTML/DOM interpreter would. Is it hard to build a snappy/lightweight interpreter? Yep, but it's doable and worthwhile. It would even be possible to skip many VMs and solve scalability with threads and processes instead of VMs/containers.

Think about this: a page interpreter/compiler, in theory, wouldn't need to download images and so forth.

I haven't tested XPath querying yet, but the idea is that this could be as slim as an XPath query.
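The "no images" point can be sketched quickly: a crawler that only parses HTML text never touches the referenced assets — it merely records their URLs and the links to follow next. (Hypothetical stdlib-only example; the markup and file names are invented.)

```python
from html.parser import HTMLParser

class AssetAudit(HTMLParser):
    """Walks HTML text and records what a full browser *would* fetch.
    Nothing here issues a network request for the assets themselves."""
    def __init__(self):
        super().__init__()
        self.skipped_assets = []  # images, stylesheets, scripts never downloaded
        self.links = []           # hrefs a crawler could follow next

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag in ("img", "script") and "src" in attrs:
            self.skipped_assets.append(attrs["src"])
        elif tag == "link" and attrs.get("rel") == "stylesheet":
            self.skipped_assets.append(attrs.get("href"))
        elif tag == "a" and "href" in attrs:
            self.links.append(attrs["href"])

html = """<html><head><link rel="stylesheet" href="site.css">
<script src="app.js"></script></head>
<body><img src="logo.png"><a href="/page2">More</a></body></html>"""

audit = AssetAudit()
audit.feed(html)
print(audit.links)           # → ['/page2']
print(audit.skipped_assets)  # → ['site.css', 'app.js', 'logo.png']
```

For a page like this, the only bytes transferred would be the HTML itself; everything in `skipped_assets` is bandwidth and memory a rendering browser would have spent.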