Print webpage as pdf

Hello Community,

I built a bot to open a web page in IE, scroll, and take screenshots until it reaches the bottom of the page.

Here’s my question: Is it possible to grab the entire web page and print it as PDF? If so, how?
When I tried this before, it captured the visible part of the page but could not scroll and capture the entirety of the page.

Thank you for your time and wishing everyone well.


Give wkhtmltopdf a try:

https://wkhtmltopdf.org


I see how that could be helpful but I do not have a developer background and do not know what a precompiled binary is or how to download one.

There are installers available (wkhtmltopdf).

If you don’t know how to add it to your PATH (search for “add to the path windows 10”), call it by its full path, for example:

path-to-wkhtmltopdf.exe url-of-the-page path-to-my-output-file

%PROGRAMFILES%\wkhtmltopdf\bin\wkhtmltopdf.exe https://forum.uipath.com/t/print-webpage-as-pdf/209795 %USERPROFILE%\Desktop\post.pdf


This looks helpful but still quite technical for my abilities.

Do I download the win32/win64 installer for Windows Vista or later if I’m using Windows 10? How would I put that in my workflow?

Is there another way to do this?

Yes, that works on Windows 10. In your workflow, use a Start Process activity with:

  • FileName: the path to wkhtmltopdf.exe, for example:
    "%PROGRAMFILES%\wkhtmltopdf\bin\wkhtmltopdf.exe"

  • Arguments: a string with the page’s URL and the path for the output PDF, for example: "https://forum.uipath.com %USERPROFILE%\Desktop\forum.pdf" will save the page to your desktop as forum.pdf.
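To see what those two fields amount to outside UiPath, here is a rough Python sketch (an illustration, not what UiPath runs internally): FileName becomes the first item of an argument list, followed by the URL and the output path. `build_wkhtmltopdf_command` and `save_page_as_pdf` are hypothetical names, and the install path is the default one used in this thread.

```python
import ntpath
import subprocess

# Default install path used in this thread; yours may differ.
DEFAULT_EXE = r"%PROGRAMFILES%\wkhtmltopdf\bin\wkhtmltopdf.exe"


def build_wkhtmltopdf_command(url, output_pdf, exe=DEFAULT_EXE):
    # ntpath is the Windows flavour of os.path; its expandvars() replaces
    # %VAR% references from the environment on any OS. No shell is used
    # below, so nothing else would expand them.
    return [ntpath.expandvars(exe), url, ntpath.expandvars(output_pdf)]


def save_page_as_pdf(url, output_pdf, exe=DEFAULT_EXE):
    # Each list item is passed as one argument, so the space in
    # "C:\Program Files" does not split the executable path.
    # check=True raises CalledProcessError if wkhtmltopdf fails.
    cmd = build_wkhtmltopdf_command(url, output_pdf, exe)
    subprocess.run(cmd, check=True)
```

For example, `save_page_as_pdf("https://forum.uipath.com", r"%USERPROFILE%\Desktop\forum.pdf")` should write forum.pdf to your desktop, assuming wkhtmltopdf really is installed at that path.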


I downloaded it for Windows 10. What do I do next? A few folders were created with lots of data inside and I’m unsure of the next step.

Try opening cmd, paste the following line and press [ENTER]:

%PROGRAMFILES%\wkhtmltopdf\bin\wkhtmltopdf.exe https://forum.uipath.com/t/print-webpage-as-pdf/209795 %USERPROFILE%\Desktop\post.pdf

If you’re not sure how to open the console: www.google.fr/search?q=windows+10+run+console

Hello @msan, I used the links you shared and downloaded the files. The wkhtmltopdf.exe file was not downloaded, and this is the error I get when I run this in cmd:

%PROGRAMFILES%\wkhtmltopdf\bin\wkhtmltopdf.exe https://forum.uipath.com/t/print-webpage-as-pdf/209795 %USERPROFILE%\Desktop\post.pdf

ERROR: ‘C:\Program’ is not recognized as an internal or external command, operable program or batch file.

Hi,

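Please try it with double quotes around the paths. Without them, cmd splits C:\Program Files at the space and tries to run C:\Program, which is exactly the error you got: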

"%PROGRAMFILES%\wkhtmltopdf\bin\wkhtmltopdf.exe" https://forum.uipath.com/t/print-webpage-as-pdf/209795 "%USERPROFILE%\Desktop\post.pdf"
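If you build the command in code, the quoting can be done for you. A Python sketch, assuming that is your setup: the standard library helper `subprocess.list2cmdline` (which `subprocess` itself uses on Windows) wraps any argument containing a space in double quotes. It follows the Microsoft C runtime rules most Windows programs use to read their arguments; it does not escape cmd.exe special characters such as `&` or `^`, so a URL containing `&` still needs care if you paste the line into cmd. `start_process_arguments` and `cmd_line` are hypothetical names.

```python
import subprocess


def start_process_arguments(url, output_pdf):
    # Value for the Arguments field of Start Process: one string in which
    # any path containing spaces is wrapped in double quotes.
    return subprocess.list2cmdline([url, output_pdf])


def cmd_line(exe, url, output_pdf):
    # Whole line for cmd: the executable path is quoted as well, which is
    # what avoids "'C:\Program' is not recognized ...".
    return subprocess.list2cmdline([exe, url, output_pdf])


print(cmd_line(r"C:\Program Files\wkhtmltopdf\bin\wkhtmltopdf.exe",
               "https://forum.uipath.com/t/print-webpage-as-pdf/209795",
               r"C:\Users\me\Desktop\post.pdf"))
# "C:\Program Files\wkhtmltopdf\bin\wkhtmltopdf.exe" https://forum.uipath.com/t/print-webpage-as-pdf/209795 C:\Users\me\Desktop\post.pdf
```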

If the installer added wkhtmltopdf to your PATH (I don’t use the installer, so I don’t know whether it does), you could just try:

wkhtmltopdf https://forum.uipath.com/t/print-webpage-as-pdf/209795 "%USERPROFILE%\Desktop\post.pdf"
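To check which of the two cases applies on your machine, here is a small Python sketch: `shutil.which` looks the command up on PATH, and otherwise the default folder used earlier in this thread is tried. `find_wkhtmltopdf` is a hypothetical helper, and the default folder is an assumption (the installer may let you choose another one).

```python
import os
import shutil

# Assumed default install folder (the one used earlier in this thread).
DEFAULT_LOCATION = os.path.join(
    os.environ.get("PROGRAMFILES", r"C:\Program Files"),
    "wkhtmltopdf", "bin", "wkhtmltopdf.exe")


def find_wkhtmltopdf(default=DEFAULT_LOCATION):
    # 1) On PATH? Then plain "wkhtmltopdf" works in cmd.
    on_path = shutil.which("wkhtmltopdf")
    if on_path:
        return on_path
    # 2) Otherwise, is the executable in the assumed install folder?
    if os.path.isfile(default):
        return default
    # 3) Not found: the installer probably did not run or used another folder.
    return None


print(find_wkhtmltopdf() or "wkhtmltopdf not found")
```

If this prints "wkhtmltopdf not found", neither command above can work until the program is actually installed.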

Hello,

Thank you for your help but it’s still not working.

I typed both of those into cmd and neither was recognized. I downloaded wkhtmltopdf, plus the wkhtmltopdf file inside the bin folder and the pdf from the wkhtmltox folder.

Not sure how to proceed.

Hello everyone,

Here is my approach to saving a webpage as a PDF using Windows 10 Pro and Chrome: use a Start Process activity to start headless Chrome with arguments that save/print the page to PDF. It works pretty well and renders web pages as you would expect; I have read about formatting issues with some other approaches. If you have Chrome installed, no additional software is required. I found this approach more reliable than printing to PDF through the Chrome user interface.
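The attached workflow isn’t reproduced in this thread, so the following is only a sketch of the idea in Python, assuming Chrome’s `--headless` and `--print-to-pdf` switches and a typical install path for chrome.exe (check where yours is). In a Start Process activity, FileName would be the chrome.exe path and Arguments the remaining items joined into one string. `chrome_pdf_command` and `save_pdf_with_chrome` are hypothetical names.

```python
import subprocess

# Typical Chrome location on 64-bit Windows; an assumption, adjust as needed.
CHROME = r"C:\Program Files\Google\Chrome\Application\chrome.exe"


def chrome_pdf_command(url, output_pdf, chrome=CHROME):
    # Argument list asking headless Chrome to print `url` to a PDF file.
    return [
        chrome,
        "--headless",                    # no visible browser window
        "--disable-gpu",                 # commonly added on Windows
        "--print-to-pdf=" + output_pdf,  # where the PDF is written
        url,                             # page to render
    ]


def save_pdf_with_chrome(url, output_pdf, chrome=CHROME, timeout=120):
    # check=True raises if Chrome exits with an error; on timeout, run()
    # kills the launched process and raises TimeoutExpired instead of
    # blocking the robot forever.
    subprocess.run(chrome_pdf_command(url, output_pdf, chrome),
                   check=True, timeout=timeout)
```

Running the steps one at a time like this, and waiting for each Chrome process to exit, also makes it easier to notice if instances start piling up.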

If your robot/app needs to enter values into a web page form before saving, I would consider entering those values and then saving the resulting HTML file (this uses the Windows save dialog, which IMO is much friendlier to automation than the Chrome print dialog). Then feed the HTML file to headless Chrome.
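To feed a saved file to headless Chrome, it can be passed as a file:/// URL. Here is a small Python sketch of that conversion, assuming an absolute Windows path; `html_file_to_url` and `headless_chrome_args` are hypothetical helpers, and the flags are Chrome’s `--headless`/`--print-to-pdf` switches, not arguments copied from the attached workflow.

```python
from urllib.parse import quote


def html_file_to_url(html_path):
    # Assumes an absolute Windows path such as C:\Users\me\form.html.
    # Backslashes become forward slashes and characters like spaces are
    # percent-encoded; ":" and "/" are kept so the drive letter survives.
    return "file:///" + quote(html_path.replace("\\", "/"), safe="/:")


def headless_chrome_args(html_path, output_pdf):
    # Arguments for chrome.exe (the Start Process FileName), pointing
    # headless Chrome at the saved HTML file instead of a live URL.
    return ["--headless", "--disable-gpu",
            "--print-to-pdf=" + output_pdf,
            html_file_to_url(html_path)]


print(html_file_to_url(r"C:\Users\John Doe\form.html"))
# file:///C:/Users/John%20Doe/form.html
```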

Perhaps this is all easier with Internet Explorer or Edge, but I needed Chrome.

SavePDFofWebpageUsingHeadlessChrome.xaml (5.7 KB)


Caveats:

  1. Running headless Chrome instances on my machine (Windows 10 Pro) seemed to generate a lot of background Chrome processes that failed to close/exit. As I kept using my robot to generate PDFs, the system would freeze once too many Chrome instances had piled up (I think, anyway). So I upgraded Chrome, after which nothing worked. This sent me down several rabbit holes. In the end, I needed to update my ChromeDriver to match the updated Chrome. After both Chrome and ChromeDriver were updated, I no longer had issues with Chrome instances failing to exit. Before the updates, my system would freeze at about 20 instances; I’ve since tested the above with 60 separate instances with no issues.
  2. The approach above is, I believe, a low-tech/newb way to achieve some of the functions of Puppeteer.
    See GitHub - puppeteer/puppeteer: Headless Chrome Node.js API