Convert/save email (O365) to PDF including inline images

I had previously done a writeup on how to do this with .eml files, but here is how to do it with an Office365Message object.

Start by copying Email.BodyAsHTML to a string variable, which is called BodyHTML in the expressions below.

This next section finds all the png, jpg, and jpeg inline images and replaces each reference with a Base64 data URI. The original src in the BodyHTML will look like this:

src="cid:image002.png@01DA97B2.80FC6B50">

So let’s loop through the linked resources. The loop variable is currentLinkedResource in the expressions below.

Inside the loop, we make sure to process only the png, jpg, or jpeg resources.

Now we will generate the Base64 and put it together with what we need in order to replace cid:filename@contentID:

The expression assigned to Base64String is:
"data:image/" & System.IO.Path.GetExtension(currentLinkedResource.ContentType.Name).TrimStart("."c) & ";base64," & Convert.ToBase64String((New System.IO.BinaryReader(currentLinkedResource.ContentStream)).ReadBytes(CInt(currentLinkedResource.ContentStream.Length)))

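Then we do our replacement into BodyHTML. The file name is passed through Regex.Escape so that the dot in it is matched literally.
The expression assigned back to BodyHTML is:
System.Text.RegularExpressions.Regex.Replace(BodyHTML, "(?=cid:" & System.Text.RegularExpressions.Regex.Escape(currentLinkedResource.ContentType.Name) & ").*?(?="")", Base64String)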
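
For anyone who would rather do the whole loop in code, here is a minimal VB.NET sketch of the same idea. It assumes the linked resources are System.Net.Mail.LinkedResource objects (which the ContentType and ContentStream properties used above suggest). InlineImages and its parameter names are hypothetical. It takes the image type from the file extension via Path.GetExtension and escapes the file name with Regex.Escape before building the pattern.

```vb
Imports System.Collections.Generic
Imports System.IO
Imports System.Net.Mail
Imports System.Text.RegularExpressions

Module InlineImageSketch
    ' Hypothetical helper: returns bodyHtml with every cid: reference to a
    ' png/jpg/jpeg linked resource replaced by a Base64 data URI.
    ' Assumption: resources are System.Net.Mail.LinkedResource objects.
    Function InlineImages(bodyHtml As String, resources As IEnumerable(Of LinkedResource)) As String
        For Each res As LinkedResource In resources
            Dim name As String = res.ContentType.Name
            If String.IsNullOrEmpty(name) Then Continue For

            ' "image002.png" -> "png"
            Dim ext As String = Path.GetExtension(name).TrimStart("."c).ToLowerInvariant()
            If ext <> "png" AndAlso ext <> "jpg" AndAlso ext <> "jpeg" Then Continue For

            ' Rewind if possible, then read the whole stream into memory.
            If res.ContentStream.CanSeek Then res.ContentStream.Position = 0
            Dim bytes As Byte()
            Using ms As New MemoryStream()
                res.ContentStream.CopyTo(ms)
                bytes = ms.ToArray()
            End Using

            Dim dataUri As String = "data:image/" & ext & ";base64," & Convert.ToBase64String(bytes)

            ' Match from "cid:<file name>" up to (not including) the closing quote.
            Dim pattern As String = "(?=cid:" & Regex.Escape(name) & ").*?(?="")"
            bodyHtml = Regex.Replace(bodyHtml, pattern, dataUri)
        Next
        Return bodyHtml
    End Function
End Module
```

Rewinding the stream is only a precaution in case something else has already read the resource.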

And that’s it for replacing the inline images with Base64.

To create the PDF, set up a path and filename for a temporary HTML file (the HTMLFile variable used below). The path and filename are up to you. For my purposes I just generate a unique filename using the current date and time:
Path.Combine(Environment.CurrentDirectory,"Temp Email " + Now.ToString("MMddyyyyHHmmss") + ".html")

Then write BodyHTML to that file as text.
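
If you are writing the file from code rather than with an activity, a minimal sketch could look like this. WriteTempHtml is a hypothetical name. Passing Encoding.UTF8 is an assumption on my part: it writes a byte order mark, which can help Chrome pick the right encoding when the email HTML has no charset declaration.

```vb
Imports System.IO
Imports System.Text

Module TempHtmlSketch
    ' Hypothetical helper: writes the HTML to a uniquely named temp file
    ' in the current directory and returns the full path.
    Function WriteTempHtml(bodyHtml As String) As String
        Dim htmlFile As String = Path.Combine(
            Environment.CurrentDirectory,
            "Temp Email " & DateTime.Now.ToString("MMddyyyyHHmmss") & ".html")
        File.WriteAllText(htmlFile, bodyHtml, Encoding.UTF8)
        Return htmlFile
    End Function
End Module
```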

Then use the Chrome “headless” method to generate the PDF:

I have used the ChromePath variable to tell it the location of the executable because I’ve actually created this as a custom activity and it’s an argument passed in. Substitute your own path to chrome.exe.

For the Start Process activity’s arguments:
"--headless --disable-gpu --no-pdf-header-footer --print-to-pdf=""" & OutputPDFFile & """ " & """" & HTMLFile & """"

If you want the header and footer you can just leave out --no-pdf-header-footer.
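
The same call can be sketched in VB.NET with System.Diagnostics.Process. BuildChromeArguments and PrintToPdf are hypothetical names, and the flags are the ones listed above. The sketch waits for Chrome to exit so that the PDF exists before anything else tries to open it.

```vb
Imports System.Diagnostics

Module ChromePdfSketch
    ' Builds the same argument string as the Start Process example,
    ' quoting both paths so that spaces in them are handled.
    Function BuildChromeArguments(outputPdfFile As String, htmlFile As String) As String
        Return "--headless --disable-gpu --no-pdf-header-footer --print-to-pdf=""" &
               outputPdfFile & """ """ & htmlFile & """"
    End Function

    ' Runs Chrome headless and blocks until it has finished.
    Sub PrintToPdf(chromePath As String, outputPdfFile As String, htmlFile As String)
        Dim startInfo As New ProcessStartInfo(chromePath, BuildChromeArguments(outputPdfFile, htmlFile))
        startInfo.UseShellExecute = False
        Using proc As Process = Process.Start(startInfo)
            proc.WaitForExit()
        End Using
    End Sub
End Module
```

With the variables from this post it would be called as PrintToPdf(ChromePath, OutputPDFFile, HTMLFile).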

Because of the way the Chrome conversion works, the internal title of the PDF ends up being the name of the HTML file, and I don’t like that. So I use the PDFsharp-GDI library to change it. The code is:

Dim document As PdfSharp.Pdf.PdfDocument = PdfSharp.Pdf.IO.PdfReader.Open(OutputPDFFile, PdfSharp.Pdf.IO.PdfDocumentOpenMode.Modify)
document.Info.Title = System.IO.Path.GetFileName(OutputPDFFile)
document.Save(OutputPDFFile)

Then, to be tidy, I delete the temporary HTML file.

And that’s it, you’ve now converted an Office365Message to PDF with inline images maintained.
