How to read an MSG file (email), including embedded images

I’ve seen a lot of questions on this over time, and today I needed to figure out how to do it properly. It’s fairly simple now that I have it worked out. Here is how to read an MSG file and get the body HTML with the embedded images included.

Install the MsgReader v5.7.2 NuGet package.

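There’s a small workaround needed to make it work: MsgReader relies on legacy code-page encodings that .NET doesn’t register by default, so we have to register an encoding provider ourselves. First, create a variable named MSGEncodingProvider with the data type System.Text.EncodingProvider.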

Then add an Assign activity that gives MSGEncodingProvider its value.
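The screenshot of the Assign isn’t reproduced here. Since the Invoke Code passes this variable to Encoding.RegisterProvider, the value is presumably System.Text.CodePagesEncodingProvider.Instance, the provider .NET ships for legacy Windows code pages (in the Assign: To = MSGEncodingProvider, Value = System.Text.CodePagesEncodingProvider.Instance). Treat that as an assumption. The stand-alone sketch below only illustrates the effect: once the provider is registered, code page 1252 resolves.

```vb
Imports System
Imports System.Text

Module EncodingProviderSketch
    Sub Main()
        ' Assumption: this is the value the Assign puts into MSGEncodingProvider
        Dim MSGEncodingProvider As EncodingProvider = CodePagesEncodingProvider.Instance

        ' Same call the Invoke Code makes; on .NET Core/.NET 5+ code page 1252
        ' is not available until a code-page provider is registered
        Encoding.RegisterProvider(MSGEncodingProvider)
        Console.WriteLine(Encoding.GetEncoding(1252).WebName)
    End Sub
End Module
```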

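Now add an Invoke Code activity with these arguments (the names must match the ones used in the code): MSGFile (In, String, the full path of the .msg file), MSGEncodingProvider (In, System.Text.EncodingProvider), and BodyHTML (Out, String).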

Here is the code to put inside the Invoke Code:

```vb
System.Text.Encoding.RegisterProvider(MSGEncodingProvider)
Try
    Using msg As MsgReader.Outlook.Storage.Message = New MsgReader.Outlook.Storage.Message(MSGFile)
        ' Display basic information about the email
        Console.WriteLine("Subject: " & msg.Subject)
        BodyHTML = msg.BodyHtml

        ' Only inline images when there is an HTML body to put them in
        If BodyHTML IsNot Nothing AndAlso msg.Attachments.Count > 0 Then
            Dim imageExtensions As String() = {".jpg", ".jpeg", ".png", ".gif"}

            ' Attachments can contain both Storage.Attachment and embedded Storage.Message items,
            ' so cast each one and skip anything that is not a plain attachment
            For Each item As Object In msg.Attachments
                Dim attachment As MsgReader.Outlook.Storage.Attachment = TryCast(item, MsgReader.Outlook.Storage.Attachment)
                If attachment Is Nothing Then Continue For

                Console.WriteLine("Attachment: " & attachment.FileName)
                Dim fileExtension As String = System.IO.Path.GetExtension(attachment.FileName).ToLower()

                If imageExtensions.Contains(fileExtension) Then
                    ' Convert attachment data to a base64 string
                    Dim base64Data As String = Convert.ToBase64String(attachment.Data)

                    ' Build the data URL from the file extension ("jpg" is sent as the standard image/jpeg)
                    Dim mimeSubtype As String = fileExtension.TrimStart("."c)
                    If mimeSubtype = "jpg" Then mimeSubtype = "jpeg"
                    Dim dataUrl As String = "data:image/" & mimeSubtype & ";base64," & base64Data
                    'Console.WriteLine("dataUrl [" & dataUrl & "]")

                    ' Replace references to the image in BodyHTML, assuming it's embedded using the attachment filename
                    'BodyHTML = BodyHTML.Replace("cid:" & attachment.FileName, dataUrl)
                    ' Caution: this pattern rewrites every src="..." value in the body, so with several
                    ' images every img ends up showing the last one
                    BodyHTML = System.Text.RegularExpressions.Regex.Replace(BodyHTML,"(?<=src="").*?(?="")",dataUrl)
                End If
            Next
        End If
    End Using
Catch ex As Exception
    Console.WriteLine(ex)
End Try
```

Now just write BodyHTML out to a file (for example with a Write Text File activity), give it an .html extension, and you can open it in a browser.
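If you’d rather write the file from code instead of a separate activity, a minimal stand-alone sketch using System.IO.File.WriteAllText could look like this. OutputPath is a hypothetical name, and the hard-coded BodyHTML only stands in for the Out argument.

```vb
Imports System
Imports System.IO
Imports System.Text

Module WriteHtmlSketch
    Sub Main()
        ' Stand-ins for the workflow values; OutputPath is hypothetical
        Dim BodyHTML As String = "<html><body><p>Hello</p></body></html>"
        Dim OutputPath As String = Path.Combine(Path.GetTempPath(), "email.html")

        ' The .html extension makes the browser render the markup instead of showing it as text
        File.WriteAllText(OutputPath, BodyHTML, Encoding.UTF8)
        Console.WriteLine(File.Exists(OutputPath))
    End Sub
End Module
```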

You can, of course, also extract other properties from the email message, such as msg.BodyText, msg.Sender, and msg.SentOn. Just add them as Out arguments and assign them in the Invoke Code.

For the embedded images to display properly, you may have to adjust the regular expression. This is what the src attribute of the img tag looked like in my test MSG file:

`src="cid:image002.png@01DA97B2.80FC6B50">`

So I wrote the regular expression to replace everything inside that src attribute with the generated dataUrl variable:

`BodyHTML = System.Text.RegularExpressions.Regex.Replace(BodyHTML,"(?<=src="").*?(?="")",dataUrl)`

Depending on the exact format of the img tag in the MSG file you’re working with, you may need to adjust this expression. There is also a simpler version of this replacement commented out in the code:

`BodyHTML = BodyHTML.Replace("cid:" & attachment.FileName, dataUrl)`

This didn’t work for me because in my original BodyHTML the attachment filename was followed by @01DA97B2.80FC6B50 (the rest of the content ID), so the plain "cid:" & attachment.FileName string never matched. Instead, I wrote the Regex.Replace to replace everything between the double quotes in `src="cid:image002.png@01DA97B2.80FC6B50"`.
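One caveat with that pattern: it isn’t tied to a particular attachment, so each pass through the loop overwrites every src value in the body. With two or more images, every img ends up showing the last one, and remote image URLs get replaced too. Below is a minimal sketch of a per-attachment variant, assuming the cid value always starts with the attachment’s filename (as in image002.png@01DA97B2.80FC6B50). InlineCidImage is a hypothetical helper name, and Main only exists to demonstrate it.

```vb
Imports System
Imports System.IO
Imports System.Text.RegularExpressions

Module CidReplaceSketch
    ' Hypothetical helper: replaces only the src values that point at this attachment's cid
    ' (e.g. "cid:image002.png@01DA97B2.80FC6B50") with a base64 data URL.
    Function InlineCidImage(bodyHtml As String, fileName As String, data As Byte()) As String
        Dim ext As String = Path.GetExtension(fileName).ToLowerInvariant().TrimStart("."c)
        ' "jpg" is not a registered MIME subtype; image/jpeg is
        Dim mimeSubtype As String = If(ext = "jpg", "jpeg", ext)
        Dim dataUrl As String = "data:image/" & mimeSubtype & ";base64," & Convert.ToBase64String(data)

        ' "cid:" + the literal filename + anything up to the closing quote (the "@..." suffix)
        Dim pattern As String = "(?<=src="")cid:" & Regex.Escape(fileName) & "[^""]*(?="")"
        Return Regex.Replace(bodyHtml, pattern, dataUrl, RegexOptions.IgnoreCase)
    End Function

    Sub Main()
        Dim html As String = "<img src=""cid:image002.png@01DA97B2.80FC6B50""><img src=""https://example.com/logo.png"">"
        ' The cid reference becomes a data URL; the remote URL is left alone
        Console.WriteLine(InlineCidImage(html, "image002.png", New Byte() {1, 2, 3}))
    End Sub
End Module
```

Inside the Invoke Code you can’t declare a Function, so in practice you would put the pattern line and the Regex.Replace call directly in the loop, in place of the existing Regex.Replace, using attachment.FileName and the dataUrl already built there.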