I’ve seen a lot of questions about this over time, and today I needed to figure out how to do it properly myself. It’s fairly simple now that I have it worked out. Here is how to read an MSG file and get the body HTML, including the embedded images.
Install the MsgReader v5.7.2 package:
There’s a small workaround needed to make it work, so first create a variable named MSGEncodingProvider with data type System.Text.EncodingProvider:
Then you will add an Assign activity:
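A value that fits this variable's type, offered here as an assumption because the exact expression isn't stated in the text, is the code page provider that ships with .NET. Written as a VB statement, the Assign would be equivalent to:

```vb
' Assumption: MSGEncodingProvider is the System.Text.EncodingProvider variable from the previous step.
' CodePagesEncodingProvider supplies legacy code pages (for example Windows-1252) on modern .NET.
MSGEncodingProvider = System.Text.CodePagesEncodingProvider.Instance
```

In the Assign activity itself, the variable goes on the left side and only the expression after the equals sign goes on the right.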
Now add an Invoke Code activity with these arguments (the code uses MSGFile and MSGEncodingProvider as inputs and BodyHTML as an output):
Here is the code to put inside the Invoke Code:
' Register the encoding provider so MsgReader can decode message bodies stored in legacy code pages
System.Text.Encoding.RegisterProvider(MSGEncodingProvider)
Try
    Using msg As MsgReader.Outlook.Storage.Message = New MsgReader.Outlook.Storage.Message(MSGFile)
        ' Display basic information about the email
        Console.WriteLine("Subject: " & msg.Subject)
        BodyHTML = msg.BodyHtml
        Dim imageExtensions As String() = {".jpg", ".jpeg", ".png", ".gif"}
        ' Attachments can also contain nested messages, so skip anything that is not a file attachment
        For Each item As Object In msg.Attachments
            Dim attachment As MsgReader.Outlook.Storage.Attachment = TryCast(item, MsgReader.Outlook.Storage.Attachment)
            If attachment Is Nothing Then Continue For
            Console.WriteLine("Attachment: " & attachment.FileName)
            Dim fileExtension As String = System.IO.Path.GetExtension(attachment.FileName).ToLower()
            If imageExtensions.Contains(fileExtension) Then
                ' Convert attachment data to a base64 string
                Dim base64Data As String = Convert.ToBase64String(attachment.Data)
                ' Create the data URL assuming a common image type for demo purposes
                Dim dataUrl As String = "data:image/" & fileExtension.TrimStart("."c) & ";base64," & base64Data
                'Console.WriteLine("dataUrl [" & dataUrl & "]")
                ' Replace references to filename in BodyHTML, assuming it's embedded using the attachment filename
                'BodyHTML = BodyHTML.Replace("cid:" & attachment.FileName, dataUrl)
                ' Note: this pattern matches every src="..." value, so it suits a message with a single inline image
                BodyHTML = System.Text.RegularExpressions.Regex.Replace(BodyHTML,"(?<=src="").*?(?="")",dataUrl)
            End If
        Next
    End Using
Catch ex As Exception
    Console.WriteLine(ex)
End Try
Now just write BodyHTML out to a text file (give it an .html extension), and you can open it in a browser:
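If you would rather do the write from code than with a workflow activity, a minimal sketch looks like this. OutputPath is a hypothetical String argument that is not part of the original workflow, and UTF-8 is an assumed encoding choice:

```vb
' OutputPath is a hypothetical In argument (String), for example a path ending in .html.
' WriteAllText creates the file or overwrites an existing one.
System.IO.File.WriteAllText(OutputPath, BodyHTML, System.Text.Encoding.UTF8)
```

This line can go at the end of the Invoke Code or in a separate one, as long as BodyHTML has already been populated.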
You can, of course, also extract other properties from the email message, such as msg.BodyText, msg.Sender, msg.SentOn, etc. Just add them as Out arguments and assign them in the Invoke Code.
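As a rough sketch of what that could look like, the snippet below reads a few of those properties. The argument names BodyText, SenderEmail, SenderName and SentOnText are hypothetical Out arguments of type String that you would add yourself; MSGFile is the same input used in the main code:

```vb
' Hypothetical Out arguments (all String): BodyText, SenderEmail, SenderName, SentOnText
Using msg As New MsgReader.Outlook.Storage.Message(MSGFile)
    BodyText = msg.BodyText
    ' Sender can be missing, for example on some drafts, so guard against Nothing
    If msg.Sender IsNot Nothing Then
        SenderEmail = msg.Sender.Email
        SenderName = msg.Sender.DisplayName
    End If
    ' Converting to text avoids depending on the exact date type SentOn exposes
    SentOnText = msg.SentOn.ToString()
End Using
```

In practice these assignments would sit inside the existing Using block rather than opening the file a second time.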
For the embedded images to display properly, you may have to adjust the regular expression. This is what the img tag looked like in my test MSG file:
src="cid:image002.png@01DA97B2.80FC6B50">
So I wrote my regular expression to replace the value of the src attribute with the generated dataUrl variable:
BodyHTML = System.Text.RegularExpressions.Regex.Replace(BodyHTML,"(?<=src="").*?(?="")",dataUrl)
Depending upon the exact format of the img tag in the MSG file you’re working with, you may need to adjust this expression. There is also a simpler version of this replacement commented out in the code:
BodyHTML = BodyHTML.Replace("cid:" & attachment.FileName, dataUrl)
This didn’t work for me because the src value in my original BodyHTML had @01DA97B2.80FC6B50 after the attachment filename, so I wrote the Regex.Replace to replace everything between the double quotes in src="cid:image002.png@01DA97B2.80FC6B50".
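One thing to watch for: that pattern matches every src value in the body, so with several inline images they would all end up showing the last image processed. A possible variation, sketched below as a standalone console example rather than as the method used above, anchors the pattern on the attachment's filename and then consumes whatever follows it up to the closing quote. EmbedImage and the sample HTML are made up for illustration:

```vb
Imports System
Imports System.Text.RegularExpressions

Module CidReplaceSketch
    ' Hypothetical helper: swaps only the src value that starts with cid:<cidKey>,
    ' including any @suffix after it, and leaves other src values alone.
    Function EmbedImage(html As String, cidKey As String, dataUrl As String) As String
        Dim pattern As String = "(?<=src="")cid:" & Regex.Escape(cidKey) & "[^""]*(?="")"
        Return Regex.Replace(html, pattern, dataUrl, RegexOptions.IgnoreCase)
    End Function

    Sub Main()
        Dim html As String = "<img src=""cid:image002.png@01DA97B2.80FC6B50""><img src=""cid:image003.png@01DA97B2.80FC6B50"">"
        html = EmbedImage(html, "image002.png", "data:image/png;base64,AAAA")
        ' Prints: <img src="data:image/png;base64,AAAA"><img src="cid:image003.png@01DA97B2.80FC6B50">
        Console.WriteLine(html)
    End Sub
End Module
```

Inside the Invoke Code, only the two lines from EmbedImage are needed, with attachment.FileName as the key. MsgReader attachments also expose a ContentId property, which may be a more reliable key when the filename and the cid differ.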