How to convert Word Doc Text into HTML format .txt file?
I have some text in a Word document with formatting that I need to preserve. To do that, I need to convert the Word document to HTML, read the HTML, and pass it to the Send Outlook Mail Message activity. How can I do this?
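Imports System.IO
Imports Word = Microsoft.Office.Interop.Word
Imports Outlook = Microsoft.Office.Interop.Outlook
Module WordToHtmlMail
    Sub Main()
        ' File paths
        Dim wordFilePath As String = "C:\Path\To\Your\Document.docx"
        Dim htmlFilePath As String = "C:\Path\To\Your\Document.html"
        ' Step 1: Convert Word to filtered HTML
        ' (Word and Outlook both define Application, so each type is qualified by its alias)
        Dim wordApp As New Word.Application()
        Dim wordDoc As Word.Document = wordApp.Documents.Open(wordFilePath)
        wordDoc.SaveAs2(FileName:=htmlFilePath, FileFormat:=Word.WdSaveFormat.wdFormatFilteredHTML)
        wordDoc.Close()
        wordApp.Quit()
        ' Step 2: Read HTML content
        ' ReadAllText assumes UTF-8 unless the file has a byte-order mark. If accented
        ' characters look wrong, pass the encoding Word used as a second argument.
        Dim htmlContent As String = File.ReadAllText(htmlFilePath)
        ' Step 3: Send Email
        Dim outlookApp As New Outlook.Application()
        Dim mail As Outlook.MailItem = CType(outlookApp.CreateItem(Outlook.OlItemType.olMailItem), Outlook.MailItem)
        mail.Subject = "Formatted Email from Word Document"
        mail.HTMLBody = htmlContent
        mail.To = "recipient@example.com"
        mail.Send()
    End Sub
End Module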
To convert a Word document with formatting into HTML for use in the body of an email, you can save the Word document as an HTML file and then read its content as a string. First, use a Word processing tool (like Word Interop in UiPath or a script) to save the document in filtered HTML format. Then, read the saved HTML file using the Read Text File activity in UiPath, storing the content in a variable. Finally, use the Send Outlook Mail Message activity, setting IsBodyHtml to True and passing the HTML content as the email body. This process ensures that the formatting from the Word document is preserved in the email.
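The "read its content as a string" step can go wrong if the file is decoded with a different encoding than the one Word wrote it in. Below is a minimal Python sketch of one way to handle that, assuming the saved file declares its charset in a meta tag (Word's HTML export typically does, but check your own output). The helper name read_html_as_text and its fallback parameter are hypothetical, not part of UiPath or any library.

```python
import re
from pathlib import Path

# Matches charset=... inside a <meta> tag, quoted or unquoted.
_CHARSET_RE = re.compile(
    rb"<meta[^>]*charset\s*=\s*[\"']?([A-Za-z0-9_\-]+)", re.IGNORECASE
)


def read_html_as_text(path, fallback="utf-8"):
    """Return the HTML file as a str, decoded with its declared charset.

    Hypothetical helper: assumes the charset is declared in a <meta> tag
    near the top of the file, or that the file starts with a UTF-8 BOM.
    """
    raw = Path(path).read_bytes()

    # A UTF-8 byte-order mark takes priority over any meta declaration.
    if raw.startswith(b"\xef\xbb\xbf"):
        return raw[3:].decode("utf-8")

    # Only the first few KB are searched; the <head> sits at the top.
    match = _CHARSET_RE.search(raw[:4096])
    encoding = match.group(1).decode("ascii") if match else fallback

    try:
        return raw.decode(encoding)
    except (LookupError, UnicodeDecodeError):
        # Unknown or wrong charset label: decode leniently rather than fail.
        return raw.decode(fallback, errors="replace")
```

The returned string is what you would pass on as the email body. In a UiPath workflow the same idea applies when reading the file: pick the encoding that matches the one declared in the saved HTML.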
In Word, choose File > Save As, and in the Save as type dropdown, select Web Page (.htm, .html).
This option generates an HTML file along with associated resources (like images in a folder).
Use “Web Page, Filtered” Option:
If you need cleaner HTML without additional Word-specific tags, choose Web Page, Filtered (.htm, .html) under Save As.
This strips out much of the extra formatting and leaves a simplified HTML structure.
Validate the Output:
Open the saved .html file in a browser or text editor to verify that the Word content has been converted into proper HTML.
Use Tools or Code for Conversion:
If Word's built-in export is insufficient, use a conversion tool or write a script. Python libraries such as python-docx, combined with the standard html module, can extract the content and convert it programmatically.
Python
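import html
from docx import Document
# Load the Word document
doc = Document("example.docx")
# Wrap each paragraph in <p> tags, escaping &, < and > so the text is valid HTML.
# Note: this keeps paragraph text only; bold, italics, tables and images are dropped.
paragraphs = [f"<p>{html.escape(para.text)}</p>" for para in doc.paragraphs]
html_content = "\n".join(paragraphs)
# Save as an HTML file
with open("example.html", "w", encoding="utf-8") as file:
    file.write(html_content)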
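Since the goal is to preserve formatting, a paragraph-text loop like the one above may not be enough on its own. Here is a rough sketch of how bold and italic could be carried over by walking each paragraph's runs instead. It relies on python-docx exposing .runs on paragraphs and .text, .bold and .italic on runs. The function names runs_to_html and paragraphs_to_html are hypothetical, and treating an inherited (None) setting as "off" is a simplifying assumption, so formatting that comes from a style will be missed.

```python
import html


def runs_to_html(runs):
    """Convert run-like objects (.text, .bold, .italic) to inline HTML."""
    parts = []
    for run in runs:
        text = html.escape(run.text)
        if not text:
            continue
        # In python-docx, None means "inherit from the style".
        # This sketch treats None the same as False.
        if run.italic:
            text = f"<i>{text}</i>"
        if run.bold:
            text = f"<b>{text}</b>"
        parts.append(text)
    return "".join(parts)


def paragraphs_to_html(paragraphs):
    """Wrap each paragraph's converted runs in <p> tags."""
    return "\n".join(f"<p>{runs_to_html(p.runs)}</p>" for p in paragraphs)
```

With python-docx installed, this would be called as paragraphs_to_html(Document("example.docx").paragraphs). Tables, images, lists and colors are still not handled, so for heavily formatted documents Word's own filtered HTML export is likely the more faithful route.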