How to convert Word Doc Text into HTML format .txt file?

I have some formatted text in a Word document. To preserve that formatting, I need to convert the Word document's content into HTML, then read the HTML and pass it to the Send Outlook Mail Message activity. How

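Imports System.IO
Imports Word = Microsoft.Office.Interop.Word
Imports Outlook = Microsoft.Office.Interop.Outlook

Module WordToHtmlEmail

Sub Main()
    ' File paths (placeholders; replace with your own)
    Dim wordFilePath As String = "C:\Path\To\Your\Document.docx"
    Dim htmlFilePath As String = "C:\Path\To\Your\Document.html"

    ' Step 1: Convert Word to HTML
    ' Word and Outlook both expose an Application type, so each is qualified by its import alias.
    Dim wordApp As New Word.Application()
    Dim wordDoc As Word.Document = wordApp.Documents.Open(wordFilePath)
    wordDoc.SaveAs2(FileName:=htmlFilePath, FileFormat:=Word.WdSaveFormat.wdFormatFilteredHTML)
    ' Casting to the underscore interfaces selects the Close/Quit methods rather than the same-named events.
    CType(wordDoc, Word._Document).Close(SaveChanges:=Word.WdSaveOptions.wdDoNotSaveChanges)
    CType(wordApp, Word._Application).Quit()

    ' Step 2: Read HTML content
    ' ReadAllText assumes UTF-8 unless told otherwise; if characters look garbled, pass the
    ' encoding named in the saved file's charset meta tag.
    Dim htmlContent As String = File.ReadAllText(htmlFilePath)

    ' Step 3: Send Email
    Dim outlookApp As New Outlook.Application()
    Dim mail As Outlook.MailItem = CType(outlookApp.CreateItem(Outlook.OlItemType.olMailItem), Outlook.MailItem)
    mail.Subject = "Formatted Email from Word Document"
    mail.HTMLBody = htmlContent
    mail.To = "recipient@example.com"
    CType(mail, Outlook._MailItem).Send()
End Sub

End Module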

To convert a Word document with formatting into HTML for use in the body of an email, you can save the Word document as an HTML file and then read its content as a string. First, use a Word processing tool (like Word Interop in UiPath or a script) to save the document in filtered HTML format. Then, read the saved HTML file using the Read Text File activity in UiPath, storing the content in a variable. Finally, use the Send Outlook Mail Message activity, setting IsBodyHtml to True and passing the HTML content as the email body. This process ensures that the formatting from the Word document is preserved in the email.
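
One detail in the "read the saved HTML file" step that can trip things up: Word often writes the HTML in a legacy encoding such as windows-1252 and declares it in a meta tag, so reading the file as UTF-8 may garble accented characters and curly quotes. Below is a minimal Python sketch of a charset-aware read. The function name read_word_html and the 4096-byte sniffing window are assumptions made for illustration, not part of UiPath or Word.

```python
import re
from pathlib import Path

# Matches charset=... inside a <meta> tag, with or without quotes around the value.
_CHARSET_RE = re.compile(rb"<meta[^>]*charset\s*=\s*[\"']?([\w-]+)", re.IGNORECASE)


def read_word_html(path, fallback="utf-8"):
    """Return the file's text, decoded with the charset it declares (hypothetical helper)."""
    raw = Path(path).read_bytes()
    # Assumption: the charset declaration sits near the top of the file.
    match = _CHARSET_RE.search(raw[:4096])
    encoding = match.group(1).decode("ascii") if match else fallback
    try:
        return raw.decode(encoding, errors="replace")
    except LookupError:
        # Unknown charset label: fall back instead of failing outright.
        return raw.decode(fallback, errors="replace")
```

The returned string is what would be passed on as the mail body. In UiPath itself, the equivalent is probably to choose the matching encoding when reading the file.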

@SOURAV_AGARWAL

If I save the Word document as HTML, all it does is change the file extension to .html. The actual content of the Word document is not converted into HTML.
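
One way to check whether a saved file was really converted or only renamed: a .docx file is a ZIP archive, so a renamed copy still opens as a ZIP, while a genuine HTML export has an <html tag near the start. A minimal Python sketch follows; both helper names are made up for illustration, and the HTML check assumes the file was saved in an ASCII-compatible encoding.

```python
import zipfile


def looks_like_renamed_docx(path):
    """True if the file is a ZIP container (as .docx is), whatever its extension says."""
    return zipfile.is_zipfile(path)


def looks_like_html(path):
    """True if the first 2 KB of the file contain an <html tag."""
    with open(path, "rb") as handle:
        head = handle.read(2048).lower()
    return b"<html" in head
```

If looks_like_renamed_docx returns True for the .html file, the document was copied or renamed rather than exported, and the save step needs an explicit HTML file format.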

@SOURAV_AGARWAL

Is this another way of doing it? Or are the two connected?

Also, should I put this in Notepad and use the “Invoke VBA” activity, or should I put everything in an “Invoke Code” activity?

Steps to Convert Word Content into Proper HTML:

  1. Use “Save As” Feature in Word:
  • Open the Word document.
  • Click on File > Save As.
  • Choose the location to save the file.
  • In the Save as type dropdown, select Web Page (.htm, .html).
  • This option generates an HTML file along with associated resources (like images in a folder).
  2. Use “Web Page, Filtered” Option:
  • If you need cleaner HTML without additional Word-specific tags, choose Web Page, Filtered (.htm, .html) under Save As.
  • This strips out much of the extra formatting and leaves a simplified HTML structure.
  3. Validate the Output:
  • Open the saved .html file in a browser or text editor to verify that the Word content has been converted into proper HTML.
  4. Use Tools or Code for Conversion:
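  • If Word’s built-in export is insufficient, use a conversion tool or write a script. For example, the Python library python-docx can read the document’s paragraphs, and the standard html module can escape their text before it is wrapped in tags. Note that a paragraph-by-paragraph approach like the one below extracts plain text only, so bold, italics, tables and images are not carried over.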
    Python
import html

from docx import Document

# Load the Word document
doc = Document("example.docx")

# Convert each paragraph to a <p> element, escaping &, < and > in the text.
# para.text is plain text, so character formatting is not preserved here.
paragraphs = [f"<p>{html.escape(para.text)}</p>" for para in doc.paragraphs]
html_content = "\n".join(paragraphs)

# Save as an HTML file
with open("example.html", "w", encoding="utf-8") as file:
    file.write(html_content)
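
The paragraph loop above keeps only plain text. Since the goal of the thread is to preserve formatting, here is a rough sketch of how bold and italic could be carried over by walking each paragraph's runs with python-docx. The three function names are hypothetical. It assumes the formatting was applied directly to the text: python-docx reports run.bold and run.italic as None when the value is inherited from a style, and this sketch treats None as "not set". Tables, images, lists and colors are not handled.

```python
import html


def run_to_html(run):
    """Wrap one run's escaped text in <b>/<i> tags according to its flags."""
    text = html.escape(run.text)
    if not text:
        return ""
    if run.italic:
        text = f"<i>{text}</i>"
    if run.bold:
        text = f"<b>{text}</b>"
    return text


def paragraph_to_html(paragraph):
    """Join a paragraph's runs into a single <p> element."""
    return "<p>" + "".join(run_to_html(run) for run in paragraph.runs) + "</p>"


def docx_to_html(path):
    """Convert every body paragraph of the document at `path` to HTML."""
    # python-docx is imported lazily so the helpers above work without it installed.
    from docx import Document

    doc = Document(path)
    return "\n".join(paragraph_to_html(p) for p in doc.paragraphs)
```

Calling docx_to_html("example.docx") returns a string that can be written to a file or passed on as the mail body. For documents with richer formatting, Word's own Web Page, Filtered export is likely to give closer results than a hand-written converter.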