Can't parse XML due to hexadecimal 0x1E character – Tried cleaning with Regex but still fails

Hello,

I’m working with an XML file that stores the entire document on a single line.

My goal is to:

  1. Clean it (remove invalid characters),
  2. Pretty-format it on multiple lines (indentation),
  3. And later extract specific nodes using LINQ (e.g., packagingUnit).

However, I’m stuck at step 1.

Problem:

When I try to load the content, I get this error:

  • Assign: hexadecimal value 0x1E is an invalid character.

I tried cleaning it with:
XmlContent = System.Text.RegularExpressions.Regex.Replace(XmlContent, "[^\x09\x0A\x0D\x20-\xFF]", "")

But it still throws the same error. I suspect something is wrong either with the Regex or encoding.

What I tried:

  • Reading the file using Read Text File
  • Storing it in a String variable (XmlContent)
  • Cleaning with regex (as above)
  • Parsing with XDocument.Parse(XmlContent)
  • Also tried loading as XmlDocument.LoadXml(XmlContent) → same issue

What I need help with:

  • How can I safely remove illegal XML characters from the file?
  • Is there a more reliable method than Regex?
  • Do I need to change the file’s encoding when reading?
  • Is there a way to pretty-print the resulting valid XML after cleaning?

I appreciate any help. Thank you!

@Lungu_Alexandru

Can you try the method described in this thread?

https://stackoverflow.com/questions/8331119/escape-invalid-xml-characters-in-c-sharp
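
The idea in that thread is to keep only the characters that XML 1.0 allows. Here is a minimal C# sketch of that idea, not a definitive implementation. It uses XmlConvert.IsXmlChar and XmlConvert.IsXmlSurrogatePair from System.Xml. The names XmlCleaner, RemoveInvalidXmlChars, and IsValidCodePoint, and the sample string, are made up for illustration. It also assumes the file might contain the character as a numeric reference (the literal text &#x1E;), which a regex on the raw character would not touch. I have not seen your file, so that part is a guess. After cleaning, XDocument.Parse(...).ToString() returns indented output.

```csharp
using System;
using System.Globalization;
using System.Text;
using System.Text.RegularExpressions;
using System.Xml;
using System.Xml.Linq;

// Hypothetical helper class, named for illustration only.
static class XmlCleaner
{
    // Matches numeric character references such as &#x1E; or &#30;
    private static readonly Regex NumericCharRef =
        new Regex(@"&#(?:x([0-9A-Fa-f]+)|([0-9]+));");

    // Code points permitted by the XML 1.0 "Char" production.
    private static bool IsValidCodePoint(int cp)
    {
        return cp == 0x9 || cp == 0xA || cp == 0xD
            || (cp >= 0x20 && cp <= 0xD7FF)
            || (cp >= 0xE000 && cp <= 0xFFFD)
            || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    public static string RemoveInvalidXmlChars(string text)
    {
        // Step 1 (assumption): drop numeric references to forbidden characters.
        text = NumericCharRef.Replace(text, m =>
        {
            bool isHex = m.Groups[1].Success;
            string digits = isHex ? m.Groups[1].Value : m.Groups[2].Value;
            NumberStyles style = isHex ? NumberStyles.AllowHexSpecifier : NumberStyles.None;
            int code;
            bool parsed = int.TryParse(digits, style, CultureInfo.InvariantCulture, out code);
            return parsed && IsValidCodePoint(code) ? m.Value : string.Empty;
        });

        // Step 2: drop raw forbidden characters, keeping valid surrogate pairs.
        var sb = new StringBuilder(text.Length);
        for (int i = 0; i < text.Length; i++)
        {
            char c = text[i];
            if (i + 1 < text.Length && XmlConvert.IsXmlSurrogatePair(text[i + 1], c))
            {
                sb.Append(c).Append(text[i + 1]);
                i++; // skip the low surrogate that was just copied
            }
            else if (XmlConvert.IsXmlChar(c))
            {
                sb.Append(c);
            }
        }
        return sb.ToString();
    }
}

static class Program
{
    static void Main()
    {
        // Made-up sample: one raw 0x1E character and one &#x1E; reference.
        string raw = "<root><packagingUnit>Box\u001E</packagingUnit><note>&#x1E;ok</note></root>";

        string clean = XmlCleaner.RemoveInvalidXmlChars(raw);

        // Parse the cleaned text; ToString() indents by default.
        XDocument doc = XDocument.Parse(clean);
        Console.WriteLine(doc.ToString());
    }
}
```

Note that XDocument.ToString() leaves out the XML declaration; doc.Declaration still holds it if you need to write it back. In a workflow, the body of RemoveInvalidXmlChars could probably be adapted to an Invoke Code activity or translated to VB.NET, but I have not tested that.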

Cheers