Hello,
I’m working with an XML file whose entire content is on a single line.
My goal is to:
- Clean it (remove invalid characters),
- Pretty-format it on multiple lines (indentation),
- And later extract specific nodes using LINQ (e.g., packagingUnit).
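Steps 2 and 3 above can be sketched roughly as follows, assuming the content already parses. The sample XML, the module name and the function names are hypothetical; only the packagingUnit element name comes from this post, and the sketch assumes the element has no XML namespace.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Linq
Imports System.Xml.Linq

Module GoalSketch
    ' Re-serialize single-line XML; XDocument.ToString() indents by default.
    Function PrettyPrint(content As String) As String
        Return XDocument.Parse(content).ToString()
    End Function

    ' Collect the text of every packagingUnit element (assumes no namespace).
    Function ExtractUnits(content As String) As List(Of String)
        Dim doc As XDocument = XDocument.Parse(content)
        Return doc.Descendants("packagingUnit").Select(Function(e) e.Value).ToList()
    End Function

    Sub Main()
        ' Hypothetical single-line input, not taken from the real file.
        Dim singleLine As String = "<root><item><packagingUnit>BOX</packagingUnit></item>" &
                                   "<item><packagingUnit>PALLET</packagingUnit></item></root>"
        Console.WriteLine(PrettyPrint(singleLine))
        Console.WriteLine(String.Join(", ", ExtractUnits(singleLine)))
    End Sub
End Module
```

If the real document declares a default namespace, Descendants would need an XName that includes it.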
However, I’m stuck at step 1:
Problem:
When I try to load the content, I get this error:
- Assign: hexadecimal value 0x1E is an invalid character.
I tried cleaning it with:
XmlContent = System.Text.RegularExpressions.Regex.Replace(XmlContent, "[^\x09\x0A\x0D\x20-\xFF]", "")
But it still throws the same error. I suspect something is wrong either with the Regex or encoding.
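To narrow this down, here is a minimal sketch of what that pattern does and does not remove, assuming it is typed with straight quotes. Both sample strings are hypothetical and not taken from the actual file; the module and function names are made up for illustration.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module RegexCheck
    ' Same pattern as in the post, with straight quotes.
    Const Pattern As String = "[^\x09\x0A\x0D\x20-\xFF]"

    Function Clean(input As String) As String
        Return Regex.Replace(input, Pattern, "")
    End Function

    Sub Main()
        ' Hypothetical sample 1: a raw U+001E control character inside an element.
        Dim rawSample As String = "<a>x" & ChrW(&H1E) & "y</a>"
        ' Hypothetical sample 2: the same character written as a numeric character reference.
        Dim escapedSample As String = "<a>x&#x1E;y</a>"

        Console.WriteLine(Clean(rawSample))      ' prints <a>xy</a>
        Console.WriteLine(Clean(escapedSample))  ' prints <a>x&#x1E;y</a> (unchanged)
    End Sub
End Module
```

The raw control character is removed, but the reference text &#x1E; consists only of ordinary printable characters, so the pattern leaves it alone, and .NET's default reader settings are expected to reject such a reference as well. Whether the file actually contains such references is not known from this post. Separately, this pattern also strips every character above U+00FF (for example €), which may remove legitimate text.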
What I tried:
- Reading the file using Read Text File
- Storing it in a String variable (XmlContent)
- Cleaning with regex (as above)
- Parsing with XDocument.Parse(XmlContent)
- Also tried loading as XmlDocument.LoadXml(XmlContent) → same issue
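Because the whole file is one line, it can help to see exactly where the parser stops. The following is a minimal sketch, assuming the content is already in a string; the sample content, module name and function name are hypothetical. It relies on the LineNumber and LinePosition properties of XmlException.

```vbnet
Imports System
Imports System.Xml
Imports System.Xml.Linq

Module ParseRepro
    ' Returns Nothing when parsing succeeds, otherwise the exception that was thrown.
    Function FindParseError(content As String) As XmlException
        Try
            XDocument.Parse(content)
            Return Nothing
        Catch ex As XmlException
            Return ex
        End Try
    End Function

    Sub Main()
        ' Hypothetical content: one element holding a raw U+001E control character.
        Dim xmlContent As String = "<a>x" & ChrW(&H1E) & "y</a>"
        Dim parseError As XmlException = FindParseError(xmlContent)

        If parseError Is Nothing Then
            Console.WriteLine("parsed OK")
        Else
            ' LinePosition points at the offending character within the line.
            Console.WriteLine(parseError.Message)
            Console.WriteLine("line " & parseError.LineNumber & ", position " & parseError.LinePosition)
        End If
    End Sub
End Module
```

Inspecting the characters around the reported position in the cleaned string shows whether the parser is tripping on a raw character or on something the regex never matched.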
What I need help with:
- How can I safely remove illegal XML characters from the file?
- Is there a more reliable method than Regex?
- Do I need to change the file’s encoding when reading?
- Any way to pretty-print the resulting valid XML after cleaning?
I appreciate any help. Thank you!