String contains invalid or unsupported UTF8 codepoints. Bad UTF8 hex sequence: how to remove invalid chars?

Greetings everyone. I'm facing a problem with an automation that retrieves data from SAP and uploads it to a database. As I understand it, some strings contain an invalid character that causes this error, but I can't find it because the data set is very large (2,000+ rows, 100+ columns).
Is there a way to remove or replace those characters? I've tried using regex without success. Thanks, team.


If you’re handy with Python, you can try:

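    def remove_invalid_utf8(string):
        # Encode to UTF-8 bytes; 'ignore' silently drops any character that
        # cannot be encoded (in Python, lone surrogates such as '\ud800')
        encoded = string.encode('utf-8', 'ignore')

        # Decode the UTF-8 bytes back into a regular str
        decoded = encoded.decode('utf-8')

        return decoded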

Characters that cannot be encoded are removed without any warning, so some data may be lost. Assess the impact against the requirements of your use case before applying this approach.
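Since the data set is too large to inspect by hand, it may help to first list exactly which cells are affected, and to replace rather than silently drop the bad characters. The sketch below assumes the table has already been loaded into Python as a list of rows (for example from a CSV export); `find_bad_cells` and `clean_cell` are hypothetical helper names, not part of any library. It relies on the fact that Python's strict UTF-8 encoder rejects only surrogate code points (U+D800 to U+DFFF), so checking for that range is equivalent to trying the encode.

```python
from typing import Any, List, Sequence, Tuple

# Python's UTF-8 codec refuses surrogate code points (U+D800 to U+DFFF),
# so these are the only str characters that fail a strict UTF-8 encode.
SURROGATE_MIN, SURROGATE_MAX = '\ud800', '\udfff'


def find_bad_cells(rows: Sequence[Sequence[Any]]) -> List[Tuple[int, int, str]]:
    """Return (row_index, column_index, value) for every str cell that
    cannot be encoded as strict UTF-8."""
    bad = []
    for r, row in enumerate(rows):
        for c, value in enumerate(row):
            if not isinstance(value, str):
                continue  # numbers, dates, None, etc. are skipped
            try:
                value.encode('utf-8')  # strict mode raises on bad characters
            except UnicodeEncodeError:
                bad.append((r, c, value))
    return bad


def clean_cell(value: str, replacement: str = '') -> str:
    """Replace each unencodable character with `replacement` ('' removes it)."""
    return ''.join(
        replacement if SURROGATE_MIN <= ch <= SURROGATE_MAX else ch
        for ch in value
    )


if __name__ == '__main__':
    # Hypothetical sample data: one cell contains a lone surrogate
    rows = [['ok', 'bad\ud800x'], ['fine', 42]]
    for r, c, value in find_bad_cells(rows):
        print(r, c, repr(value), '->', repr(clean_cell(value, '?')))
```

Printing the row and column indexes first lets you check what the offending values look like in SAP before deciding whether to drop them or substitute a placeholder such as `?`.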



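You can give this a try in C#. .NET strings are UTF-16, so characters above U+FFFF are stored as surrogate pairs and a class like `[^\u0000-\uFFFF]` never matches anything; matching the surrogate range instead removes those characters together with any unpaired surrogates:

    // Remove all UTF-16 surrogate code units (characters above U+FFFF and unpaired surrogates)
    Regex.Replace(input, @"[\uD800-\uDFFF]", "")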
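The regex approach aims to drop characters outside the Basic Multilingual Plane (code points above U+FFFF, such as emoji). In Python, where strings are sequences of code points rather than UTF-16 units, a class like that works directly with `re`. The sketch below also excludes the surrogate range so lone surrogates are removed as well; `strip_non_bmp` is a hypothetical name. It assumes the database is rejecting characters above U+FFFF, which is only a guess: if the target accepts full 4-byte UTF-8, this filter would remove valid text.

```python
import re

# Characters allowed to remain: U+0000..U+D7FF and U+E000..U+FFFF, i.e. the
# Basic Multilingual Plane minus the surrogate range. Everything else (code
# points above U+FFFF such as emoji, plus lone surrogates) is matched.
_NOT_BMP_OR_SURROGATE = re.compile(r'[^\u0000-\uD7FF\uE000-\uFFFF]')


def strip_non_bmp(text: str, replacement: str = '') -> str:
    """Remove (or replace) characters outside the BMP and lone surrogates."""
    # A function as repl keeps backslashes in `replacement` from being
    # interpreted as regex group references
    return _NOT_BMP_OR_SURROGATE.sub(lambda _m: replacement, text)


if __name__ == '__main__':
    print(strip_non_bmp('caf\u00e9 \U0001F600 ok'))  # emoji removed, accent kept
```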