String contains invalid or unsupported UTF8 codepoints Bad UTF8 hex sequence, how to remove invalid chars?

Mateus_Snk · June 7, 2023, 7:24pm

Greetings everyone, I’m facing a problem with an automation that retrieves data from SAP and uploads it to a database. As I understand it, some strings contain an invalid character that causes this error, but I can’t find it because the amount of data is massive (2000+ lines, 100+ columns).
Is there a way to remove or replace those characters? I’ve been trying with regex without success. Thanks, team.

argin.lerit · June 7, 2023, 10:47pm

@Mateus_Snk

If you’re handy with Python, you can try:

    def remove_invalid_chars(text):
        # Encode to UTF-8, silently dropping anything that cannot be encoded
        encoded = text.encode('utf-8', 'ignore')

        # Decode the cleaned bytes back into a string
        return encoded.decode('utf-8')

Characters that cannot be encoded are dropped silently, so there may be some data loss or alteration. Make sure to assess the impact and consider the specific requirements of your use case before applying this approach.
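Since the hard part with 2000+ lines and 100+ columns is finding where the bad values are, it may help to locate them before removing anything. Below is a minimal sketch, assuming the data is already loaded as a list of rows of Python strings. The helper name `find_invalid_cells` is hypothetical, and it only detects values that fail a strict UTF-8 encode (for example lone surrogates), which may or may not be what the database is rejecting.

```python
def find_invalid_cells(rows):
    """Return (row_index, col_index, value) for every string cell
    that cannot be encoded as strict UTF-8."""
    bad = []
    for r, row in enumerate(rows):
        for c, value in enumerate(row):
            if not isinstance(value, str):
                continue  # skip numbers, None, etc.
            try:
                value.encode("utf-8")  # strict error handling is the default
            except UnicodeEncodeError:
                bad.append((r, c, value))
    return bad
```

Running it over the extracted rows gives the row and column positions to inspect, so the impact of dropping characters can be checked on just those cells.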

Thanks!

Anil_G · June 8, 2023, 5:51am

@Mateus_Snk

You can give this a try:

    Regex.Replace(input, @"[^\u0000-\uFFFF]", "")
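For anyone following the Python route from the earlier reply, here is a rough counterpart of that pattern, assuming the goal is to drop characters outside the Basic Multilingual Plane (such as emoji). The helper name `strip_non_bmp` is hypothetical. Note that Python's `re` works on whole code points, whereas .NET regexes operate on UTF-16 code units, so the two may not behave the same on identical input.

```python
import re

# Matches any code point above U+FFFF (outside the Basic Multilingual Plane)
_NON_BMP = re.compile(r"[^\u0000-\uFFFF]")

def strip_non_bmp(text: str) -> str:
    """Remove every character above U+FFFF from text."""
    return _NON_BMP.sub("", text)
```

This leaves accented letters and other BMP characters untouched, so it is narrower than the encode/decode approach and should be checked against the actual failing values.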

cheers
