Filenames with accents and special characters are screwed up when unzipping with ExtractFile

Hi,

I’m uncompressing .ZIP files using the ExtractFile activity. The zipped file contains a bunch of document files, usually .pdf. The file names are usually written in Catalan, with “open” and “closed” accents and special characters such as “ç”.

When uncompressed, the file names are changed to some other encoding, still displaying those accents and special characters but the wrong ones.

I’m attaching a couple of screenshots, with the wrong names (after being uncompressed with the above-mentioned activity) and the right ones (unzipped manually with 7-Zip or the Windows integrated utility).

Environment is Windows 11; system language is set to English (as it proved to be the only reliable method to get UiPath error messages in English…), but the locale/keyboard is set to Catalan and/or Spanish.

It’s a very crappy issue that, as late as 2023, the activity can’t properly deal with localization encodings and modifies/screws up the extracted files, which should be identical to the original ones, in my opinion.


Hi,
You can try decompressing manually to see if there is an error, because the decompression robot should keep the same names. I tried extracting a zip file containing files with Japanese names through Vietnamese, and there was no problem.
Regards,

@pere

  1. Use Custom .NET Code for Unzipping:
  • Instead of relying solely on the “ExtractFiles” activity, you can use custom .NET code to handle the unzipping process.
  • Create a custom workflow or invoke code using the “Invoke Code” activity to unzip the .ZIP files with specific encoding settings.
  • By using custom .NET code, you can have more control over the character encoding during the extraction process.
  2. Rename Files After Unzipping:
  • If the encoding issue persists, you can extract the files with the “ExtractFiles” activity as usual.
  • After extraction, call the .NET “Directory.GetFiles” method (for example from an Assign activity) to retrieve a list of all files in the destination folder.
  • Loop through each file and use the “Rename File” activity to rename the files to the correct names with the appropriate character encoding.
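
To illustrate what “unzipping with specific encoding settings” in option 1 could look like, here is a minimal sketch. It is written in Python rather than .NET, so it is not something you can paste into “Invoke Code”; it only shows the idea. It assumes the archive stores the entry names as UTF-8 bytes without setting the ZIP “UTF-8 names” flag, which is one common cause of this kind of garbling. The function name `extract_with_name_encoding` and its parameters are made up for this example.

```python
import zipfile
from pathlib import Path


def extract_with_name_encoding(zip_source, dest_dir, name_encoding="utf-8"):
    """Extract a ZIP archive, decoding unflagged entry names with name_encoding.

    zip_source may be a path or a binary file-like object.
    Returns the list of extracted file names (directories are not listed).
    """
    dest = Path(dest_dir).resolve()
    written = []
    with zipfile.ZipFile(zip_source) as archive:
        for info in archive.infolist():
            if info.flag_bits & 0x800:
                # The archive declares UTF-8 names; zipfile decoded them already.
                name = info.filename
            else:
                # Without the flag, zipfile decodes the raw name bytes as CP437.
                # Recover the raw bytes and decode them with the assumed encoding.
                name = info.filename.encode("cp437").decode(name_encoding)

            target = (dest / name).resolve()
            if dest not in target.parents:
                # Refuse entries that would land outside the destination folder.
                raise ValueError(f"unsafe entry name: {name!r}")

            if info.is_dir():
                target.mkdir(parents=True, exist_ok=True)
                continue
            target.parent.mkdir(parents=True, exist_ok=True)
            target.write_bytes(archive.read(info))
            written.append(name)
    return written
```

If the names come out wrong with "utf-8", another value for `name_encoding` (for example "cp850") may be worth trying; which one is right depends on the tool that created the archive.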

I already said in my original message that, when unzipped manually with 7-Zip or the Windows built-in unzip utility, filenames are properly preserved.

That it works for you with Japanese filenames “through Vietnamese” doesn’t guarantee that it has to work in my case.

Thanks.

Thank you; I already do some renaming later, though I need access to the original filenames since I keep part of each one. When the extraction screws everything up, I have no way to rename the files back to the original names (apart from this being a very ugly, unsmart “solution”).

When it changes “ç”, accents or “·” to some random (to me) Scandinavian or Eastern European encoding character, to me there’s no possible way back. Well, to be strict, it would probably be possible to make a “translation table”, checking one by one the characters that get screwed up, provided the changes were consistent across different executions, files, environments, and so on. But apart from being a huge amount of work for me, I think it’s not reliable.
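
For what it’s worth, when the garbling comes from the name bytes being decoded with the wrong single-byte code page, a hand-made translation table may not be needed, because the wrong decoding can sometimes be undone mechanically. Below is a minimal Python sketch of that idea. It assumes the names were really UTF-8 and were read as CP437; both encodings are guesses that would have to be checked against the real files, and `repair_name` is a hypothetical helper.

```python
def repair_name(garbled, wrong_encoding="cp437", right_encoding="utf-8"):
    """Undo a wrong decoding of a file name.

    The garbled text is turned back into the original bytes using the
    encoding that was wrongly applied, and those bytes are then decoded
    with the encoding that was actually used when the name was written.
    Returns None when the guessed pair of encodings cannot explain the text.
    """
    try:
        return garbled.encode(wrong_encoding).decode(right_encoding)
    except (UnicodeEncodeError, UnicodeDecodeError):
        return None


# Simulate the damage (UTF-8 bytes read as CP437), then repair it.
garbled = "façana.pdf".encode("utf-8").decode("cp437")
restored = repair_name(garbled)
```

This only works when no bytes were lost or replaced during the wrong decoding, so it is not a substitute for extracting with the right encoding in the first place.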

By the way,

I see there’s a dropdown menu in the activity properties for “Name encoding”, and it’s set to “System default”.

I’d like to try different settings, but there’s a bunch of encodings there, and the list is incomplete and seems somehow random. For instance, there’s a “Latin 3 (ISO)” and a “Latin 9 (ISO)”, but I can’t find a “Latin 15 (ISO)”, which should cover all the symbols for my language. And the naming is crappy, too, as it should be displayed as “ISO/IEC 8859-3” or so, which is the real name.
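
A note on the naming, in case it helps someone reading this later: “Latin 9” is the usual alias for ISO/IEC 8859-15 (there is no “Latin 15”), and “Latin 3” is ISO/IEC 8859-3. That the dropdown’s “Latin 9 (ISO)” entry maps to ISO/IEC 8859-15 is an assumption based on the label only. The short Python sketch below checks the aliases and whether a sample of Catalan characters can be encoded; `canonical_name` and `can_encode` are illustrative helper names.

```python
import codecs


def canonical_name(alias):
    """Return the canonical codec name Python registers for an alias."""
    return codecs.lookup(alias).name


def can_encode(text, encoding):
    """Return True if every character of text exists in the given encoding."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


# Catalan accented letters, the cedilla and the middle dot used in "l·l".
sample = "àèéíïòóúüç·"
latin9 = canonical_name("latin9")
covers_catalan = can_encode(sample, latin9)
```

Note that this only says which characters an encoding can represent; it does not tell you which encoding a given archive actually used for its file names.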

Ok; forget about it. Simply choosing UTF-8 as the “Name encoding” worked.
Thank you all for helping.
