Filenames with accents and special characters are screwed up when unzipping with ExtractFile

pere · July 27, 2023, 7:36am

Hi,

I’m uncompressing .ZIP files using the ExtractFiles activity. The zipped file contains a bunch of document files, usually .pdf. The file names are usually written in Catalan, with “open” and “closed” accents and special characters such as “ç”.

After extraction, the file names come out in some other encoding: accents and special characters are still displayed, but they are the wrong ones.

I’m attaching a couple of screenshots, with the wrong names (after being uncompressed with the above-mentioned activity) and the right ones (unzipped manually with 7-Zip or the Windows integrated utility).

Environment is Windows 11; system language is set to English (as it proved to be the only reliable method to get UiPath error messages in English…), but the locale/keyboard is set to Catalan and/or Spanish.

It’s a very crappy issue: not being able to properly deal with localization encodings as late as 2023, and modifying/screwing up the names of extracted files, which in my opinion should be identical to the original ones.

Nguyen_Van_Luong1 · July 27, 2023, 7:45am

Hi,
You can try decompressing manually to see whether the error also occurs there, because the extraction activity should keep the same names. I tried extracting a zip file containing files with Japanese names, but through Vietnamese, and there was no problem.
Regards,

raja.arslankhan · July 27, 2023, 7:50am

@pere

Use Custom .NET Code for Unzipping:

Instead of relying solely on the “ExtractFiles” activity, you can use custom .NET code to handle the unzipping process.
Create a custom workflow or invoke code using the “Invoke Code” activity to unzip the .ZIP files with specific encoding settings.
By using custom .NET code, you can have more control over the character encoding during the extraction process.
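As a rough illustration of the idea above, here is a minimal sketch in Python rather than .NET (an Invoke Code activity would need VB.NET or C#, so treat this as a description of the logic only). It assumes the archive stores its names as raw bytes without the ZIP UTF-8 flag; in that case Python's zipfile module decodes them as CP437, so the original bytes can be recovered and decoded again with an encoding of your choice. The helper names decode_member_name and extract_with_encoding are hypothetical.

```python
import shutil
import zipfile
from pathlib import Path


def decode_member_name(info: zipfile.ZipInfo, encoding: str = "utf-8") -> str:
    """Return the member name decoded with `encoding` (hypothetical helper)."""
    # Bit 11 (0x800) of the general-purpose flags marks the name as UTF-8;
    # zipfile has then already decoded it correctly.
    if info.flag_bits & 0x800:
        return info.filename
    # Otherwise zipfile decoded the raw bytes as CP437. Undo that and decode
    # again with the encoding we ASSUME the archive really used.
    return info.filename.encode("cp437").decode(encoding)


def extract_with_encoding(zip_path, dest_dir, encoding="utf-8"):
    """Extract every member, writing it under its re-decoded name."""
    dest = Path(dest_dir).resolve()
    written = []
    with zipfile.ZipFile(zip_path) as archive:
        for info in archive.infolist():
            target = (dest / decode_member_name(info, encoding)).resolve()
            # Refuse names that would escape the destination (Python 3.9+).
            if not target.is_relative_to(dest):
                raise ValueError(f"unsafe path in archive: {info.filename!r}")
            if info.is_dir():
                target.mkdir(parents=True, exist_ok=True)
                continue
            target.parent.mkdir(parents=True, exist_ok=True)
            with archive.open(info) as src, open(target, "wb") as out:
                shutil.copyfileobj(src, out)
            written.append(target)
    return written
```

A call could look like extract_with_encoding("documents.zip", "extracted", encoding="utf-8"), where both paths are placeholders. If the guessed encoding is wrong, the decode step raises UnicodeDecodeError or produces garbled names, so it is worth checking a few known file names first.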

Rename Files After Unzipping:

If the encoding issue persists, you can extract the files with the “ExtractFiles” activity as usual.
After extraction, use the .NET method “Directory.GetFiles” (for example in an Assign activity) to retrieve a list of all files in the destination folder.
Loop through each file and use the “Rename File” activity to rename the files to the correct names with the appropriate character encoding.
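If the garbling comes from one consistent mis-decoding, the rename loop above can often be done mechanically: encode each garbled name with the encoding that was wrongly applied, then decode the resulting bytes with the encoding the archive really used. The sketch below is in Python and only illustrates the logic. It assumes the names were UTF-8 bytes that got decoded as CP437; both encodings are guesses to verify against a few known names, and the helper names repair_name and rename_garbled_files are hypothetical.

```python
from pathlib import Path
from typing import Optional


def repair_name(garbled: str, wrong: str = "cp437", right: str = "utf-8") -> Optional[str]:
    """Reverse a mis-decoding; return None when the round trip is impossible."""
    try:
        return garbled.encode(wrong).decode(right)
    except (UnicodeEncodeError, UnicodeDecodeError):
        # The name does not fit the assumed pair of encodings: leave it alone.
        return None


def rename_garbled_files(folder, wrong: str = "cp437", right: str = "utf-8"):
    """Rename files in `folder` whose names can be repaired; return the changes."""
    renamed = []
    for path in sorted(Path(folder).iterdir()):
        if not path.is_file():
            continue
        fixed = repair_name(path.name, wrong, right)
        if fixed is None or fixed == path.name:
            continue  # nothing to repair, or not repairable under this guess
        target = path.with_name(fixed)
        if target.exists():
            continue  # never overwrite an existing file
        path.rename(target)
        renamed.append((path.name, fixed))
    return renamed
```

Names that are already correct usually fail the round trip and are skipped, but that is a tendency and not a guarantee, so running this on a copy of the folder first is the safer choice.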

pere · July 27, 2023, 8:06am

I already said in my original message that, when unzipped manually with 7-Zip or Windows unzip system utility, filenames are properly preserved.

That it works for you with Japanese filenames “through Vietnamese” doesn’t guarantee that it has to work in my case.

Thanks.

pere · July 27, 2023, 8:11am

Thank you; I already do some renaming afterwards, though I need access to the original filenames since I keep a part of them. When everything gets screwed up, I have no way to rename the files back to their original names (apart from this being a very ugly, clumsy “solution”).

When it changes “ç”, accents or “·” to some random (to me) Scandinavian or far east European character, to me there’s no possible way back. Well, to be strict, it would probably be possible to build a “translation table” by checking one by one the characters that get screwed up, provided the changes were consistent across different executions, files, environments, and so on. But apart from being a huge amount of work for me, I think it’s not reliable.

pere · July 27, 2023, 8:23am

By the way,

I see there’s a dropdown menu in the activity properties for “Name encoding”, and it’s set to “System default”.

I’d like to try different settings. There’s a bunch of encodings there, but the list is incomplete and seems somehow random. For instance, there’s a “Latin 3 (ISO)” and a “Latin 9 (ISO)”, but I can’t find a “Latin 15 (ISO)”, which should cover all the symbols for my language. And the naming is crappy, too, as it should be displayed as “ISO/IEC 8859-3” or so, which is the real name.
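For reference, “Latin 9” is the usual short name for ISO/IEC 8859-15 and “Latin 3” for ISO/IEC 8859-3; the ISO/IEC 8859 family has no “Latin 15”. One way to double-check such labels is to look them up in Python's codec registry. The mapping from the dropdown labels to the Python aliases latin3 and latin9 is my assumption, and the sample string is just a handful of Catalan characters.

```python
import codecs

# ASSUMED mapping from the dropdown labels quoted above to Python codec aliases.
LABEL_TO_ALIAS = {"Latin 3 (ISO)": "latin3", "Latin 9 (ISO)": "latin9"}

# A few characters that Catalan file names commonly need.
CATALAN_SAMPLE = "àèéíïòóúüç·"


def canonical_name(alias: str) -> str:
    """Return the registry name Python uses for a codec alias."""
    return codecs.lookup(alias).name


def covers(alias: str, text: str) -> bool:
    """True when every character of `text` can be encoded with `alias`."""
    try:
        text.encode(alias)
        return True
    except UnicodeEncodeError:
        return False


if __name__ == "__main__":
    for label, alias in LABEL_TO_ALIAS.items():
        print(label, canonical_name(alias), covers(alias, CATALAN_SAMPLE))
```

Whether an encoding can represent the characters is a separate question from whether the archive actually stored its names in that encoding, so coverage alone does not tell you which dropdown entry to pick.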

pere · July 27, 2023, 9:49am

Ok, forget about it. Simply choosing UTF-8 as the name encoding worked.
Thank you all for helping.
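A plausible explanation, offered as a guess and not a confirmed diagnosis: the archive stores the names as UTF-8 bytes, and “System default” decodes those bytes with a legacy code page. The Python sketch below decodes the same UTF-8 bytes under a few candidate encodings to show the effect. The sample name and the candidate list are made up for illustration, and decode_under is a hypothetical helper.

```python
# Candidate encodings to compare; an illustrative, ASSUMED selection.
CANDIDATES = ("utf-8", "cp437", "cp850", "cp1252", "iso8859_15")


def decode_under(raw: bytes, candidates=CANDIDATES) -> dict:
    """Decode the same bytes under each candidate encoding."""
    # errors="replace" keeps the comparison going when a byte is undefined
    # in one of the code pages.
    return {name: raw.decode(name, errors="replace") for name in candidates}


# Made-up file name containing the characters mentioned in this thread.
sample = "Tramitació_peça·1.pdf"
result = decode_under(sample.encode("utf-8"))

if __name__ == "__main__":
    for name, shown in result.items():
        print(f"{name:>10}: {shown}")
```

Only the utf-8 entry reproduces the original name; each accented character takes two bytes in UTF-8, so every single-byte code page turns it into two unrelated characters.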

system · July 30, 2023, 9:49am

This topic was automatically closed 3 days after the last reply. New replies are no longer allowed.
