I’m uncompressing .ZIP files using the ExtractFile activity. The zipped file has a bunch of document files, usually .pdf. The file names are usually written in Catalan, with “open” and “closed” accents and special characters, like “ç” and so on.
When uncompressed, the file names come out as if they had been read in some other encoding: they still show accents and special characters, but the wrong ones.
I’m attaching a couple of screenshots, with the wrong names (after uncompressing with the abovementioned activity) and the right ones (unzipped manually with 7Zip or the Windows integrated utility).
Environment is Windows 11; system language is set to English (as it proved to be the only reliable method to get UiPath error messages in English…), but the locale/keyboard is set to Catalan and/or Spanish.
It’s a very crappy issue not to be able to deal properly with localization encodings as late as 2023, and to modify/screw up extracted files which, in my opinion, should be identical to the original ones.
Hi,
You can try decompressing manually to see whether there is an error, because when the robot decompresses it keeps the same names. I tried extracting a zip file containing files with Japanese names, on a Vietnamese setup, and there was no problem.
Regards,
Thank you; I already do the renaming later to some extent, though I need access to the original filenames since I’m keeping a part of them and, when it screws everything up, I have no way to rename the files back to the original names (apart from that being a very ugly, unsmart “solution”).
When it changes “ç”, accents or “·” to some random (to me) Scandinavian or far Eastern European character, to me there’s no possible way back. Well, to be strict, it would probably be possible to make a “translation table”, checking one by one the characters that get screwed up, provided the changes were consistent across different executions, files, environments, and so on. But apart from being a huge amount of work for me, I think it’s not reliable.
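If the damage comes from the raw name bytes being decoded with the wrong single-byte code page, it may be reversible without a hand-made translation table: re-encode the wrong name with the encoding that was wrongly applied, then decode the bytes with the one that was really used. A minimal Python sketch (outside UiPath), assuming the names were stored as CP850 and decoded as Windows-1252. Both encodings are guesses, and `repair_name` and `candidate_repairs` are hypothetical helpers:

```python
def repair_name(garbled: str, wrong_encoding: str, real_encoding: str) -> str:
    """Undo a wrong decoding: get the raw bytes back, then decode them properly."""
    raw = garbled.encode(wrong_encoding)  # bytes as they were stored in the archive
    return raw.decode(real_encoding)      # decode with the encoding really used


def candidate_repairs(garbled: str, encodings: list[str]) -> dict[tuple[str, str], str]:
    """Try every (wrong, real) pair and keep those that convert without error."""
    results = {}
    for wrong in encodings:
        for real in encodings:
            if wrong == real:
                continue
            try:
                results[(wrong, real)] = repair_name(garbled, wrong, real)
            except (UnicodeEncodeError, UnicodeDecodeError):
                pass  # this pair cannot explain the garbled name
    return results


# Simulate the damage with the assumed encodings, then undo it.
original = "Façana col·legi.pdf"
garbled = original.encode("cp850").decode("cp1252")
restored = repair_name(garbled, wrong_encoding="cp1252", real_encoding="cp850")
```

This only works when the wrong decoding kept every byte. If a character was replaced by “?” or a placeholder along the way, that byte is lost and the original name cannot be rebuilt from the garbled one.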
I see there’s a dropdown menu in the activity properties for “Name encoding”, and it’s set to “System default”.
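For comparison, this is roughly what choosing a name encoding means outside UiPath. The ZIP format marks an entry name as UTF-8 with a flag bit; names without that flag are in some legacy code page that the extractor has to guess. A hedged Python sketch, assuming the legacy names are CP850 (a guess for a Western European Windows setup) and using hypothetical `decode_entry_name` and `list_entry_names` helpers. Python itself decodes unflagged names as CP437, which the helper undoes:

```python
import zipfile

UTF8_FLAG = 0x800  # general purpose bit 11: entry name is stored as UTF-8


def decode_entry_name(name: str, flag_bits: int, legacy_encoding: str = "cp850") -> str:
    """Return the entry name, re-decoding legacy (non UTF-8) names.

    Python's zipfile decodes unflagged names as CP437, so the raw bytes are
    recovered with CP437 and decoded again with the assumed legacy encoding.
    """
    if flag_bits & UTF8_FLAG:
        return name  # already decoded correctly as UTF-8
    return name.encode("cp437").decode(legacy_encoding)


def list_entry_names(zip_path, legacy_encoding: str = "cp850") -> list[str]:
    """List the entry names of an archive using the assumed legacy encoding."""
    with zipfile.ZipFile(zip_path) as archive:
        return [
            decode_entry_name(info.filename, info.flag_bits, legacy_encoding)
            for info in archive.infolist()
        ]
```

Listing the names with a few candidate encodings and comparing them with what 7Zip shows is a cheap way to find out which code page the archive really uses. On Python 3.11 or later, `zipfile.ZipFile` also accepts a `metadata_encoding` argument for reading, which covers the same case.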
I’d like to try different setups. There’s a bunch of encodings in that list, but it is incomplete and seems somewhat random. For instance, there’s a “Latin 3 (ISO)” and a “Latin 9 (ISO)”, but I don’t find a “Latin 15 (ISO)” there, which should cover all the symbols for my language. And the naming is crappy, too, as it should be displayed as “ISO/IEC 8859-3” or so, which is the real name.
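On the naming: as far as I know, the “Latin N” labels and the ISO/IEC 8859 part numbers do not line up, and Latin-9 is the common name of ISO/IEC 8859-15, so the “Latin 9 (ISO)” entry may already be the one wanted here. A quick way to check which standard a label maps to and whether it covers the characters, sketched in Python rather than UiPath. The sample string is only an illustration, and `canonical_name` and `can_encode` are hypothetical helpers:

```python
import codecs

CATALAN_SAMPLE = "àèéíïòóúüç·ÀÈÉÍÏÒÓÚÜÇ"  # illustrative, not exhaustive


def canonical_name(label: str) -> str:
    """Resolve a codec alias such as 'latin9' to Python's canonical codec name."""
    return codecs.lookup(label).name


def can_encode(text: str, encoding: str) -> bool:
    """Return True if every character of `text` exists in `encoding`."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


# Print alias, canonical codec name, and whether the sample is covered.
for label in ("latin3", "latin9", "cp850", "cp1252"):
    print(label, canonical_name(label), can_encode(CATALAN_SAMPLE, label))
```

Note that covering the characters is not enough on its own: the encoding picked in the dropdown has to be the one the names were actually written with when the archive was created.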