StudioX for non-Unicode Windows applications

mountie · November 26, 2023, 2:28pm

Unicode is the default encoding on Windows,
but Windows also supports additional encodings for non-Unicode applications.
The default alternative locale is English.
However, on the Windows versions sold locally in Korea, Japan, and China, the default alternative locale is set to each country’s historical encoding.

When I use the “Extract Table Data” activity on a non-Unicode app,
StudioX or the attended bot does not follow Windows’s alternative locale setting.
The extracted data contains broken characters; only alphanumeric characters are readable.

Other activities, such as Get Text or anchors, work fine.

I think this is possibly a bug.

Nguyen_Van_Luong1 · November 27, 2023, 12:53am

This concerns extracting text in languages with special characters, such as Chinese, Japanese, Korean…
In Vietnam I also use Unicode for Vietnamese, but it does have an impact, so I usually turn Unicode off to ensure accuracy.
Can you please share screenshots of your activities?
Regards,

mountie · November 27, 2023, 12:17pm

For now I have removed the table data extraction activity.
The problem was caused by the dual-encoding environment.
Before Unicode was adopted internationally, each country developed its own encoding standards, for example EUC-KR for Korean, Shift-JIS for Japanese, and Big5 for Traditional Chinese.
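
To make the dual-encoding problem concrete, here is a minimal Python sketch (not taken from UiPath or from the application in question). It assumes a Korean legacy app whose text is stored as cp949 bytes (the Windows Korean code page, a superset of EUC-KR) and shows what a reader sees when those bytes are interpreted under a Western code page instead. The sample string is made up for illustration.

```python
# Illustrative sketch: the same bytes read under two different legacy code pages.
# The sample text and the choice of cp949/cp1252 are assumptions for this demo.
text = "한글 ABC 123"

# Bytes as a Korean non-Unicode application would store them.
raw = text.encode("cp949")

# Decoding with the matching code page recovers the original text.
as_korean = raw.decode("cp949")

# Decoding with a Western code page (typical for an English system locale)
# turns the Hangul bytes into unrelated accented letters and symbols.
as_western = raw.decode("cp1252", errors="replace")

print(as_korean)   # original text
print(as_western)  # mojibake, but the ASCII part is intact

# ASCII bytes mean the same thing in both code pages, so only they survive.
ascii_part = "".join(ch for ch in as_western if ch.isascii())
print(ascii_part)
```

The ASCII letters and digits come through unchanged in both readings, which matches the symptom of only alphanumeric characters being readable.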

I was able to support both Unicode and non-Unicode applications with the following settings:

Settings
Screenshot 2023-11-27 21.10.12

Time and language
Screenshot 2023-11-27 21.10.55

Choose language
Screenshot 2023-11-27 21.11.25

Add the language pack
Screenshot 2023-11-27 21.12.14

Choose date, time and regional language
Screenshot 2023-11-27 21.13.12

Choose additional date, time and language settings
Screenshot 2023-11-27 21.13.58

Choose country or region
Screenshot 2023-11-27 21.14.21

Choose the administrative option

Change the system locale
for non-Unicode programs

By default it was set to English for non-Unicode applications.
With this default configuration, non-Unicode CJK programs show broken characters.
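
As a quick way to check which legacy code page the system locale currently gives to non-Unicode programs, here is a small Python sketch. It calls the Win32 `GetACP` function through `ctypes`; the lookup table and the helper names `describe_code_page` and `active_ansi_code_page` are my own and only cover the code pages relevant to this thread.

```python
import sys

# Small, non-exhaustive table of Windows ANSI code pages (helper for this demo).
CODE_PAGES = {
    1252: "Western European (typical for an English system locale)",
    949: "Korean",
    932: "Japanese (Shift-JIS)",
    936: "Simplified Chinese (GBK)",
    950: "Traditional Chinese (Big5)",
}


def describe_code_page(cp: int) -> str:
    """Return a readable name for a code page number, if it is in the table."""
    return CODE_PAGES.get(cp, f"code page {cp} (not in this small table)")


def active_ansi_code_page():
    """Return the active ANSI code page on Windows, or None on other systems."""
    if sys.platform != "win32":
        return None
    import ctypes  # imported here so the module still loads on non-Windows systems
    return ctypes.windll.kernel32.GetACP()


cp = active_ansi_code_page()
if cp is None:
    print("Not running on Windows; no ANSI code page to report.")
else:
    print(f"Active ANSI code page: {cp} ({describe_code_page(cp)})")
```

On a machine where the system locale has been changed to Korean as described above, I would expect this to report 949 rather than 1252.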

Luiza_Surdu-Bob · November 27, 2023, 5:30pm

Hi @mountie ,

The Extract Table Data activity supports only Unicode characters.
This is because when extracting table data, we save the extracted data in an XML format. As XML documents consist of characters from the Unicode repertoire, we exclude the characters that are not valid in XML (Valid characters in XML - Wikipedia).

The Get Text activity needs no such filtering and therefore works for non-Unicode applications.
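
For readers unfamiliar with the XML restriction, the following Python sketch shows the kind of filtering described, using the character ranges from the XML 1.0 specification. It is only an illustration and not UiPath's actual implementation, which is not shown in this thread; the function names and sample strings are hypothetical.

```python
def is_valid_xml_char(ch: str) -> bool:
    """Check a single character against the XML 1.0 'Char' production."""
    cp = ord(ch)
    return (
        cp in (0x9, 0xA, 0xD)           # tab, line feed, carriage return
        or 0x20 <= cp <= 0xD7FF         # most of the Basic Multilingual Plane
        or 0xE000 <= cp <= 0xFFFD       # private use area up to U+FFFD
        or 0x10000 <= cp <= 0x10FFFF    # supplementary planes
    )


def strip_invalid_xml_chars(text: str) -> str:
    """Drop every character that XML 1.0 does not allow."""
    return "".join(ch for ch in text if is_valid_xml_char(ch))


# Control characters such as NUL or vertical tab are removed.
with_controls = "abc\x00\x0bdef"
print(strip_invalid_xml_chars(with_controls))

# Text that has been decoded to Unicode with the matching code page
# consists of ordinary Unicode characters and passes through unchanged.
decoded = "표 데이터".encode("cp949").decode("cp949")
print(strip_invalid_xml_chars(decoded) == decoded)
```

A filter like this only removes characters outside the XML ranges, so its result depends on how the source bytes were decoded before filtering.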

Best regards,
Luiza

mountie · November 28, 2023, 1:26am

Using XML is your choice.
I think UiPath can fix this.
Here (at least in Korea), many non-Unicode applications are still in active use, and they can only be automated with RPA.

I will test with other RPA solutions as well.

Luiza_Surdu-Bob · November 28, 2023, 9:47am

Thanks for your feedback @mountie . I’ll take your suggestion into the backlog for assessment and prioritization.

Thank you,
Luiza
