
I'm developing a multi-language piece of software (first time working on anything other than English).

I've written code that reads in multiple localization files; the user then selects their language, and the corresponding localization file is used.

This all works fine and dandy, but when I try to display text from other languages (like Korean), the correct characters do not show up.

Is there something special I need to do to store Chinese, Korean, Japanese, etc. in strings? One of my Korean localization files looks like this...

[Labels]
Username=사용자 이름
Password=암호

So in my code I have a function that gets the designated string like this...

const std::string& UsernameLabel = GetLocalizationString("Korean", "Labels", "Username");
const std::string& PasswordLabel = GetLocalizationString("Korean", "Labels", "Password");
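
For reference, the loading code boils down to something like this (a simplified sketch, not my exact code; the container layout and names are just illustrative)...

#include <fstream>
#include <map>
#include <string>

// language -> section -> key -> translated text, stored as raw bytes
std::map<std::string, std::map<std::string,
         std::map<std::string, std::string>>> g_Strings;

void LoadLocalizationFile(const std::string& language, const std::string& path)
{
    std::ifstream file(path);
    std::string line, section;
    while (std::getline(file, line))
    {
        if (line.empty())
            continue;
        if (line.front() == '[' && line.back() == ']')
        {
            section = line.substr(1, line.size() - 2); // e.g. "Labels"
        }
        else
        {
            std::string::size_type eq = line.find('=');
            if (eq != std::string::npos)
                g_Strings[language][section][line.substr(0, eq)] = line.substr(eq + 1);
        }
    }
}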
  • How do you display them? By printing to standard output?
  • "Foreign language" seems not to be the real issue. The real issue is probably the handling of non-ASCII characters. Also, your foreign language is someone's native language.
  • You may want to look into std::wstring, but I'm not sure what encoding you are working with.
  • How are these strings encoded? Maybe UTF-8 or ISO-2022-KR? And what is the encoding of the terminal? What is shown instead of the correct symbols?
  • UTF-8 is probably what you are looking for: stackoverflow.com/questions/3011082/…

2 Answers


The root of the issue is std::string itself, since it deals in chars (each exactly one byte). As soon as you plan to develop multi-language software, you have to do one of the following:

  • Use std::wstring, which deals in "wide chars" (wchar_t is 2 bytes on Windows, 4 bytes on most Unix-like systems). Easy to do, and it covers most cases.
  • Step away from standard string classes and use UTF-8 (or UTF-32, etc.) encoding to represent UI text. This means working with byte buffers, not character arrays, because some symbols are encoded with multiple bytes, and some bytes are not symbols at all (like the emoji modifiers for skin colour, gender, etc.). This is the most correct approach, though it may be time-consuming; see the sketch after this list.
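
For the second option, here is a minimal sketch. It assumes the source file (and your localization files) are saved as UTF-8; SetConsoleOutputCP comes from <windows.h>, and most Linux/macOS terminals accept UTF-8 output without any setup:

#include <cstdio>
#include <string>
#ifdef _WIN32
#include <windows.h>
#endif

int main()
{
    // Raw UTF-8 bytes; this only works if this source file itself is
    // saved as UTF-8 (otherwise spell the bytes out with \x escapes).
    std::string usernameLabel = "사용자 이름"; // Korean for "Username"

#ifdef _WIN32
    SetConsoleOutputCP(CP_UTF8); // make the Windows console decode UTF-8
#endif

    // std::string is just a byte buffer here: size() reports 16 bytes,
    // even though the label is only 6 code points.
    std::printf("%s (%zu bytes)\n", usernameLabel.c_str(), usernameLabel.size());
}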

Update: also, you may find this discussion useful: std::wstring VS std::string


5 Comments

Interesting side note, according to the C++ standard, "The sizeof operator yields the number of bytes" and "sizeof(char), sizeof(signed char) and sizeof(unsigned char) are 1" (Quoting N4700, [expr.sizeof]) so I'm pretty sure that means no matter how many bits are in a char, it is still one byte.
@user4581301 As far as the C++ standard is concerned, a byte and a char are pretty much the same thing.
"wide chars" have some of the same problems as UTF-8 - i.e. some symbols are represented by more than one 16-bit "wide chars"; so if you want to do it right you gain very little by doing that.
@HansOlsson, as long as you work with UTF-8 as a byte stream/buffer, "there is no char". Just let system-level API to render that bytes on the screen.
@YurySchkatula agreed for UTF-8, my point is that this also holds for "wide chars", and you cannot just drop the right-most "wide character" if the string is too long or assume that 10 wide characters are twice as long as 5 wide characters (on the screen).
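
To make the surrogate-pair point above concrete, a small sketch; the count differs per platform because wchar_t is 16 bits on Windows but 32 bits on most Unix-like systems:

#include <iostream>
#include <string>

int main()
{
    // U+1F600 (a grinning-face emoji) lies outside the Basic Multilingual
    // Plane, so on Windows it takes two 16-bit wchar_t units (a surrogate
    // pair) and size() reports 2; on Linux (32-bit wchar_t) it reports 1.
    std::wstring face = L"\U0001F600";
    std::cout << face.size() << "\n"; // counts wchar_t units, not characters
}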

Wide characters (Unicode strings such as std::wstring) should suit your situation. I am from one of the countries mentioned above; I hope this helps.
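
A minimal sketch of that route, assuming Windows (where wide literals are UTF-16) and the CRT's _setmode/_O_U16TEXT for console output:

#include <cstdio>
#include <fcntl.h>
#include <io.h>
#include <iostream>
#include <string>

int main()
{
    std::wstring usernameLabel = L"사용자 이름"; // wide literal, UTF-16 on Windows

    // Put stdout into UTF-16 text mode so wcout can emit the Korean
    // characters to the console (Windows-specific).
    _setmode(_fileno(stdout), _O_U16TEXT);
    std::wcout << usernameLabel << L"\n";
}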

