|
1 | 1 | --- |
2 | 2 | title: "Unicode and Multibyte Character Set (MBCS) Support | Microsoft Docs" |
3 | 3 | ms.custom: "" |
4 | | -ms.date: "11/04/2016" |
| 4 | +ms.date: "1/09/2017" |
5 | 5 | ms.reviewer: "" |
6 | 6 | ms.suite: "" |
7 | 7 | ms.technology: ["cpp-windows"] |
8 | 8 | ms.tgt_pltfrm: "" |
9 | 9 | ms.topic: "reference" |
10 | 10 | dev_langs: ["C++"] |
11 | 11 | helpviewer_keywords: ["MFC [C++], character set support", "MBCS [C++], strings and MFC support", "strings [C++], MBCS support in MFC", "character sets [C++], multibyte", "Unicode [C++], MFC strings", "Unicode [C++], string objects", "strings [C++], Unicode", "strings [C++], character set support"] |
12 | | -ms.assetid: 44b3193b-c92d-40c5-9fa8-5774da303cce |
13 | | -caps.latest.revision: 17 |
14 | 12 | author: "mikeblome" |
15 | 13 | ms.author: "mblome" |
16 | 14 | manager: "ghogen" |
17 | 15 | ms.workload: ["cplusplus"] |
18 | 16 | --- |
19 | 17 | # Unicode and Multibyte Character Set (MBCS) Support |
20 | | -Some languages, for example, Japanese and Chinese, have large character sets. To support programming for these markets, the Microsoft Foundation Class Library (MFC) is enabled for two different approaches to handling large character sets: |
21 | | - |
22 | | -- [Unicode](#_core_mfc_support_for_unicode_strings) |
23 | | - |
24 | | -- [Multibyte Character Sets (MBCS)](#_core_mfc_support_for_mbcs_strings) |
25 | | - |
26 | | - You should use Unicode for all new development. |
27 | | - |
28 | | -## <a name="_core_mfc_support_for_unicode_strings"></a> MFC Support for Unicode Strings |
29 | | - The entire class library is conditionally enabled for Unicode characters and strings. In particular, class [CString](../atl-mfc-shared/reference/cstringt-class.md) is Unicode-enabled. |
30 | | - |
31 | | -||||| |
32 | | -|-|-|-|-| |
33 | | -|UAFXCW.LIB|UAFXCW.PDB|UAFXCWD.LIB|UAFXCWD.PDB| |
34 | | -|MFC*xx*U.LIB|MFC*xx*U.PDB|MFC*xx*U.DLL|MFC*xx*UD.LIB| |
35 | | -|MFC*xx*UD.PDB|MFC*xx*UD.DLL|MFCS*xx*U.LIB|MFCS*xx*U.PDB| |
36 | | -|MFCS*xx*UD.LIB|MFCS*xx*UD.PDB|MFCM*xx*U.LIB|MFCM*xx*U.PDB| |
37 | | -|MFCM*xx*U.DLL|MFCM*xx*UD.LIB|MFCM*xx*UD.PDB|MFCM*xx*UD.DLL| |
38 | | - |
39 | | - (*xx* represents the version number of the file; for example, '80' means version 8.0.) |
40 | | - |
41 | | - `CString` is based on the `TCHAR` data type. If the symbol `_UNICODE` is defined for a build of your program, `TCHAR` is defined as type `wchar_t`, a 16-bit character encoding type. Otherwise, `TCHAR` is defined as `char`, the normal 8-bit character encoding. Therefore, under Unicode, a `CString` is composed of 16-bit characters. Without Unicode, it is composed of characters of type `char`. |
42 | | - |
43 | | - To complete Unicode programming of your application, you must also: |
44 | | - |
45 | | -- Use the `_T` macro to conditionally code literal strings to be portable to Unicode. |
46 | | - |
47 | | -- When you pass strings, pay attention to whether function arguments require a length in characters or a length in bytes. The difference is important if you are using Unicode strings. |
48 | | - |
49 | | -- Use portable versions of the C run-time string-handling functions. |
50 | | - |
51 | | -- Use the following data types for characters and character pointers: |
52 | | - |
53 | | - - `TCHAR` Where you would use `char`. |
54 | | - |
55 | | - - `LPTSTR` Where you would use `char*`. |
56 | | - |
57 | | - - `LPCTSTR` Where you would use `const char*`. `CString` provides the operator `LPCTSTR` to convert between `CString` and `LPCTSTR`. |
58 | | - |
59 | | - `CString` also supplies Unicode-aware constructors, assignment operators, and comparison operators. |
60 | | - |
61 | | - For related information on Unicode programming, see [Unicode Topics](../mfc/unicode-in-mfc.md). The [Run-Time Library Reference](../c-runtime-library/c-run-time-library-reference.md) defines portable versions of all its string-handling functions. See the category [Internationalization](../c-runtime-library/internationalization.md). |
62 | | - |
63 | | -## <a name="_core_mfc_support_for_mbcs_strings"></a> MFC Support for MBCS Strings |
64 | | - |
65 | | - The class library is also enabled for multibyte character sets, but only for double-byte character sets (DBCS). |
66 | | - |
67 | | - In a multibyte character set, a character can be one or two bytes wide. If it is two bytes wide, its first byte is a special "lead byte" that is chosen from a particular range, depending on which code page is in use. Taken together, the lead and "trail bytes" specify a unique character encoding. |
68 | | - |
69 | | - If the symbol `_MBCS` is defined for a build of your program, type `TCHAR`, on which `CString` is based, maps to `char`. It is up to you to determine which bytes in a `CString` are lead bytes and which are trail bytes. The C run-time library supplies functions to help you determine this. |
70 | | - |
71 | | - Under DBCS, a given string can contain all single-byte ANSI characters, all double-byte characters, or a combination of the two. These possibilities require special care in parsing strings. This includes `CString` objects. |
72 | | - |
| 18 | + |
| 19 | +Some languages, for example, Japanese and Chinese, have large character sets. To support programming for these markets, the Microsoft Foundation Class Library (MFC) enables two different approaches to handling large character sets: |
| 20 | + |
| 21 | +- [Unicode](#mfc-support-for-unicode-strings), `wchar_t`-based wide characters and strings encoded as UTF-16. |
| 22 | + |
| 23 | +- [Multibyte Character Sets (MBCS)](#mfc-support-for-mbcs-strings), `char`-based single-byte or double-byte characters and strings encoded in a locale-specific character set. |
| 24 | + |
| 25 | +Microsoft previously recommended the MFC Unicode libraries for all new development, and the MBCS libraries were deprecated in Visual Studio 2013 and Visual Studio 2015. The MBCS libraries are no longer deprecated: the MBCS deprecation warnings have been removed in Visual Studio 2017. |
| 26 | + |
| 27 | +## MFC Support for Unicode Strings |
| 28 | + |
| 29 | +The entire MFC class library is conditionally enabled for Unicode characters and strings stored in wide characters as UTF-16. In particular, class [CString](../atl-mfc-shared/reference/cstringt-class.md) is Unicode-enabled. |
| 30 | + |
| 31 | +These library, debugger, and DLL files are used to support Unicode in MFC: |
| 32 | + |
| 33 | +||||| |
| 34 | +|-|-|-|-| |
| 35 | +|UAFXCW.LIB|UAFXCW.PDB|UAFXCWD.LIB|UAFXCWD.PDB| |
| 36 | +|MFC*version*U.LIB|MFC*version*U.PDB|MFC*version*U.DLL|MFC*version*UD.LIB| |
| 37 | +|MFC*version*UD.PDB|MFC*version*UD.DLL|MFCS*version*U.LIB|MFCS*version*U.PDB| |
| 38 | +|MFCS*version*UD.LIB|MFCS*version*UD.PDB|MFCM*version*U.LIB|MFCM*version*U.PDB| |
| 39 | +|MFCM*version*U.DLL|MFCM*version*UD.LIB|MFCM*version*UD.PDB|MFCM*version*UD.DLL| |
| 40 | + |
| 41 | +(*version* represents the version number of the file; for example, '140' means version 14.0.) |
| 42 | + |
| 43 | +`CString` is based on the `TCHAR` data type. If the symbol `_UNICODE` is defined for a build of your program, `TCHAR` is defined as type `wchar_t`, a 16-bit character encoding type. Otherwise, `TCHAR` is defined as `char`, the normal 8-bit character encoding. Therefore, under Unicode, a `CString` is composed of 16-bit characters. Without Unicode, it is composed of characters of type `char`. |
| 44 | + |
| 45 | +To complete Unicode programming of your application, you must also: |
| 46 | + |
| 47 | +- Use the `_T` macro to conditionally code literal strings to be portable to Unicode. |
| 48 | + |
| 49 | +- When you pass strings, pay attention to whether function arguments require a length in characters or a length in bytes. The difference is important if you are using Unicode strings. |
| 50 | + |
| 51 | +- Use portable versions of the C run-time string-handling functions. |
| 52 | + |
| 53 | +- Use the following data types for characters and character pointers: |
| 54 | + |
| 55 | + - Use `TCHAR` where you would use `char`. |
| 56 | + |
| 57 | + - Use `LPTSTR` where you would use `char*`. |
| 58 | + |
| 59 | + - Use `LPCTSTR` where you would use `const char*`. `CString` provides the operator `LPCTSTR` to convert between `CString` and `LPCTSTR`. |
| 60 | + |
| 61 | +`CString` also supplies Unicode-aware constructors, assignment operators, and comparison operators. |
| 62 | + |
| 63 | +The [Run-Time Library Reference](../c-runtime-library/c-run-time-library-reference.md) defines portable versions of all its string-handling functions. For more information, see the category [Internationalization](../c-runtime-library/internationalization.md). |
| 64 | + |
| 65 | +## MFC Support for MBCS Strings |
| 66 | + |
| 67 | +The class library is also enabled for multibyte character sets, but only for double-byte character sets (DBCS). |
| 68 | + |
| 69 | +In a multibyte character set, a character can be one or two bytes wide. If it is two bytes wide, its first byte is a special "lead byte" that is chosen from a particular range, depending on which code page is in use. Taken together, the lead and "trail bytes" specify a unique character encoding. |
| 70 | + |
| 71 | +If the symbol `_MBCS` is defined for a build of your program, type `TCHAR`, on which `CString` is based, maps to `char`. It is up to you to determine which bytes in a `CString` are lead bytes and which are trail bytes. The C run-time library supplies functions to help you determine this. |
| 72 | + |
| 73 | +Under DBCS, a given string can contain all single-byte ANSI characters, all double-byte characters, or a combination of the two. These possibilities require special care in parsing strings. This includes `CString` objects. |
| 74 | + |
73 | 75 | > [!NOTE] |
74 | | -> Unicode string serialization in MFC can read both Unicode and MBCS strings regardless of which version of the application that you are running. Your data files are portable between Unicode and MBCS versions of your program. |
75 | | - |
76 | | - `CString` member functions use special "generic text" versions of the C run-time functions they call, or they use Unicode-aware functions. Therefore, for example, if a `CString` function would typically call `strcmp`, it calls the corresponding generic-text function `_tcscmp` instead. Depending on how the symbols `_MBCS` and `_UNICODE` are defined, `_tcscmp` maps as follows: |
77 | | - |
78 | | -||| |
79 | | -|-|-| |
80 | | -|`_MBCS` defined|`_mbscmp`| |
81 | | -|`_UNICODE` defined|`wcscmp`| |
82 | | -|Neither symbol defined|`strcmp`| |
83 | | - |
| 76 | +> Unicode string serialization in MFC can read both Unicode and MBCS strings regardless of which version of the application you are running. Your data files are portable between Unicode and MBCS versions of your program. |
| 77 | + |
| 78 | +`CString` member functions use special "generic text" versions of the C run-time functions they call, or they use Unicode-aware functions. Therefore, for example, if a `CString` function would typically call `strcmp`, it calls the corresponding generic-text function `_tcscmp` instead. Depending on how the symbols `_MBCS` and `_UNICODE` are defined, `_tcscmp` maps as follows: |
| 79 | + |
| 80 | +||| |
| 81 | +|-|-| |
| 82 | +|`_MBCS` defined|`_mbscmp`| |
| 83 | +|`_UNICODE` defined|`wcscmp`| |
| 84 | +|Neither symbol defined|`strcmp`| |
| 85 | + |
84 | 86 | > [!NOTE] |
85 | | -> The symbols `_MBCS` and `_UNICODE` are mutually exclusive. |
86 | | - |
87 | | - Generic-text function mappings for all of the run-time string-handling routines are discussed in [C Run-Time Library Reference](../c-runtime-library/c-run-time-library-reference.md). In particular, see [Internationalization](../c-runtime-library/internationalization.md). |
88 | | - |
89 | | - Similarly, `CString` methods are implemented by using "generic" data type mappings. To enable both MBCS and Unicode, MFC uses `TCHAR` for `char`, `LPTSTR` for `char*`, and `LPCTSTR` for `const char*`. These ensure the correct mappings for either MBCS or Unicode. |
90 | | - |
91 | | -## See Also |
92 | | - [Strings (ATL/MFC)](../atl-mfc-shared/strings-atl-mfc.md) |
93 | | - [String Manipulation](../c-runtime-library/string-manipulation-crt.md) |
| 87 | +> The symbols `_MBCS` and `_UNICODE` are mutually exclusive. |
| 88 | + |
| 89 | +Generic-text function mappings for all of the run-time string-handling routines are discussed in [C Run-Time Library Reference](../c-runtime-library/c-run-time-library-reference.md). For a list, see [Internationalization](../c-runtime-library/internationalization.md). |
| 90 | + |
| 91 | +Similarly, `CString` methods are implemented by using generic data type mappings. To enable both MBCS and Unicode, MFC uses `TCHAR` for `char` or `wchar_t`, `LPTSTR` for `char*` or `wchar_t*`, and `LPCTSTR` for `const char*` or `const wchar_t*`. These ensure the correct mappings for either MBCS or Unicode. |
| 92 | + |
| 93 | +## See Also |
94 | 94 | |
| 95 | +[Strings (ATL/MFC)](../atl-mfc-shared/strings-atl-mfc.md) |
| 96 | +[String Manipulation](../c-runtime-library/string-manipulation-crt.md) |
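
Two points in the changed text lend themselves to a short illustration: a buffer length in characters differs from its length in bytes once characters are wider than one byte, and a DBCS string has to be walked lead byte by lead byte. The following is a minimal, portable sketch rather than MFC code. `IsLeadByte932`, `CountDbcsChars`, `BufferChars`, and `BufferBytes` are hypothetical helper names, `char16_t` stands in for the 16-bit `wchar_t` used on Windows, and the lead-byte ranges are hard-coded for code page 932 (Shift-JIS) as an assumption. Real code would more likely rely on the C run-time helpers that the article mentions.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: true if b falls in a lead-byte range of
// code page 932 (Shift-JIS). Other code pages use other ranges.
bool IsLeadByte932(unsigned char b) {
    return (b >= 0x81 && b <= 0x9F) || (b >= 0xE0 && b <= 0xFC);
}

// Hypothetical helper: counts characters (not bytes) in a DBCS string.
// A lead byte followed by a trail byte counts as one character.
std::size_t CountDbcsChars(const std::string& bytes) {
    std::size_t count = 0;
    std::size_t i = 0;
    while (i < bytes.size()) {
        const bool isPair =
            IsLeadByte932(static_cast<unsigned char>(bytes[i])) &&
            i + 1 < bytes.size();
        i += isPair ? 2 : 1;  // skip the trail byte together with its lead byte
        ++count;
    }
    return count;
}

// Length of a fixed-size buffer in characters (elements).
template <typename CharT, std::size_t N>
constexpr std::size_t BufferChars(const CharT (&)[N]) {
    return N;
}

// Length of the same buffer in bytes.
template <typename CharT, std::size_t N>
constexpr std::size_t BufferBytes(const CharT (&)[N]) {
    return N * sizeof(CharT);
}
```

With these helpers, the four-byte sequence `'A'`, `0x82`, `0xA0`, `'B'` counts as three characters, and an eight-element `char16_t` buffer is 16 bytes long. Passing the byte count to a function that expects a character count would overstate such a buffer's capacity by a factor of two.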