STL String and Unicode
2009-02-22 19:56
Get it from: http://msdn.microsoft.com/zh-cn/magazine/cc188714(en-us).aspx
In short, the best way to make STL strings work in both Unicode and non-Unicode builds is to define a tstring as follows:

```cpp
#include <string>

#ifdef _UNICODE
#define tstring wstring
#else
#define tstring string
#endif

using namespace std;
```
Then we can use tstring everywhere to hold _T("XX") literals. For details, see the Q&A below. I was happy to find that this is exactly the solution I had arrived at before reading the article.
Q: I use the Standard Template Library (STL) std::string class very often in my C++ programs, but I have a problem when it comes to Unicode. When using regular C-style strings I can use TCHAR and the _T macro to write code that compiles for either Unicode or ASCII, but I always find it difficult to get this ASCII/Unicode combination working with the STL string class. Do you have any suggestions?
A: Sure. It's easy, once you know how TCHAR and _T work. The basic idea is that TCHAR is either char or wchar_t, depending on the value of _UNICODE:
```cpp
// abridged from tchar.h
#ifdef _UNICODE
typedef wchar_t TCHAR;
#define __T(x) L ## x
#else
typedef char TCHAR;
#define __T(x) x
#endif
```
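To see the switch in isolation, here is a minimal sketch that mimics the same mechanism with stand-in names. `DEMO_UNICODE`, `demo_tchar`, `DEMO_T`, and the two helper functions are made up for illustration so the snippet builds on any compiler; real code would include tchar.h and use `_UNICODE`, `TCHAR`, and `_T`.

```cpp
#include <cstddef>

// Stand-ins for _UNICODE / TCHAR / __T so this builds without tchar.h.
// Define DEMO_UNICODE (for example with -DDEMO_UNICODE) for the wide variant.
#ifdef DEMO_UNICODE
typedef wchar_t demo_tchar;
#define DEMO_T_(x) L##x
#else
typedef char demo_tchar;
#define DEMO_T_(x) x
#endif

// One extra level of indirection so a macro argument is expanded before
// the L prefix is pasted on, similar to how _T is layered on __T.
#define DEMO_T(x) DEMO_T_(x)

// Length of a literal in characters, excluding the terminator.
template <std::size_t N>
std::size_t literal_length(const demo_tchar (&)[N]) {
    return N - 1;
}

// Size of the same literal in bytes, excluding the terminator.
template <std::size_t N>
std::size_t literal_bytes(const demo_tchar (&)[N]) {
    return (N - 1) * sizeof(demo_tchar);
}
```

The character count of `DEMO_T("abc")` is 3 in both configurations; only the byte count changes with the character type.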
When you choose Unicode as the character set in your project settings, the compiler compiles with _UNICODE defined. If you select MBCS (Multi-Byte Character Set), the compiler builds without _UNICODE. Everything hinges on whether _UNICODE is defined. Similarly, every Windows® API function that uses char pointers has an A (ASCII) and a W (Wide/Unicode) version, with the real version defined to one of these. Note that the Windows headers test UNICODE (no leading underscore), whereas the C runtime headers test _UNICODE; the Unicode project setting defines both:
```cpp
#ifdef UNICODE
#define CreateFile CreateFileW
#else
#define CreateFile CreateFileA
#endif
```
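The same A/W pattern can be tried out without the Windows headers. The sketch below is only an illustration: `DemoLengthA`, `DemoLengthW`, `DemoLength`, `DEMO_TEXT`, and `DEMO_UNICODE` are hypothetical names, not part of any real API.

```cpp
#include <cstddef>
#include <string>

// Hypothetical narrow and wide implementations, named in the A/W style.
inline std::size_t DemoLengthA(const char* s) {
    return std::char_traits<char>::length(s);
}
inline std::size_t DemoLengthW(const wchar_t* s) {
    return std::char_traits<wchar_t>::length(s);
}

// The generic name is just a macro alias chosen at compile time,
// and the literal macro is switched by the same symbol.
#ifdef DEMO_UNICODE
#define DemoLength DemoLengthW
#define DEMO_TEXT(x) L##x
#else
#define DemoLength DemoLengthA
#define DEMO_TEXT(x) x
#endif
```

A caller writes `DemoLength(DEMO_TEXT("hello"))` once and gets whichever version matches the build, which is the effect the CreateFile macro has on Windows.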
Likewise, there's _tprintf and _tscanf for printf and scanf. All the 't' versions use TCHARs instead of chars. So how can you apply all this to std::string? Easy. STL already has a wstring class that uses wide characters (defined in the file xstring). Both string and wstring are typedef-ed as template classes using basic_string, which lets you create a string class using any character type. Here's how STL defines string and wstring:
```cpp
// (from include/xstring)
typedef basic_string<char, char_traits<char>, allocator<char> > string;
typedef basic_string<wchar_t, char_traits<wchar_t>, allocator<wchar_t> > wstring;
```
The templates are parameterized by the underlying character type (char or wchar_t), so all you need for a TCHAR version is to mimic the definitions using TCHAR:
```cpp
typedef basic_string<TCHAR, char_traits<TCHAR>, allocator<TCHAR> > tstring;
```
Now you have a tstring that's based on TCHAR—that is, either char or wchar_t, depending on the value of _UNICODE. I'm showing you this to point out how STL uses basic_string to implement strings based on any underlying character type. Defining a new typedef isn't the most efficient way to solve your problem. A better way is to simply #define tstring to either string or wstring, like so:
```cpp
#ifdef _UNICODE
#define tstring wstring
#else
#define tstring string
#endif
```
This is better because STL already defines string and wstring, so why use templates to create another string class that's the same as one of these, just to call it tstring? You can use #define to define tstring to string or wstring, which will save you from creating another template class (though compilers are getting so smart these days it wouldn't surprise me if the duplicate class were discarded).
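Whether the typedef form really yields a separate class can be checked at compile time. In standard C++ a typedef only introduces another name for an existing type, so the check below is expected to pass in both configurations. This is a sketch with stand-in names (`DEMO_UNICODE`, `demo_tchar`, `expected_string`, `tstring_typedef`, and `typedef_is_same_type` are all hypothetical) and it needs a C++11 compiler for `static_assert`.

```cpp
#include <memory>
#include <string>
#include <type_traits>

// Stand-in for TCHAR, plus the standard string type we expect to match.
#ifdef DEMO_UNICODE
typedef wchar_t demo_tchar;
typedef std::wstring expected_string;
#else
typedef char demo_tchar;
typedef std::string expected_string;
#endif

// The typedef form, spelled with the stand-in character type.
typedef std::basic_string<demo_tchar,
                          std::char_traits<demo_tchar>,
                          std::allocator<demo_tchar> > tstring_typedef;

// Fails to compile if the typedef named a different type.
static_assert(std::is_same<tstring_typedef, expected_string>::value,
              "typedef form names the same type as string/wstring");

// Helper so the result can also be inspected at run time.
inline bool typedef_is_same_type() {
    return std::is_same<tstring_typedef, expected_string>::value;
}
```

On that reading, the choice between the typedef and the #define is mostly a matter of style rather than of generated code.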
In any case, once you have tstring, you can write code like this:
```cpp
tstring s = _T("Hello, world");
_tprintf(_T("s =%s\n"), s.c_str());
```
The method basic_string::c_str returns a const pointer to the underlying character type; in this case, that pointer is either a const char* or a const wchar_t*.
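As a small illustration of c_str acting as the bridge to pointer-based functions, the sketch below passes a string object to a C-style routine. It uses stand-in names (`DEMO_UNICODE`, `demo_tchar`, `demo_tstring`, `DEMO_T`, `count_char`, `count_in_string` are hypothetical) so that it builds without tchar.h.

```cpp
#include <cstddef>
#include <string>

// Stand-ins for TCHAR, tstring and _T.
#ifdef DEMO_UNICODE
typedef wchar_t demo_tchar;
typedef std::wstring demo_tstring;
#define DEMO_T(x) L##x
#else
typedef char demo_tchar;
typedef std::string demo_tstring;
#define DEMO_T(x) x
#endif

// A C-style function that only understands raw character pointers,
// standing in for an API that takes LPCTSTR.
inline std::size_t count_char(const demo_tchar* text, demo_tchar wanted) {
    std::size_t n = 0;
    for (; *text != 0; ++text) {
        if (*text == wanted) ++n;
    }
    return n;
}

// c_str() hands the string's characters to the pointer-based function.
inline std::size_t count_in_string(const demo_tstring& s, demo_tchar wanted) {
    return count_char(s.c_str(), wanted);
}
```

The pointer returned by c_str is only valid while the string object is alive and unmodified, so it should be used immediately rather than stored.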
Figure 2 shows a simple program I wrote that illustrates tstring. It writes "Hello, world" to a file and reports how many bytes were written. I set the project up so it uses Unicode for the Debug build and MBCS for the Release build. You can compile both builds and run them to compare the results. Figure 3 shows a sample run.
Figure 2 tstring
```cpp
////////////////////////////////////////////////////////////////
// MSDN Magazine - August 2004
// If this code works, it was written by Paul DiLascia.
// If not, I don't know who wrote it.
// Compiles with Visual Studio .NET 2003 on Windows XP. Tab size=3.
//
// TSTRING shows how to implement a tstring class that uses STL string or
// wstrings depending on the setting of _UNICODE, similar to TCHAR,
// _tprintf and all the other "t" versions of functions in the C runtime.
//
// To see the difference, compile both debug and release versions. The
// debug version uses Unicode; the release uses MBCS. Then run each
// program and compare the output files.
//
#include "stdafx.h"
#include "resource.h"

using namespace std;

// tstring is either string or wstring, depending on _UNICODE.
// This works too, but may produce an extra class:
//
// typedef basic_string<TCHAR, char_traits<TCHAR>,
//    allocator<TCHAR> > tstring;
//
#ifdef _UNICODE
#define tstring wstring
#else
#define tstring string
#endif

static void WriteString(HANDLE f, LPCTSTR lpsz, int len);

int _tmain(int argc, TCHAR* argv[], TCHAR* envp[])
{
   // process args
   if (argc != 2) {
      _tprintf(_T("Usage: tstring [filename]\n"));
      _tprintf(_T(" writes test message to [filename]\n"));
      return 1;
   }

   // CreateFile will create Unicode or MBCS string
   // depending on value of _UNICODE.
   LPCTSTR filename = argv[1];
   HANDLE f = CreateFile(filename, ...);

   if (f!=INVALID_HANDLE_VALUE) {
      if (GetFileType(f) == FILE_TYPE_DISK) {
         // create STL tstring
         tstring s = _T("Hello, world");
         WriteString(f, s.c_str(), (int)s.length());
      } else {
         _tprintf(_T("ERROR: the specified file '%s' is not a disk file\n"),
            filename);
      }
      CloseHandle(f); // close file
   } else {
      _tprintf(_T("ERROR: can't open '%s'\n"), filename);
   }
   return 0;
}

////////////////
// write string to file.
//
void WriteString(HANDLE f, LPCTSTR lpsz, int len)
{
   // byte count = character count times the size of one TCHAR
   DWORD nWrite = len * sizeof(TCHAR);
   DWORD nActual;
   if (WriteFile(f, lpsz, nWrite, &nActual, NULL)) {
      // display results.
      _tprintf(_T("%d bytes written\n sizeof(TCHAR)=%d\n"),
         nActual, (int)sizeof(TCHAR));
   } else {
      _tprintf(_T("ERROR %d writing\n"), GetLastError());
   }
}
```
Figure 3 tstring in Action
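The byte count that WriteString computes is just the character count times the size of one character. The following sketch isolates that arithmetic so it can be tried on any platform; `byte_count` is a hypothetical helper, not part of the article's program.

```cpp
#include <cstddef>
#include <string>

// Bytes needed to store the characters of s (no terminator), mirroring
// the len * sizeof(TCHAR) computation in WriteString.
template <typename CharT>
std::size_t byte_count(const std::basic_string<CharT>& s) {
    return s.length() * sizeof(CharT);
}
```

"Hello, world" has 12 characters, so one would expect 12 bytes from the MBCS build and 24 bytes from the Unicode build on Windows, where wchar_t is 2 bytes. On platforms where wchar_t is 4 bytes the wide result would be 48.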
By the way, MFC's CString is now married to ATL so that both MFC and ATL use the same string implementation. The combined implementation uses a template class called CStringT that works like STL's basic_string in the sense that it lets you create a CString class based on any underlying character type. The MFC include file afxstr.h defines three string types, like so:
```cpp
typedef ATL::CStringT<wchar_t, StrTraitMFC<wchar_t> > CStringW;
typedef ATL::CStringT<char, StrTraitMFC<char> > CStringA;
typedef ATL::CStringT<TCHAR, StrTraitMFC<TCHAR> > CString;
```
CStringW, CStringA, and CString are just what you would expect: wide, ASCII, and TCHAR versions of CString.
So which is better, STL or CStrings? Both classes are fine, and you should use whichever you like best. One issue to consider is which libraries you want to link with and whether you're already using ATL/MFC or not. From a coding perspective, I prefer CString for two features. First, you can initialize a CString from either wide or char strings:
```cpp
CString s1 = "foo";
CString s2 = _T("bar");
```
Both initializations work because CString silently performs whatever conversions are necessary. With STL strings, you can't initialize a tstring without using _T() because you can't initialize a wstring from a char* or vice versa. The other feature I like about CString is its automatic conversion operator to LPCTSTR, which lets you write the following:
```cpp
CString s;
LPCTSTR lpsz = s;
```
With STL, on the other hand, you have to explicitly call c_str. This is really nit-picking and some would even argue it's better to know when you're performing a conversion. For example, CStrings can get you in trouble with functions that use C-style variable arguments (varargs), such as printf:
```cpp
printf("s=%s\n", s);          // Error: thinks s is char*
printf("s=%s\n", (LPCTSTR)s); // required
```
Without the cast you can get garbage results because printf expects s to be char*. I'm sure many readers have made this error. Preventing this sort of mishap is no doubt one reason the designers of STL chose not to provide a conversion operator, insisting instead that you invoke c_str. In general, the STL folks tend to be a little more academic and purist types, whereas the Redmontonians are a little more practical and loosey-goosey. Hey, whatever. The practical differences between std::string and CString are slim.
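For comparison, here is roughly what the two explicit steps look like with STL strings: converting between narrow and wide by hand, and calling c_str before a varargs function. This is a sketch under stated assumptions: `widen_ascii` and `describe` are hypothetical helpers, and the widening shown is only valid for 7-bit ASCII input.

```cpp
#include <cstdio>
#include <string>

// Widen a string known to contain only 7-bit ASCII by copying each char
// into a wchar_t. Other text needs a real conversion routine (on Windows,
// typically MultiByteToWideChar).
inline std::wstring widen_ascii(const std::string& s) {
    return std::wstring(s.begin(), s.end());
}

// Format through a varargs function. The std::string has to be turned into
// a const char* explicitly, because "%s" cannot accept a class object.
inline std::string describe(const std::string& s) {
    char buf[64];
    std::snprintf(buf, sizeof buf, "s=%s", s.c_str()); // truncates if too long
    return std::string(buf);
}
```

Both steps are visible at the call site, which is the trade-off discussed above: more typing than CString, but no silent conversion.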