
STL String and Unicode

2009-02-22 19:56
Source: http://msdn.microsoft.com/zh-cn/magazine/cc188714(en-us).aspx

In short, the best way to make STL strings work in both Unicode and non-Unicode builds is to define a tstring as follows:

#include <string>

#ifdef _UNICODE
#define tstring wstring
#else
#define tstring string
#endif

using namespace std;


Then we can use tstring everywhere to hold _T("XX") strings. For details, see the Q&A below. I was happy to find that this solution is exactly the one I had arrived at before reading the article.

Q: I use the Standard Template Library (STL) std::string class very often in my C++ programs, but I have a problem when it comes to Unicode. When using regular C-style strings I can use TCHAR and the _T macro to write code that compiles for either Unicode or ASCII, but I always find it difficult to get this ASCII/Unicode combination working with the STL string class. Do you have any suggestions?

A: Sure. It's easy, once you know how TCHAR and _T work. The basic idea is that TCHAR is either char or wchar_t, depending on the value of _UNICODE:





// abridged from tchar.h
#ifdef  _UNICODE
typedef wchar_t TCHAR;
#define __T(x) L ## x
#else
typedef char TCHAR;
#define __T(x) x
#endif
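
To make the mechanism concrete without depending on tchar.h, here is a minimal portable sketch of the same idea. The names MY_UNICODE, my_tchar, MY_T, MY_T_IMPL, greeting, greeting_length and greeting_at are hypothetical stand-ins invented for this illustration; they are not part of any Microsoft header.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for _UNICODE; comment it out for the narrow build.
#define MY_UNICODE

#ifdef MY_UNICODE
typedef wchar_t my_tchar;          // stand-in for TCHAR in a Unicode build
#define MY_T_IMPL(x) L##x          // paste the L prefix onto the literal
#else
typedef char my_tchar;             // stand-in for TCHAR in a narrow build
#define MY_T_IMPL(x) x             // leave the literal unchanged
#endif

// Extra level of indirection so macro arguments are expanded before pasting,
// similar to the way tchar.h layers _T on top of __T.
#define MY_T(x) MY_T_IMPL(x)

// A literal whose character type follows the build setting.
static const my_tchar greeting[] = MY_T("Hello, world");

// Number of characters in the literal, excluding the terminator.
inline std::size_t greeting_length() {
    return sizeof(greeting) / sizeof(my_tchar) - 1;
}

// Bounds-checked access to one character of the literal.
inline my_tchar greeting_at(std::size_t i) {
    assert(i < greeting_length());
    return greeting[i];
}
```

The same source line produces a wchar_t array or a char array depending only on whether the switch macro is defined, which is all that TCHAR and _T do.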


When you choose Unicode as the character set in your project settings, the compiler compiles with _UNICODE defined. If you select MBCS (Multi-Byte Character Set), the compiler builds without _UNICODE. Everything hinges on whether _UNICODE is defined. Similarly, every Windows® API function that takes char pointers has an A (ANSI) and a W (wide/Unicode) version, with the generic name defined as one of these. Note that the Windows headers test UNICODE (no leading underscore), while the C runtime headers test _UNICODE; the Visual Studio Unicode character-set setting defines both:





#ifdef UNICODE
#define CreateFile CreateFileW
#else
#define CreateFile CreateFileA
#endif
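
The A/W naming pattern can be sketched portably as follows. CountCharsA, CountCharsW, CountChars, MY_UNICODE, MY_TEXT and count_hello are hypothetical names made up for this illustration; they are not Windows API functions, and the sketch only mirrors how a generic name is mapped to one of two concrete functions.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical narrow ("A") version of an API.
inline std::size_t CountCharsA(const char* s) {
    assert(s != nullptr);
    return std::char_traits<char>::length(s);
}

// Hypothetical wide ("W") version of the same API.
inline std::size_t CountCharsW(const wchar_t* s) {
    assert(s != nullptr);
    return std::char_traits<wchar_t>::length(s);
}

// Hypothetical stand-in for UNICODE; comment it out for the narrow build.
#define MY_UNICODE

#ifdef MY_UNICODE
#define CountChars CountCharsW     // generic name resolves to the wide version
#define MY_TEXT(x) L##x
#else
#define CountChars CountCharsA     // generic name resolves to the narrow version
#define MY_TEXT(x) x
#endif

// Caller code uses only the generic name and the generic literal macro.
inline std::size_t count_hello() {
    return CountChars(MY_TEXT("Hello"));
}
```

Because the generic name is a macro, the choice is made by the preprocessor before compilation, so the call site never mentions the A or W suffix.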


Likewise, there's _tprintf and _tscanf for printf and scanf. All the 't' versions use TCHARs instead of chars. So how can you apply all this to std::string? Easy. STL already has a wstring class that uses wide characters (defined in the file xstring). Both string and wstring are typedef-ed as template classes using basic_string, which lets you create a string class using any character type. Here's how STL defines string and wstring:





// (from include/xstring)
typedef basic_string<char,
   char_traits<char>, allocator<char> >
   string;

typedef basic_string<wchar_t,
   char_traits<wchar_t>, allocator<wchar_t> >
   wstring;


The templates are parameterized by the underlying character type (char or wchar_t), so all you need for a TCHAR version is to mimic the definitions using TCHAR:





typedef basic_string<TCHAR,
   char_traits<TCHAR>,
   allocator<TCHAR> >
   tstring;
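
As a side check on what such a typedef yields, the sketch below compares the alias with the standard string types. In standard C++ a typedef introduces an alias rather than a new type, so the static_assert should hold on a conforming compiler. MY_UNICODE, my_tchar, my_tstring, tstring_is_wstring and check_alias are hypothetical names used only here, and the sketch assumes a C++11 compiler for static_assert and type_traits.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <type_traits>

// Hypothetical stand-in for _UNICODE; comment it out for the narrow build.
#define MY_UNICODE

#ifdef MY_UNICODE
typedef wchar_t my_tchar;
#else
typedef char my_tchar;
#endif

// Same shape as the TCHAR-based typedef in the text.
typedef std::basic_string<my_tchar,
                          std::char_traits<my_tchar>,
                          std::allocator<my_tchar> > my_tstring;

// Compile-time check that the alias names one of the standard string types.
#ifdef MY_UNICODE
static_assert(std::is_same<my_tstring, std::wstring>::value,
              "my_tstring should be the same type as std::wstring");
#else
static_assert(std::is_same<my_tstring, std::string>::value,
              "my_tstring should be the same type as std::string");
#endif

// Runtime view of the same fact, convenient for logging or tests.
inline bool tstring_is_wstring() {
    return std::is_same<my_tstring, std::wstring>::value;
}

// Small runtime check that the alias behaves like an ordinary string.
inline void check_alias() {
    my_tstring s(3, static_cast<my_tchar>('x'));  // three 'x' characters
    assert(s.length() == 3);
    assert(sizeof(s[0]) == sizeof(my_tchar));
}
```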


Now you have a tstring that's based on TCHAR—that is, either char or wchar_t, depending on the value of _UNICODE. I'm showing you this to point out how STL uses basic_string to implement strings based on any underlying character type. Defining a new typedef isn't the most efficient way to solve your problem. A better way is to simply #define tstring to either string or wstring, like so:





#ifdef _UNICODE
#define tstring wstring
#else
#define tstring string
#endif


This is better because STL already defines string and wstring, so why use templates to create another string class that's the same as one of these, just to call it tstring? You can use #define to define tstring as string or wstring, which will save you from creating another template class (though compilers are getting so smart these days it wouldn't surprise me if the duplicate class were discarded).
In any case, once you have tstring, you can write code like this:





tstring s = _T("Hello, world");
_tprintf(_T("s =%s\n"), s.c_str());


The method basic_string::c_str returns a const pointer to the underlying character type; in this case, that pointer is either const char* or const wchar_t*.
Figure 2 shows a simple program I wrote that illustrates tstring. It writes "Hello, world" to a file and reports how many bytes were written. I set the project up so it uses Unicode for the Debug build and MBCS for the Release build. You can compile both builds and run them to compare the results. Figure 3 shows a sample run.


Figure 2 tstring





////////////////////////////////////////////////////////////////
// MSDN Magazine — August 2004
// If this code works, it was written by Paul DiLascia.
// If not, I don't know who wrote it.
// Compiles with Visual Studio .NET 2003 on Windows XP. Tab size=3.
//
// TSTRING shows how to implement a tstring class that uses STL string or
// wstrings depending on the setting of _UNICODE, similar to TCHAR,
// _tprintf and all the other "t" versions of functions in the C runtime.
//
// To see the difference, compile both debug and release versions. The
// debug version uses Unicode; the release uses MBCS. Then run each
// program and compare the output files.
//
#include "stdafx.h"
#include "resource.h"
using namespace std;
// tstring is either string or wstring, depending on _UNICODE.
// This works too, but may produce an extra class:
//
//   typedef basic_string<TCHAR, char_traits<TCHAR>,
//      allocator<TCHAR> > tstring;
//
#ifdef _UNICODE
#define tstring wstring
#else
#define tstring string
#endif
static void WriteString(HANDLE f, LPCTSTR lpsz, int len);

void _tmain(int argc, TCHAR* argv[], TCHAR* envp[])
{
   // process args
   if (argc != 2) {
      _tprintf(_T("Usage: tstring [filename]\n"));
      _tprintf(_T("       writes test message to [filename]\n"));
      return;
   }

   // CreateFile takes a Unicode or MBCS string
   // depending on the build settings.
   LPCTSTR filename = argv[1];
   HANDLE f = CreateFile(filename, ...);
   if (f != INVALID_HANDLE_VALUE) {
      if (GetFileType(f) == FILE_TYPE_DISK) {
         // create STL tstring
         tstring s = _T("Hello, world");
         WriteString(f, s.c_str(), static_cast<int>(s.length()));
      } else {
         _tprintf(_T("ERROR: the specified file '%s' is not a disk file\n"),
            filename);
      }
      CloseHandle(f); // close file
   } else {
      _tprintf(_T("ERROR: can't open '%s'\n"), filename);
   }
}

////////////////
// write string to file.
//
void WriteString(HANDLE f, LPCTSTR lpsz, int len)
{
   // byte count = character count * size of one character
   DWORD nWrite = len * sizeof(TCHAR);
   DWORD nActual;
   if (WriteFile(f, lpsz, nWrite, &nActual, NULL)) {
      // display results; sizeof yields size_t, so cast it to match %d
      _tprintf(_T("%d bytes written\n sizeof(TCHAR)=%d\n"), nActual,
         (int)sizeof(TCHAR));
   } else {
      _tprintf(_T("ERROR %d writing\n"), GetLastError());
   }
}
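
The byte-count arithmetic in WriteString (length times the size of one character) can be tried without the Windows API. The following is a minimal portable sketch; byte_count and raw_bytes are hypothetical helper names introduced for this illustration and do not appear in the original program.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Bytes occupied by the characters of s, terminator excluded. This is the
// same length * sizeof(character) computation that WriteString performs.
template <typename CharT>
std::size_t byte_count(const std::basic_string<CharT>& s) {
    return s.length() * sizeof(CharT);
}

// Copy the raw bytes that a WriteFile-style call would receive.
template <typename CharT>
std::vector<unsigned char> raw_bytes(const std::basic_string<CharT>& s) {
    const unsigned char* p =
        reinterpret_cast<const unsigned char*>(s.data());
    std::vector<unsigned char> out(p, p + byte_count(s));
    assert(out.size() == s.length() * sizeof(CharT));
    return out;
}
```

For "Hello, world" the narrow string occupies 12 bytes, while the wide string occupies 12 * sizeof(wchar_t) bytes. On Windows wchar_t is 2 bytes; on many other platforms it is 4, so the wide result is platform dependent.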




Figure 3 tstring in Action

By the way, MFC's CString is now married to ATL so that both MFC and ATL use the same string implementation. The combined implementation uses a template class called CStringT that works like STL's basic_string in the sense that it lets you create a CString class based on any underlying character type. The MFC include file afxstr.h defines three string types, like so:





typedef ATL::CStringT<wchar_t,
   StrTraitMFC<wchar_t> > CStringW;

typedef ATL::CStringT<char,
   StrTraitMFC<char> > CStringA;

typedef ATL::CStringT<TCHAR,
   StrTraitMFC<TCHAR> > CString;


CStringW, CStringA, and CString are just what you would expect: wide, ASCII, and TCHAR versions of CString.
So which is better, STL or CStrings? Both classes are fine, and you should use whichever you like best. One issue to consider is which libraries you want to link with and whether you're already using ATL/MFC or not. From a coding perspective, I prefer CString for two features. First, you can initialize a CString from either wide or char strings:





CString s1 = "foo";
CString s2 = _T("bar");


Both initializations work because CString silently performs whatever conversions are necessary. With STL strings, you can't initialize a tstring without using _T() because you can't initialize a wstring from a char* or vice versa. The other feature I like about CString is its automatic conversion operator to LPCTSTR, which lets you write the following:





CString s;
LPCTSTR lpsz = s;


With STL, on the other hand, you have to explicitly call c_str. This is really nit-picking and some would even argue it's better to know when you're performing a conversion. For example, CStrings can get you in trouble with functions that use C-style variable arguments (varargs), such as printf:





printf("s=%s\n", s); // Error: thinks s is char*
printf("s=%s\n", (LPCTSTR)s); // required


Without the cast you can get garbage results because printf expects s to be char*. I'm sure many readers have made this error. Preventing this sort of mishap is no doubt one reason the designers of STL chose not to provide a conversion operator, insisting instead that you invoke c_str. In general, the STL folks tend to be a little more academic and purist types, whereas the Redmontonians are a little more practical and loosey-goosey. Hey, whatever. The practical differences between std::string and CString are slim.
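
For comparison, the explicit style that STL strings require can be sketched as below. The helpers widen_ascii and format_label are hypothetical names written for this illustration. The widening shown handles ASCII input only, which is an assumption of the sketch; real multi-byte text needs a proper encoding conversion.

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Widen an ASCII-only narrow string one character at a time. There is no
// implicit char* to std::wstring conversion, so it has to be spelled out.
inline std::wstring widen_ascii(const std::string& narrow) {
    std::wstring wide;
    wide.reserve(narrow.size());
    for (std::string::size_type i = 0; i < narrow.size(); ++i) {
        unsigned char c = static_cast<unsigned char>(narrow[i]);
        assert(c < 0x80);  // non-ASCII input needs a real conversion
        wide.push_back(static_cast<wchar_t>(c));
    }
    return wide;
}

// Format "s=<text>" through a varargs function. std::string has no
// conversion operator, so c_str() must be called explicitly for %s.
inline std::string format_label(const std::string& s) {
    char buf[64];
    std::snprintf(buf, sizeof(buf), "s=%s", s.c_str());
    return std::string(buf);
}
```

Both conversions are visible at the call site, which is the trade-off described above: more typing, but less room for the varargs mistake.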