Tips on internationalization of software
2007-08-16 14:15
Introduction
Internationalization, or globalization, of software is the process of developing a program core whose feature design and code design make no assumptions based on a single language or locale, and whose source code base simplifies the creation of different language editions of the program. The aim of this article is to provide a solution for internationalizing existing software. The solution uses Windows 2000 as the platform, with Visual Studio and Visual C++ for the implementation.
Keywords
Character set: Each written language uses a specific set of symbols. A character set is an encoding of one or more symbol sets into numbers that can be manipulated by computer hardware and software. Examples of character sets are EBCDIC and ASCII.
Code page: An ordered set of characters in which a numeric index (code point value) is associated with each character. An example of a code page is Code Page 437, or CP437.
SBCS: A single-byte character set (SBCS) is an 8-bit character encoding, which is sufficient to represent the ASCII character set.
MBCS: A multibyte character set (MBCS) is a variable-width character encoding. In the form supported by Visual C++, also called a double-byte character set (DBCS), each character is either one byte or two bytes. Some range of byte values in the character set is set aside as lead bytes. A lead byte specifies that it and the following trail byte together make up a single two-byte character.
Unicode: As used in this article, Unicode means the 16-bit wide-character encoding of Windows, stored in wchar_t.
Locale: A locale is a set of rules and data specific to a given language and geographic area. These rules and data include information on:
Character classification,
Date and time formatting,
Numeric, currency, weight and measure conventions, and
Sorting rules.
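The lead-byte rule in the MBCS entry can be illustrated with a short, portable sketch. It assumes the lead-byte ranges of code page 932 (Shift-JIS), 0x81-0x9F and 0xE0-0xFC; the helper names isLeadByte and countCharacters are hypothetical and are not part of any Windows header. In real Windows code, IsDBCSLeadByteEx or the _mbs functions would normally do this work.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: true if b is a lead byte in code page 932 (Shift-JIS).
// The ranges 0x81-0x9F and 0xE0-0xFC are specific to that code page.
bool isLeadByte(unsigned char b) {
    return (b >= 0x81 && b <= 0x9F) || (b >= 0xE0 && b <= 0xFC);
}

// Counts characters (not bytes) in a DBCS string.
std::size_t countCharacters(const std::string& bytes) {
    std::size_t count = 0;
    for (std::size_t i = 0; i < bytes.size(); ++i) {
        unsigned char b = static_cast<unsigned char>(bytes[i]);
        // A lead byte and the trail byte that follows form one character.
        if (isLeadByte(b) && i + 1 < bytes.size()) {
            ++i;  // skip the trail byte
        }
        ++count;
    }
    return count;
}
```

This is why byte-oriented functions such as strlen report the storage size of an MBCS string rather than the number of characters the user sees.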
Overview
Change the project settings so that the entry point of the program is changed and UNICODE is defined. Collect the string literals from the source code and add the strings that need to be localized to the string table in the project's resources. Strings that do not need to be localized are handled in a different way: they stay in the source code, wrapped in the _T macro. Use the macros defined in the tchar.h file to handle string variables. Replace functions such as strcmp with _tcscmp. The functions starting with _tcs are macros. They map to the functions starting with str if neither _UNICODE nor _MBCS is defined, to the functions starting with _mbs if _MBCS is defined, and to the functions starting with wcs if _UNICODE is defined.
To localize the user interface, make a copy of each resource and change the English text to the localized text by cut and paste. Then localize the date, time and number formats. Shortcut keys also need to be localized, because keyboard layouts differ between locales.
Changing the project setting
Set wWinMainCRTStartup as the Entry Point symbol in the Output category of the Link tab in the Project Settings dialog box. This makes wWinMain the entry point, which is the entry point that MFC Unicode applications use.
Set _UNICODE and UNICODE as preprocessor definitions in the General category of the C/C++ tab in the Project Settings dialog box. The Windows headers test UNICODE and the C run-time headers such as tchar.h test _UNICODE, so both must be defined in the project.
Handling string literals
The Visual C++ compiler interprets a literal string coded as L"this is a literal string" to mean a string of Unicode characters. Use the _T macro to code literal strings generically, so that they compile as Unicode strings when _UNICODE is defined and as ANSI strings (including MBCS) otherwise. For example, instead of:
pWnd->SetWindowText("Hello");
use:
pWnd->SetWindowText(_T("Hello"));
With _UNICODE defined, _T translates the literal string to the L-prefixed form; otherwise, _T leaves the string without the L prefix.
String and number literals that need to be localized are stored in a STRINGTABLE resource by adding them to the application's .rc file. For each locale, a copy of the STRINGTABLE is made and the caption of each string is changed to the localized or translated text. The LoadString function then loads the string into memory. Which copy of the string is loaded depends on the thread locale.
Handling string variables
Use the macros defined in tchar.h, which expand differently depending on the _UNICODE and _MBCS symbols. The following table lists the macros and their expansions when _MBCS or _UNICODE is defined.
Macro | Meaning | Expansion when _MBCS is defined | Expansion when _UNICODE is defined |
_TCHAR | Character | char | wchar_t |
_TSCHAR | Signed character | signed char | wchar_t |
_TUCHAR | Unsigned character | unsigned char | wchar_t |
_TXCHAR | Unsigned character | unsigned char | wchar_t |
_TINT | Integer | unsigned int | wint_t |
_TEOF | End of file | EOF | WEOF |
tchar.h also defines macros prefixed with _tcs, which, with the correct preprocessor definitions, map to the str, _mbs, or wcs functions as appropriate.
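The mapping idea behind the generic-text macros can be sketched in portable C++. The names MY_UNICODE, my_tchar, MY_T, my_tcslen and my_tcscmp below are hypothetical stand-ins for _UNICODE, TCHAR, _T, _tcslen and _tcscmp; the sketch covers only the str and wcs cases and leaves out the _MBCS mapping that the real tchar.h also provides.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <cwchar>

// Hypothetical stand-ins for _UNICODE, TCHAR, _T, _tcslen and _tcscmp.
// Define MY_UNICODE before this point to select the wide-character build.
#ifdef MY_UNICODE
typedef wchar_t my_tchar;
#define MY_T(x) L##x
#define my_tcslen std::wcslen
#define my_tcscmp std::wcscmp
#else
typedef char my_tchar;
#define MY_T(x) x
#define my_tcslen std::strlen
#define my_tcscmp std::strcmp
#endif

// Written once against the generic names; compiles in either build.
std::size_t greetingLength() {
    const my_tchar* greeting = MY_T("Hello");
    return my_tcslen(greeting);
}
```

Because the choice is made by the preprocessor, the same source file yields an ANSI or a Unicode binary without any run-time cost.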
To check whether a character is lower case or upper case, use IsCharLower and IsCharUpper.
To convert from lower case to upper case, use CharUpper and CharUpperBuff.
To convert from upper case to lower case, use CharLower and CharLowerBuff.
For string comparison, use CompareString.
For string conversion, use MultiByteToWideChar, WideCharToMultiByte, LCMapString and FoldString.
Localization of user interface resource
Edit the .rc file of the project by adding a copy of each resource for the particular language, and specify the proper code page number. The following extract from a resource file shows the start of the English resource section.
/////////////////////////////////////////////////////////////////////////////
// English (U.S.) resources

#if !defined(AFX_RESOURCE_DLL) || defined(AFX_TARG_ENU)
#ifdef _WIN32
LANGUAGE LANG_ENGLISH, SUBLANG_ENGLISH_US
#pragma code_page(1252)
#endif //_WIN32
Each resource is placed in a different language section of the resource database by using the LANGUAGE keyword. The pragma statement specifies the code page that the resource compiler uses to convert the strings in that section into Unicode, so it must match the encoding in which the localized text was saved in the .rc file.
Use cut and paste to change the titles of the windows and controls in the resource editor.
Localization of date
The GetDateFormat function formats a date as a date string for a specified locale. The function formats either a specified date or the local system date.
int GetDateFormat(
    LCID Locale,              // locale for which date is to be formatted
    DWORD dwFlags,            // flags specifying function options
    CONST SYSTEMTIME *lpDate, // date to be formatted
    LPCTSTR lpFormat,         // date format string
    LPTSTR lpDateStr,         // buffer for storing formatted string
    int cchDate               // size of buffer
);
The first argument specifies the locale for which the date string is to be formatted. If lpFormat is NULL, the function formats the string according to the date format for this locale. If lpFormat is not NULL, the function uses the locale only for information not specified in the format picture string.
GetDateFormat accepts date information as a SYSTEMTIME structure whose pointer is passed as the third argument. If NULL is passed as the third argument, the function uses the current system date.
The second argument specifies the formatting style using symbols defined in winnls.h. These are DATE_SHORTDATE, DATE_LONGDATE and DATE_YEARMONTH.
Sometimes you need to deviate from the default date formats. In those cases, you must either fetch an alternative format string from the locale database or construct a format string and pass it as the fourth parameter. This format string is often called a picture string. It can contain the codes listed below:
Picture code | Description |
d | Day of month as digits, with no leading zero for single digit days |
dd | Day of month as digits with leading zero for single digit days. |
ddd | Day of week as three letter abbreviation |
dddd | Day of week as full name |
M | Month as digits with no leading zero for single digit months. |
MM | Month as digits with leading zero for single digit months |
MMM | Month as three letter abbreviation |
MMMM | Month as full name |
y | Year as last two digits with no leading zero for years less than 10 |
yy | Year as last two digits with leading zero for years less than 10 |
yyyy | Year as four digits |
gg | Era or period string |
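To make the picture codes concrete, here is a portable sketch that expands only the numeric codes from the table (d, dd, M, MM, y, yy, yyyy). The helpers padded and expandDatePicture are hypothetical and are not the Win32 implementation. The name codes (ddd, dddd, MMM, MMMM, gg) need locale data, so this sketch copies them through unchanged, and it does not handle literal text quoted inside a picture string.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Returns value as decimal text, left-padded with '0' to at least width digits.
std::string padded(int value, std::size_t width) {
    std::string text = std::to_string(value);
    if (text.size() < width) {
        text.insert(0, width - text.size(), '0');
    }
    return text;
}

// Hypothetical helper: expands the numeric picture codes d, dd, M, MM, y, yy, yyyy.
// Any other run of characters is copied to the output unchanged.
std::string expandDatePicture(const std::string& picture, int day, int month, int year) {
    assert(day >= 1 && day <= 31 && month >= 1 && month <= 12 && year >= 0);
    std::string out;
    std::size_t i = 0;
    while (i < picture.size()) {
        char c = picture[i];
        // Measure the run of identical characters starting at i.
        std::size_t run = 1;
        while (i + run < picture.size() && picture[i + run] == c) ++run;
        if (c == 'd' && run <= 2) out += padded(day, run);
        else if (c == 'M' && run <= 2) out += padded(month, run);
        else if (c == 'y' && run <= 2) out += padded(year % 100, run);
        else if (c == 'y' && run == 4) out += padded(year, 4);
        else out.append(picture, i, run);
        i += run;
    }
    return out;
}
```

For example, the picture "dd/MM/yyyy" applied to 5 August 2007 gives "05/08/2007", while "d.M.yy" gives "5.8.07".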
Localization of time
Win32 provides two functions for working with time formats: GetTimeFormat and EnumTimeFormats. The GetTimeFormat function formats a time as a time string for a specified locale. The function formats either a specified time or the local system time.
int GetTimeFormat(
    LCID Locale,              // locale for which time is to be formatted
    DWORD dwFlags,            // flags specifying function options
    CONST SYSTEMTIME *lpTime, // time to be formatted
    LPCTSTR lpFormat,         // time format string
    LPTSTR lpTimeStr,         // buffer for storing formatted string
    int cchTime               // size, in bytes or characters, of the buffer
);
The first parameter, Locale, specifies the locale for which the time string is to be formatted. If lpFormat is NULL, the function formats the string according to the time format for this locale. If lpFormat is not NULL, the function uses the locale only for information not specified in the format picture string.
The fourth argument of GetTimeFormat is an optional picture string using the codes shown below:
Picture code | Description |
h | Hours with no leading zero for single digit hours; 12 hour clock |
hh | Hours with leading zero for single digit hours; 12 hour clock |
H | Hours with no leading zero for single digit hours; 24 hour clock |
HH | Hours with leading zero for single digit hours; 24 hour clock |
m | Minutes with no leading zero for single digit minutes |
mm | Minutes with leading zero for single digit minutes |
s | Seconds with no leading zero for single digit seconds |
ss | Seconds with leading zero for single digit seconds |
t | Single character time marker string such as A or P |
tt | Multi character time marker string such as AM or PM |
If the fourth argument is NULL, GetTimeFormat uses the LOCALE_STIMEFORMAT item from the specified locale.
GetTimeFormat also enables you to omit or adjust parts of the formatted output by specifying the appropriate flags in its second argument, as shown below:
Style | Description |
TIME_NOMINUTESORSECONDS | No minutes or seconds |
TIME_NOSECONDS | No seconds |
TIME_NOTIMEMARKER | No AM/PM marker |
TIME_FORCE24HOURFORMAT | 24-hour time format |
The third argument is a pointer to a SYSTEMTIME structure that contains the time information to be formatted. If this pointer is NULL, the function uses the current local system time.
Localization of number format
The GetNumberFormat function formats a number string as a number string customized for a specified locale.
int GetNumberFormat(
    LCID Locale,               // locale for which string is to be formatted
    DWORD dwFlags,             // bit flag that controls the function's operation
    LPCTSTR lpValue,           // pointer to input number string
    CONST NUMBERFMT *lpFormat, // pointer to a formatting information structure
    LPTSTR lpNumberStr,        // pointer to output buffer
    int cchNumber              // size of output buffer
);
The GetNumberFormat function punctuates a numeric string according to the rules of a specific locale. The first argument specifies the locale for which the number string is to be formatted. If lpFormat is NULL, the function formats the string according to the number format for this locale.
The second argument contains a bit flag that controls the operation of the function. If lpFormat is non-NULL, this parameter must be zero. If lpFormat is NULL, you can specify the LOCALE_NOUSEROVERRIDE flag to format the string using the system default number format for the specified locale, or you can specify zero to format the string using any user overrides to the locale's default number format.
The third argument is a pointer to the input string. The input string should contain digit characters from 0 through 9, with a leading minus sign if the number is negative and a single decimal point if the number has a fractional part. The presence of any other characters causes the function to return 0, in which case you can call GetLastError to get more details. If the function succeeds, it returns the number of characters produced in the output buffer.
GetNumberFormat replaces the minus sign and decimal point with characters specified by the locale database. It also rounds the fractional part to the appropriate number of places and inserts grouping characters in the integer part.
If you want to use a format other than the locale's default, you must construct a NUMBERFMT structure, which has the following definition:
typedef struct _numberfmt {
    UINT NumDigits;       // from LOCALE_IDIGITS
    UINT LeadingZero;     // from LOCALE_ILZERO
    UINT Grouping;        // from LOCALE_SGROUPING
    LPTSTR lpDecimalSep;  // from LOCALE_SDECIMAL
    LPTSTR lpThousandSep; // from LOCALE_STHOUSAND
    UINT NegativeOrder;   // from LOCALE_INEGNUMBER
} NUMBERFMT;
The structure members correspond directly to the locale items mentioned in the comments, except for the Grouping member, which should be a value from 0 through 9 if all groups are the same size.
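The effect of the grouping size and the two separators can be shown with a small portable sketch. SimpleNumberFormat and formatNumber are hypothetical names; the structure is a reduced counterpart of NUMBERFMT that keeps only the grouping size and the separators. Unlike GetNumberFormat, the sketch does not validate the input, round the fractional part, add leading zeros or apply a negative-number order.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical, reduced counterpart of NUMBERFMT: only grouping and separators.
struct SimpleNumberFormat {
    unsigned grouping;        // digits per group, 0 disables grouping
    std::string decimalSep;   // replaces '.' in the input
    std::string thousandSep;  // inserted between digit groups
};

// Formats input such as "-1234567.89" using the given separators.
std::string formatNumber(const std::string& input, const SimpleNumberFormat& fmt) {
    std::size_t start = (!input.empty() && input[0] == '-') ? 1 : 0;
    std::size_t dot = input.find('.');
    std::size_t intEnd = (dot == std::string::npos) ? input.size() : dot;

    std::string out = input.substr(0, start);  // keep the sign, if any
    for (std::size_t i = start; i < intEnd; ++i) {
        std::size_t remaining = intEnd - i;    // digits left, including this one
        out += input[i];
        // Insert a separator when a whole number of groups follows this digit.
        if (fmt.grouping > 0 && remaining > 1 && (remaining - 1) % fmt.grouping == 0) {
            out += fmt.thousandSep;
        }
    }
    if (dot != std::string::npos) {
        out += fmt.decimalSep;
        out.append(input, dot + 1, std::string::npos);
    }
    return out;
}
```

With a grouping of 3, a decimal separator of "," and a thousands separator of ".", the input "-1234567.89" becomes "-1.234.567,89", which is the kind of substitution the locale database drives in the real function.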
Handling shortcut keys
Shortcut keys, also called accelerators, provide keyboard alternatives for menu and toolbar actions. For instance, Ctrl+A and Ctrl+Z are the conventional shortcuts for the Select All and Undo commands, respectively. Keyboard layouts differ between locales, however: if you compare the U.S. and French keyboards, you will see that the A and Z keys switch places with Q and W. The solution for handling shortcut keys is as follows:
1. During initialization, call GetUserDefaultUILanguage to obtain the default UI language for the current user. Still during initialization, call LoadKeyboardLayout to get the HKL for the default input locale.
2. Within the message loop, check whether the current message is from the keyboard. If so, get the current input locale. If the input locale uses a different language than the default UI language, call MapVirtualKeyEx to map the virtual key code back to the default input locale.
3. Call TranslateAccelerator in the normal way. If it returns FALSE, restore the original virtual key code in the message so that normal keys are processed in the current language.
MFC and similar Windows programming environments usually bury the main message loop deep within the library, but they also provide hooks so that you can perform special operations, such as UI-sensitive shortcut mapping. MFC offers two virtual functions named PreCreateWindow and PreTranslateMessage for this purpose. Use PreCreateWindow for step 1 and PreTranslateMessage for steps 2 and 3.
Handling message resource
The FormatMessage API function converts error codes into localized messages for display to the user. The Win32 messages are stored in multilingual resources attached to the various system components or in system resource DLLs. Use the message compiler (MC) to define a repertoire of message codes and their localized strings. This compiler is a Visual Studio utility program. MC accepts a message script file and produces a header file, binary files and a single resource script. The header file contains the #define symbols for the message codes. The BIN files contain the tables that correlate message codes to text strings for the various languages.
The FormatMessage function formats a message string. The function requires a message definition as input. The message definition can come from a buffer passed into the function. It can come from a message table resource in an already-loaded module. Or the caller can ask the function to search the system's message table resource(s) for the message definition. The function finds the message definition in a message table resource based on a message identifier and a language identifier. The function copies the formatted message text to an output buffer, processing any embedded insert sequences if requested.
DWORD FormatMessage(
    DWORD dwFlags,      // source and processing options
    LPCVOID lpSource,   // pointer to message source
    DWORD dwMessageId,  // requested message identifier
    DWORD dwLanguageId, // language identifier for requested message
    LPTSTR lpBuffer,    // pointer to message buffer
    DWORD nSize,        // maximum size of message buffer
    va_list *Arguments  // pointer to array of message inserts
);
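The insert sequences mentioned above are numbered placeholders such as %1 and %2 in the message definition. A simplified, portable sketch of that substitution follows; expandInserts is a hypothetical helper, not the Win32 function, and it handles only %1 through %9 with string inserts plus %% for a literal percent sign. FormatMessage itself supports more, for example printf-style specifications such as %1!d!.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: replaces %1..%9 with the matching entry of inserts and
// %% with a single percent sign. Unknown sequences are copied through unchanged.
std::string expandInserts(const std::string& definition,
                          const std::vector<std::string>& inserts) {
    std::string out;
    for (std::size_t i = 0; i < definition.size(); ++i) {
        char c = definition[i];
        if (c != '%' || i + 1 == definition.size()) {
            out += c;  // ordinary character, or a trailing '%'
            continue;
        }
        char next = definition[++i];
        if (next == '%') {
            out += '%';
        } else if (next >= '1' && next <= '9') {
            std::size_t index = static_cast<std::size_t>(next - '1');
            if (index < inserts.size()) out += inserts[index];
        } else {
            out += '%';
            out += next;
        }
    }
    return out;
}
```

Because the inserts are numbered rather than positional, a translated message definition can place %2 before %1 when the target language needs a different word order, without any change to the calling code.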
Working of software
The locale settings of the operating system can be changed from Control Panel. Open Control Panel, click Regional Options, open the General tab and click the Set default button. Select the system locale, click Apply and restart the computer. If necessary, change the input locale: click the Input Locales tab, select one of the installed input locales, click Set as default and then click Apply.
The dialog and string table resources in the different languages are embedded in the image of the executable. When the software starts, the operating system loads and displays the resources that match its locale settings. Number, date and time formatting also depend on the locale settings of the thread, which can be changed at run time.
Pros
The localized strings reside in the image of the executable, so there is no need to maintain an external database.
The operating system is responsible for loading the language-specific resources.
Cons
Cut and paste is used to change the text of windows and controls, which may introduce errors. Also, the programmer who does the cut and paste may not know the language, and so will not be able to correct the errors.
When a new language needs to be introduced, there is a massive change in the source code. The programmer has to change the titles of the windows and controls and make changes in the string table.
The choice of the language is done through the locale setting of the operating system. There is no way to change the language setting during runtime.
Common dialogs such as the File, Color, Find/Replace, Font and Print dialogs cannot be localized. These dialogs are part of the Windows common dialog library (COMMDLG.DLL). The solution to this problem is to develop our own dialogs that support Unicode and can be localized.
Milind Shingade