Determining Whether Windows XP Uses UCS-2 or UTF-16
2009-03-01 14:35
Written by 九天雁翎 (JTianLing) -- blog.csdn.net/vagrxie
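It is commonly said that the 16-bit Unicode used on Windows is not UTF-16 but UCS-2. UCS-2 is an encoding form, and the name also refers to a Unicode implementation in which every code point maps one-to-one onto a single 16-bit unit. UCS-2 can therefore represent only the BMP (Basic Multilingual Plane), the range U+0000 to U+FFFF, and it is a fixed-length encoding. UTF-16, by contrast, is variable-length, much like UTF-8. Because its code unit is wider, the BMP still maps one-to-one onto single 16-bit units, but by combining two 16-bit units (a surrogate pair: one unit from U+D800 to U+DBFF followed by one from U+DC00 to U+DFFF) it can represent all of Unicode, from U+0000 to U+10FFFF. I have seen these two confused in so many places that I began to doubt my own understanding. Fortunately the article "UTF-16/UCS-2" keeps them apart; otherwise I would not have known where to look for a reliable answer. (Even IBM's pages list UCS-2 as an alias of UTF-16.)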
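To make the difference concrete, here is a minimal Python sketch. The function names `utf16_code_units`, `ucs2_code_unit`, and `decode_utf16` are hypothetical and do not come from the author's earlier Python post. It shows that a BMP code point is stored identically in both forms, while a code point above U+FFFF becomes a surrogate pair in UTF-16 and simply cannot be stored in UCS-2.

```python
from __future__ import annotations


def utf16_code_units(cp: int) -> list[int]:
    """Encode one code point as UTF-16 code units (one or two 16-bit values)."""
    if not 0 <= cp <= 0x10FFFF:
        raise ValueError(f"U+{cp:X} is outside the Unicode range")
    if 0xD800 <= cp <= 0xDFFF:
        raise ValueError(f"U+{cp:04X} is reserved for surrogates")
    if cp <= 0xFFFF:
        # BMP: a single unit whose value equals the code point (same as UCS-2)
        return [cp]
    # Supplementary planes: split the 20-bit offset into two 10-bit halves
    offset = cp - 0x10000
    high = 0xD800 + (offset >> 10)    # lead surrogate, U+D800..U+DBFF
    low = 0xDC00 + (offset & 0x3FF)   # trail surrogate, U+DC00..U+DFFF
    return [high, low]


def ucs2_code_unit(cp: int) -> int:
    """UCS-2 has no surrogate mechanism, so only BMP code points fit."""
    if 0 <= cp <= 0xFFFF and not 0xD800 <= cp <= 0xDFFF:
        return cp
    raise ValueError(f"U+{cp:04X} cannot be represented in UCS-2")


def decode_utf16(units: list[int]) -> list[int]:
    """Turn a sequence of UTF-16 code units back into code points."""
    cps = []
    i = 0
    while i < len(units):
        u = units[i]
        nxt = units[i + 1] if i + 1 < len(units) else None
        if 0xD800 <= u <= 0xDBFF and nxt is not None and 0xDC00 <= nxt <= 0xDFFF:
            # A valid surrogate pair combines into one supplementary code point
            cps.append(0x10000 + ((u - 0xD800) << 10) + (nxt - 0xDC00))
            i += 2
        else:
            # BMP unit (an unpaired surrogate is passed through unchanged)
            cps.append(u)
            i += 1
    return cps


if __name__ == "__main__":
    print([hex(u) for u in utf16_code_units(0x1D306)])  # ['0xd834', '0xdf06']
```

Python's built-in codec agrees with this sketch: `"\U0001D306".encode("utf-16-be")` yields the bytes `d8 34 df 06`, while `ucs2_code_unit(0x1D306)` raises `ValueError`.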
The "UTF-16/UCS-2" article says the following:
UTF-16 is the native internal representation of text in the Microsoft Windows 2000/XP/2003/Vista/CE; Qualcomm BREW operating systems; the Java and .NET bytecode environments; Mac OS X's Cocoa and Core Foundation frameworks; and the Qt cross-platform graphical widget toolkit.[1][2][citation needed]
Symbian OS used in Nokia S60 handsets and Sony Ericsson UIQ handsets uses UCS-2.
The Joliet file system, used in CD-ROM media, encodes filenames using UCS-2BE (up to 64 Unicode characters per file).
Older Windows NT systems (prior to Windows 2000) only support UCS-2.[3]. In Windows XP, no code point above U+FFFF is included in any font delivered with Windows for European languages, possibly with Chinese Windows versions.[clarification needed]
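This states plainly that Windows has used UTF-16 internally since Windows 2000, which runs against my everyday impression, so it is worth testing. In my earlier post "UTF-16 encoding conversion function (Python implementation)", I printed three characters from the Taixuanjing (太玄经) under Windows. These lie outside the BMP, in the Tai Xuan Jing Symbols block (U+1D300 to U+1D35F), so UTF-16 can only represent them as surrogate pairs: “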
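A test along these lines might look like the following sketch. It assumes, purely for illustration, the code points U+1D306 to U+1D308 (the post does not say which three characters it used), and `SAMPLE`, `utf16_unit_count`, `utf16le_with_bom`, and `write_utf16_test_file` are hypothetical names. The idea is that each of these characters is one code point but two UTF-16 code units, so a system that measures or edits text in UCS-2 terms would see six units where a UTF-16-aware one sees three characters.

```python
# Hypothetical sample: three code points from the Tai Xuan Jing Symbols
# block (U+1D300..U+1D35F). The original post does not say which three it used.
SAMPLE = "\U0001D306\U0001D307\U0001D308"


def utf16_unit_count(text: str) -> int:
    """Count 16-bit code units: the 'length' a UCS-2-only system would see."""
    return len(text.encode("utf-16-le")) // 2


def utf16le_with_bom(text: str) -> bytes:
    """Encode as UTF-16LE preceded by the byte order mark bytes FF FE."""
    return ("\ufeff" + text).encode("utf-16-le")


def write_utf16_test_file(path: str, text: str = SAMPLE) -> None:
    """Save text in the BOM-prefixed UTF-16LE form that Notepad calls 'Unicode'."""
    with open(path, "wb") as f:
        f.write(utf16le_with_bom(text))


if __name__ == "__main__":
    print(len(SAMPLE), "code points")                      # 3 code points
    print(utf16_unit_count(SAMPLE), "UTF-16 code units")   # 6 UTF-16 code units
    print(utf16le_with_bom(SAMPLE)[:6].hex(" "))           # ff fe 34 d8 06 df
```

Calling `write_utf16_test_file("taixuanjing_test.txt")` (a hypothetical file name) produces a file that could be opened in Notepad on XP; whether the editor treats each surrogate pair as one character or as two is one possible, informal signal. As the quoted article notes, XP's fonts may have no glyphs above U+FFFF, so the characters may display as boxes either way.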