sun.jnu.encoding
2015-07-20 21:44
Java Platform Encoding
This came up at $WORK recently. We had a Java program that was given input through command line arguments. Unfortunately, it went wrong when passed UTF-8 characters (U+00A9 COPYRIGHT SIGN [©]). Printing out the command line arguments from inside Java showed that we had double-encoded Unicode.
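To make "double-encoded" concrete: © is the two bytes C2 A9 in UTF-8. If something decodes those bytes with a single-byte charset and the result is later encoded as UTF-8 again, each byte becomes a character of its own. The sketch below assumes ISO-8859-1 as the wrong charset (the post doesn't say which one was involved); `DoubleEncodingDemo` and its methods are hypothetical names, and `StandardCharsets` needs Java 7+.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class DoubleEncodingDemo {
    // Encode `original` as UTF-8, decode those bytes with the wrong charset,
    // then re-encode the garbled string as UTF-8.
    static byte[] doubleEncode(String original, Charset wrongDecoder) {
        byte[] utf8 = original.getBytes(StandardCharsets.UTF_8);
        String misread = new String(utf8, wrongDecoder);
        return misread.getBytes(StandardCharsets.UTF_8);
    }

    // Render bytes as space-separated upper-case hex, e.g. "C2 A9".
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String copyright = "\u00A9"; // U+00A9 COPYRIGHT SIGN
        System.out.println("UTF-8:          " + hex(copyright.getBytes(StandardCharsets.UTF_8)));
        System.out.println("double-encoded: " + hex(doubleEncode(copyright, StandardCharsets.ISO_8859_1)));
    }
}
```

Running it prints `C2 A9` for the correct encoding and `C3 82 C2 A9` for the double-encoded form, which displays as "Â©".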
Initially, we just slapped -Dfile.encoding=UTF-8 on the command line. But that failed when the site that called this code went through an automatic restart. So we investigated the issue further.
We quickly found that the presence or absence of the LANG environment variable had a bearing on the matter.
NB: ShowSystemProperties.jar is very simple and just lists all system properties in sorted order.
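The post doesn't include the source of ShowSystemProperties.jar. A minimal sketch of an equivalent program, using only the standard library (the class name is simply borrowed from the jar's name; the real tool may differ), might look like this:

```java
import java.util.Map;
import java.util.Properties;
import java.util.SortedMap;
import java.util.TreeMap;

public class ShowSystemProperties {
    // Copy the system properties into a TreeMap so they come out sorted by key.
    static SortedMap<String, String> sortedProperties() {
        Properties props = System.getProperties();
        SortedMap<String, String> sorted = new TreeMap<String, String>();
        for (String name : props.stringPropertyNames()) {
            sorted.put(name, props.getProperty(name));
        }
        return sorted;
    }

    public static void main(String[] args) {
        // One "key=value" line per property, which is what `grep encoding` filters.
        for (Map.Entry<String, String> e : sortedProperties().entrySet()) {
            System.out.println(e.getKey() + "=" + e.getValue());
        }
    }
}
```

Packaged into a jar whose manifest names it as Main-Class, it could be run with `java -jar` as in the transcripts.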
    $ java -version
    java version "1.6.0_16"
    Java(TM) SE Runtime Environment (build 1.6.0_16-b01)
    Java HotSpot(TM) Server VM (build 14.2-b01, mixed mode)
    $ echo $LANG
    en_GB.UTF-8
    $ java -jar ShowSystemProperties.jar | grep encoding
    file.encoding=UTF-8
    file.encoding.pkg=sun.io
    sun.io.unicode.encoding=UnicodeLittle
    sun.jnu.encoding=UTF-8
    $ LANG= java -jar ShowSystemProperties.jar | grep encoding
    file.encoding=ANSI_X3.4-1968
    file.encoding.pkg=sun.io
    sun.io.unicode.encoding=UnicodeLittle
    sun.jnu.encoding=ANSI_X3.4-1968
file.encoding follows LANG as you'd expect, but there's also an internal property, sun.jnu.encoding, that changes along with it.
Next, let's see what happens when we add the explicit override.
    $ LANG= java -Dfile.encoding=UTF-8 -jar ShowSystemProperties.jar | grep encoding
    file.encoding=UTF-8
    file.encoding.pkg=sun.io
    sun.io.unicode.encoding=UnicodeLittle
    sun.jnu.encoding=ANSI_X3.4-1968
sun.jnu.encoding isn't changing!
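Because the two properties can silently diverge like this, one defensive option (a suggestion of ours, not something from the original investigation) is to compare them at startup. sun.jnu.encoding is an undocumented, implementation-specific property, so this sketch treats it as possibly absent; `EncodingCheck`, `canonical` and `mismatchWarning` are hypothetical names.

```java
import java.nio.charset.Charset;

public class EncodingCheck {
    // Resolve aliases such as "UTF8" to Java's canonical charset name.
    // Unknown or illegal names fall back to the raw string.
    static String canonical(String name) {
        try {
            return Charset.forName(name).name();
        } catch (IllegalArgumentException e) {
            // Covers IllegalCharsetNameException and UnsupportedCharsetException.
            return name;
        }
    }

    // Returns a warning when the two properties disagree (or one is missing), else null.
    static String mismatchWarning(String fileEncoding, String jnuEncoding) {
        if (fileEncoding == null || jnuEncoding == null) {
            return "missing property: file.encoding=" + fileEncoding
                    + ", sun.jnu.encoding=" + jnuEncoding;
        }
        if (canonical(fileEncoding).equals(canonical(jnuEncoding))) {
            return null;
        }
        return "file.encoding=" + fileEncoding + " but sun.jnu.encoding=" + jnuEncoding
                + "; command line arguments may be decoded with the wrong charset";
    }

    public static void main(String[] args) {
        String warning = mismatchWarning(System.getProperty("file.encoding"),
                System.getProperty("sun.jnu.encoding"));
        System.out.println(warning == null ? "encodings agree" : "WARNING: " + warning);
    }
}
```

With the `LANG= java -Dfile.encoding=UTF-8` invocation above, this check would report the mismatch instead of letting arguments be mangled quietly.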
Now, as far as I can see, sun.jnu.encoding isn't documented anywhere. So you have to go into the Java source code (OpenJDK's jdk6-b16 in this case) to figure out what's up.
Let's start in main(), which is in java.c. Actually, it's JavaMain() that we're really interested in. In there you can see:
    int JNICALL
    JavaMain(void * _args)
    {
        …
        jobjectArray mainArgs;
        …
        /* Build argument array */
        mainArgs = NewPlatformStringArray(env, argv, argc);
        if (mainArgs == NULL) {
            ReportExceptionDescription(env);
            goto leave;
        }
        …
    }
NewPlatformStringArray() is defined in java.c and calls NewPlatformString() with each command line argument in turn. That, in turn, calls new String(byte[], encoding), taking the encoding from getPlatformEncoding(), which essentially calls System.getProperty("sun.jnu.encoding").
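In Java terms, the launcher's decoding step amounts to roughly the following. This is a hypothetical re-creation: `newPlatformString` and `newPlatformStringArray` only mirror the C function names, while the real code is C calling the String constructor through JNI. It shows why a UTF-8 © becomes two replacement characters when sun.jnu.encoding is ASCII (US-ASCII is used here as the Java name for that charset).

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class PlatformStrings {
    // Decode one raw argv entry with the charset named by sun.jnu.encoding.
    // Malformed input is replaced with U+FFFD rather than throwing.
    static String newPlatformString(byte[] rawArg, String jnuEncoding) {
        return new String(rawArg, Charset.forName(jnuEncoding));
    }

    // Decode every argument in turn, as NewPlatformStringArray() does for argv.
    static String[] newPlatformStringArray(byte[][] rawArgs, String jnuEncoding) {
        String[] result = new String[rawArgs.length];
        for (int i = 0; i < rawArgs.length; i++) {
            result[i] = newPlatformString(rawArgs[i], jnuEncoding);
        }
        return result;
    }

    public static void main(String[] args) {
        // The bytes a UTF-8 terminal passes for the argument "©" (C2 A9).
        byte[][] argv = { "\u00A9".getBytes(StandardCharsets.UTF_8) };
        System.out.println(newPlatformStringArray(argv, "UTF-8")[0].equals("\u00A9"));  // true
        System.out.println(newPlatformStringArray(argv, "US-ASCII")[0].length());       // 2
    }
}
```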
So where does that property get set? If you look in System.c, Java_java_lang_System_initProperties() calls:
    PUTPROP(props, "sun.jnu.encoding", sprops->sun_jnu_encoding);
The sprops structure comes from GetJavaProperties() in java_props_md.c. This interprets various environment variables, including the ones that control the locale. It appears to take everything after the period in the LANG environment variable as the encoding, and that becomes sun_jnu_encoding.
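The "everything after the period" rule can be approximated as below. This is a simplification under stated assumptions: `LangEncoding` and `encodingFromLang` are hypothetical, stripping an `@modifier` suffix is our own addition, and since the real code works through the C library's locale settings, variables such as LC_ALL or LC_CTYPE may take precedence over LANG; this sketch ignores them.

```java
public class LangEncoding {
    // Take whatever follows the '.' in a locale name such as "en_GB.UTF-8".
    // Returns `fallback` when LANG is unset, empty, or has no codeset part.
    static String encodingFromLang(String lang, String fallback) {
        if (lang == null || lang.isEmpty()) {
            return fallback;
        }
        int dot = lang.indexOf('.');
        if (dot < 0) {
            return fallback; // e.g. "C" or "en_GB" with no codeset
        }
        String codeset = lang.substring(dot + 1);
        int at = codeset.indexOf('@'); // assumption: drop modifiers like "@euro"
        if (at >= 0) {
            codeset = codeset.substring(0, at);
        }
        return codeset.isEmpty() ? fallback : codeset;
    }

    public static void main(String[] args) {
        // ANSI_X3.4-1968 is the value the transcripts show when LANG is empty.
        System.out.println(encodingFromLang(System.getenv("LANG"), "ANSI_X3.4-1968"));
    }
}
```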
Phew. So we now know that there is a special property which gets used for interpreting “platform” strings like:
* Command line arguments
* Main class name
* Environment variables
And it can be overridden:
    $ LANG= java -Dsun.jnu.encoding=UTF-8 -Dfile.encoding=UTF-8 -jar ShowSystemProperties.jar | grep encoding
    file.encoding=UTF-8
    file.encoding.pkg=sun.io
    sun.io.unicode.encoding=UnicodeLittle
    sun.jnu.encoding=UTF-8
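To make a fix like this survive the kind of automatic restart described earlier, one option is for whatever launches the JVM to pin both the flags and LANG itself instead of relying on the caller's environment. A sketch, assuming the launcher is itself written in Java: `Utf8Launcher`, `buildCommand` and `configure` are hypothetical names, en_GB.UTF-8 is just the locale from the transcripts, and `inheritIO()` needs Java 7+. The launching JVM still encodes the arguments it passes using its own platform settings, so it needs a UTF-8 setup too.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

public class Utf8Launcher {
    // Build a java command line that sets both encoding properties explicitly.
    static List<String> buildCommand(String javaBin, String jar, List<String> appArgs) {
        List<String> cmd = new ArrayList<String>();
        cmd.add(javaBin);
        cmd.add("-Dsun.jnu.encoding=UTF-8");
        cmd.add("-Dfile.encoding=UTF-8");
        cmd.add("-jar");
        cmd.add(jar);
        cmd.addAll(appArgs);
        return cmd;
    }

    // Set LANG in the child's environment so it doesn't depend on the caller's.
    static ProcessBuilder configure(List<String> command, String lang) {
        ProcessBuilder pb = new ProcessBuilder(command);
        Map<String, String> env = pb.environment();
        env.put("LANG", lang);
        pb.inheritIO(); // child shares this process's stdin/stdout/stderr
        return pb;
    }

    public static void main(String[] args) throws Exception {
        List<String> cmd = buildCommand("java", "ShowSystemProperties.jar", new ArrayList<String>());
        Process p = configure(cmd, "en_GB.UTF-8").start();
        System.exit(p.waitFor());
    }
}
```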
Posted in Uncategorized and tagged java, unicode on September 24, 2009.