
sun.jnu.encoding

2015-07-20 21:44

Java Platform Encoding

This came up at $WORK recently. We had a Java program that was given input through command-line arguments. Unfortunately, it went wrong when passed non-ASCII characters encoded as UTF-8 (for example U+00A9 COPYRIGHT SIGN [©]). Printing the command-line arguments from inside Java showed that the text had been double-encoded.
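To make "double-encoded" concrete, here is a minimal sketch of how it can arise. It assumes the UTF-8 bytes were decoded with a single-byte charset; ISO-8859-1 is chosen purely for illustration (the post does not say which charset was involved), and the class and method names are hypothetical.

```java
import java.nio.charset.StandardCharsets;

class DoubleEncodingDemo {
    // Encode the text as UTF-8, then (wrongly) decode those bytes as ISO-8859-1.
    // ISO-8859-1 is an assumption for illustration only.
    public static String misdecode(String text) {
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, StandardCharsets.ISO_8859_1);
    }

    // Render bytes as space-separated upper-case hex.
    public static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String original = "\u00A9"; // COPYRIGHT SIGN
        String mangled = misdecode(original);
        // UTF-8 bytes of the original character: C2 A9
        System.out.println(hex(original.getBytes(StandardCharsets.UTF_8)));
        // Re-encoding the mis-decoded string yields: C3 82 C2 A9
        System.out.println(hex(mangled.getBytes(StandardCharsets.UTF_8)));
    }
}
```

Each original byte becomes its own character, so writing the string back out as UTF-8 encodes the text a second time.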

Initially, we just slapped -Dfile.encoding=UTF-8 on the command line. But that failed when the site that called this code went through an automatic restart. So we investigated the issue further.

We quickly found that the presence or absence of the LANG environment variable had a bearing on the matter.

NB: ShowSystemProperties.jar is very simple and just lists all system properties in sorted order.
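The source of ShowSystemProperties.jar is not shown in the post. A minimal sketch of a program matching that description might look like the following; the class layout and the sorted() helper are assumptions, not the author's code.

```java
import java.util.Map;
import java.util.Properties;
import java.util.TreeMap;

class ShowSystemProperties {
    // Copy the properties into a TreeMap so that iteration is sorted by name.
    public static TreeMap<String, String> sorted(Properties props) {
        TreeMap<String, String> out = new TreeMap<String, String>();
        for (String name : props.stringPropertyNames()) {
            out.put(name, props.getProperty(name));
        }
        return out;
    }

    public static void main(String[] args) {
        // Print every system property as name=value, one per line.
        for (Map.Entry<String, String> e : sorted(System.getProperties()).entrySet()) {
            System.out.println(e.getKey() + "=" + e.getValue());
        }
    }
}
```

Piping the output through grep encoding, as in the transcripts here, then leaves only the encoding-related properties.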

$ java -version
java version "1.6.0_16"
Java(TM) SE Runtime Environment (build 1.6.0_16-b01)
Java HotSpot(TM) Server VM (build 14.2-b01, mixed mode)
$ echo $LANG
en_GB.UTF-8
$ java -jar ShowSystemProperties.jar | grep encoding
file.encoding=UTF-8
file.encoding.pkg=sun.io
sun.io.unicode.encoding=UnicodeLittle
sun.jnu.encoding=UTF-8
$ LANG= java -jar ShowSystemProperties.jar | grep encoding
file.encoding=ANSI_X3.4-1968
file.encoding.pkg=sun.io
sun.io.unicode.encoding=UnicodeLittle
sun.jnu.encoding=ANSI_X3.4-1968

So, setting file.encoding works, but there’s an internal property, sun.jnu.encoding, as well.

Next, see what happens when we add the explicit override.

$ LANG= java -Dfile.encoding=UTF-8 -jar ShowSystemProperties.jar | grep encoding
file.encoding=UTF-8
file.encoding.pkg=sun.io
sun.io.unicode.encoding=UnicodeLittle
sun.jnu.encoding=ANSI_X3.4-1968

Hey! sun.jnu.encoding isn’t changing!

Now, as far as I can see, sun.jnu.encoding isn’t actually documented anywhere. So you have to go into the source code for Java (OpenJDK’s jdk6-b16 in this case) to figure out what’s up.

Let’s start in main(), which is in java.c. Actually, it’s JavaMain() that we’re really interested in. In there you can see:

int JNICALL
JavaMain(void * _args)
{
    …
    jobjectArray mainArgs;

    …
    /* Build argument array */
    mainArgs = NewPlatformStringArray(env, argv, argc);
    if (mainArgs == NULL) {
        ReportExceptionDescription(env);
        goto leave;
    }
    …
}

NewPlatformStringArray() is defined in java.c and calls NewPlatformString() repeatedly with each command-line argument. In turn, that calls new String(byte[], encoding). It gets the encoding from getPlatformEncoding(). That essentially calls System.getProperty("sun.jnu.encoding").
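The effect of that chain can be approximated in plain Java. This is only a sketch of the behaviour described above, not the launcher's code: the class name, the newPlatformString() helper, and the UTF-8 fallback for a missing property are all assumptions.

```java
import java.nio.charset.Charset;

class PlatformStringSketch {
    // Decode raw argument bytes with the named encoding, as the launcher is
    // described as doing via new String(byte[], encoding).
    public static String newPlatformString(byte[] raw, String encoding) {
        return new String(raw, Charset.forName(encoding));
    }

    public static void main(String[] args) {
        // The UTF-8 bytes for U+00A9, as a shell in a UTF-8 locale would pass them.
        byte[] raw = { (byte) 0xC2, (byte) 0xA9 };

        // Assumption: fall back to UTF-8 if the JVM does not set the property.
        String enc = System.getProperty("sun.jnu.encoding", "UTF-8");
        String decoded = newPlatformString(raw, enc);

        System.out.println("sun.jnu.encoding=" + enc);
        System.out.println("decoded length=" + decoded.length());
    }
}
```

With a UTF-8 platform encoding the two bytes decode to a single character. With an ASCII platform encoding they cannot decode to U+00A9 at all, which is the kind of mismatch that corrupts arguments.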

So where does that property get set? If you look in System.c, Java_java_lang_System_initProperties() calls:

PUTPROP(props, "sun.jnu.encoding", sprops->sun_jnu_encoding);

sprops appears to get set in GetJavaProperties() in java_props_md.c. This interprets various environment variables, including the ones that control the locale. It appears to pull out everything after the period in the LANG environment variable as the encoding in order to get sun_jnu_encoding.
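As a rough illustration of that description (not a port of the C code, which may consult the C library rather than parse the string itself), the following sketch extracts the part of a POSIX-style locale name after the period. The class and method names are hypothetical, stripping a trailing @modifier is an added assumption based on the usual language_territory.codeset@modifier form, and the fallback value is simply what the JVM reported above when LANG was empty.

```java
class LangEncodingSketch {
    // Return the codeset part of a locale name such as "en_GB.UTF-8",
    // or the fallback when the name carries no codeset.
    public static String encodingFromLang(String lang, String fallback) {
        if (lang == null) {
            return fallback;
        }
        int dot = lang.indexOf('.');
        if (dot < 0) {
            return fallback;
        }
        String enc = lang.substring(dot + 1);
        // Assumption: drop a trailing modifier such as "@euro".
        int at = enc.indexOf('@');
        if (at >= 0) {
            enc = enc.substring(0, at);
        }
        return enc.isEmpty() ? fallback : enc;
    }

    public static void main(String[] args) {
        // Fallback taken from the empty-LANG transcript in this post.
        String enc = encodingFromLang(System.getenv("LANG"), "ANSI_X3.4-1968");
        System.out.println("encoding derived from LANG: " + enc);
    }
}
```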

Phew. So we now know that there is a special property which gets used for interpreting “platform” strings like:

* Command line arguments

* Main class name

* Environment variables

And it can be overridden:

$ LANG= java -Dsun.jnu.encoding=UTF-8 -Dfile.encoding=UTF-8 -jar ShowSystemProperties.jar | grep encoding
file.encoding=UTF-8
file.encoding.pkg=sun.io
sun.io.unicode.encoding=UnicodeLittle
sun.jnu.encoding=UTF-8
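The transcript shows the property value changing. To check whether arguments are actually decoded correctly on a particular JVM, a small diagnostic that prints each argument as Unicode code points can help. This is a hypothetical helper, not something from the original investigation.

```java
class ArgDump {
    // Format a string as space-separated code points, e.g. "U+00A9".
    public static String codePoints(String s) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("U+%04X", cp));
            i += Character.charCount(cp);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println("file.encoding=" + System.getProperty("file.encoding"));
        System.out.println("sun.jnu.encoding=" + System.getProperty("sun.jnu.encoding"));
        // Printing code points sidesteps any problems with the console encoding.
        for (int i = 0; i < args.length; i++) {
            System.out.println("arg[" + i + "] = " + codePoints(args[i]));
        }
    }
}
```

A correctly decoded copyright sign shows up as the single code point U+00A9; anything else means the argument bytes were interpreted with the wrong encoding.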

This entry was posted in Uncategorized and tagged java, unicode on September 24, 2009.