
Java notes: how to determine the encoding of a string?

2018-02-23 14:26

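A string may have been produced under any of several encodings. If the encoding is not stated, the only option is to test candidates one by one. The code below encodes the string with a candidate charset, decodes the bytes with the same charset, and prints the charset name when the round trip returns the original string. Strictly speaking, the test shows which charsets can represent every character in the string, so more than one candidate may pass: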

// Requires: import java.io.UnsupportedEncodingException;
String str = "测试字符串";
String encode = "UTF-8";
try {
    // Encode with the candidate charset, decode again, and compare with the original.
    if (str.equals(new String(str.getBytes(encode), encode))) {
        System.out.println(encode);
    }
} catch (final UnsupportedEncodingException e) {
    // Thrown when the JVM does not support the named charset.
    e.printStackTrace();
}
encode = "ISO-8859-1";
try {
    if (str.equals(new String(str.getBytes(encode), encode))) {
        System.out.println(encode);
    }
} catch (final UnsupportedEncodingException e) {
    e.printStackTrace();
}
...
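The repeated try blocks above can be folded into a loop. The following is a minimal sketch, not the author's code: the class name `RoundTripCheck`, the method `losslessCharsets`, and the choice of candidates are all hypothetical. It passes `Charset` objects rather than charset names, so no `UnsupportedEncodingException` has to be caught, and it writes the test string with Unicode escapes so the result does not depend on the source file's encoding.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class RoundTripCheck {
    // Returns the names of the candidate charsets for which
    // encode-then-decode gives back exactly the original string.
    static List<String> losslessCharsets(String str, List<Charset> candidates) {
        List<String> result = new ArrayList<>();
        for (Charset cs : candidates) {
            String roundTripped = new String(str.getBytes(cs), cs);
            if (str.equals(roundTripped)) {
                result.add(cs.name());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Hypothetical candidate list; extend it with whatever charsets you expect.
        List<Charset> candidates = Arrays.asList(
                StandardCharsets.UTF_8,
                StandardCharsets.ISO_8859_1,
                StandardCharsets.US_ASCII,
                StandardCharsets.UTF_16);

        // "\u6d4b\u8bd5\u5b57\u7b26\u4e32" is the same test string as above.
        System.out.println(losslessCharsets("\u6d4b\u8bd5\u5b57\u7b26\u4e32", candidates));
        // prints [UTF-8, UTF-16]

        System.out.println(losslessCharsets("test", candidates));
        // prints [UTF-8, ISO-8859-1, US-ASCII, UTF-16]
    }
}
```

Both outputs list more than one charset, which shows the limit of the technique: it rules out charsets that cannot represent the text, but it cannot single out the one encoding the text originally came from.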

Additional notes

str.getBytes(encode)

This method encodes the string str into bytes using the charset named by the encode parameter. Its official Javadoc reads:

Encodes this String into a sequence of bytes using the named charset, storing the result into a new byte array.

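If no charset is given, the string is encoded with the platform default charset; for example, if the default encoding in your Eclipse settings is UTF-8, UTF-8 is used. Before encoding, the str variable is held in memory as Unicode (Java exposes a String as a sequence of UTF-16 code units).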
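To make the encoding step concrete, here is a small sketch that encodes the same string with several charsets and compares the byte counts. The class name `EncodeDemo` and the helper `encodedLength` are hypothetical names introduced for illustration; only charsets from `StandardCharsets` are used, since those are guaranteed to exist on every Java platform.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class EncodeDemo {
    // Number of bytes produced when s is encoded with cs.
    static int encodedLength(String s, Charset cs) {
        return s.getBytes(cs).length;
    }

    public static void main(String[] args) {
        // Same five-character test string as in the article, written with escapes.
        String str = "\u6d4b\u8bd5\u5b57\u7b26\u4e32";

        // The charset used by the no-argument getBytes(); depends on the platform.
        System.out.println("default charset: " + Charset.defaultCharset());

        System.out.println(str.length());                                     // prints 5
        System.out.println(encodedLength(str, StandardCharsets.UTF_8));       // prints 15
        System.out.println(encodedLength(str, StandardCharsets.UTF_16BE));    // prints 10
        System.out.println(encodedLength(str, StandardCharsets.UTF_16));      // prints 12 (includes a 2-byte byte order mark)
        System.out.println(encodedLength(str, StandardCharsets.ISO_8859_1));  // prints 5 (each character replaced by '?')
    }
}
```

The ISO-8859-1 result is the reason the round-trip comparison fails for that charset: characters it cannot represent are silently replaced during encoding, so decoding cannot restore them.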

new String(str.getBytes(encode), encode)

This constructor decodes the byte array returned by str.getBytes(encode) using the charset named by the encode parameter. Its official Javadoc reads:

Constructs a new String by decoding the specified array of bytes using the specified charset. The length of the new String is a function of the charset, and hence may not be equal to the length of the byte array.

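If no charset is given, the bytes are decoded with the platform default charset; for example, if the default encoding in your Eclipse settings is UTF-8, UTF-8 is used. After decoding, the resulting string is again held in memory as Unicode.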
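The effect of the charset argument on decoding can be seen by decoding the same bytes twice. This is a minimal sketch under stated assumptions: `DecodeDemo` and `decode` are hypothetical names, and ISO-8859-1 is picked as the "wrong" charset because it maps every byte value to a character, which makes the mistake reversible.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class DecodeDemo {
    // Decodes bytes into a String using the given charset.
    static String decode(byte[] bytes, Charset cs) {
        return new String(bytes, cs);
    }

    public static void main(String[] args) {
        String original = "\u6d4b\u8bd5\u5b57\u7b26\u4e32";
        byte[] utf8Bytes = original.getBytes(StandardCharsets.UTF_8);

        // Decoding with the charset that produced the bytes restores the text.
        String right = decode(utf8Bytes, StandardCharsets.UTF_8);
        System.out.println(right.equals(original)); // prints true

        // Decoding with a different charset yields garbled text: one char per byte here.
        String wrong = decode(utf8Bytes, StandardCharsets.ISO_8859_1);
        System.out.println(wrong.length());         // prints 15
        System.out.println(wrong.equals(original)); // prints false

        // Because ISO-8859-1 is a one-to-one byte mapping, the original bytes
        // can be recovered from the garbled string and decoded correctly.
        String repaired = decode(wrong.getBytes(StandardCharsets.ISO_8859_1), StandardCharsets.UTF_8);
        System.out.println(repaired.equals(original)); // prints true
    }
}
```

The differing lengths (5 versus 15) are what the Javadoc means when it says the length of the new String is a function of the charset.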

Why both encode and decode?

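Here is my own rough understanding. A string lives in memory as Unicode, but when it is saved to disk or sent over a network it has to be encoded into bytes, and choosing a suitable encoding keeps the storage and bandwidth cost down. When we read the string back, we decode it into memory again, where it is convenient to operate on.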

Here Unicode acts as the intermediate form between the various encodings. Taking UTF-8 as an example:

UTF-8 -> Unicode -> UTF-8
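The same pattern converts between two different encodings, with the in-memory String as the intermediate step. The sketch below is illustrative only: `TranscodeDemo` and `transcode` are hypothetical names, and UTF-16BE stands in for "some other encoding" because it is available on every Java platform.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class TranscodeDemo {
    // Converts bytes from one encoding to another: source bytes -> String (Unicode) -> target bytes.
    static byte[] transcode(byte[] input, Charset from, Charset to) {
        String unicode = new String(input, from); // decode into memory
        return unicode.getBytes(to);              // encode for storage or transmission
    }

    public static void main(String[] args) {
        String text = "\u6d4b\u8bd5\u5b57\u7b26\u4e32";
        byte[] utf16Bytes = text.getBytes(StandardCharsets.UTF_16BE);

        byte[] utf8Bytes = transcode(utf16Bytes, StandardCharsets.UTF_16BE, StandardCharsets.UTF_8);

        System.out.println(utf16Bytes.length); // prints 10
        System.out.println(utf8Bytes.length);  // prints 15
        // The transcoded bytes match what encoding the text directly as UTF-8 produces.
        System.out.println(Arrays.equals(utf8Bytes, text.getBytes(StandardCharsets.UTF_8))); // prints true
    }
}
```

The conversion is lossless only when the target charset can represent every character of the decoded string; otherwise unmappable characters are replaced during the encode step.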
