
File System (3): Source Analysis of block_dev.c, file_dev.c, and char_dev.c

2015-06-17 14:03

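The Unicode code space runs from U+0000 to U+10FFFF, 1,114,112 code points in total. The range U+0000 to U+FFFF is called the Basic Multilingual Plane (BMP); characters at U+10000 and above are called supplementary characters. In Java (since Java 1.5), a supplementary character is represented by two char values. The first char comes from the high-surrogates range (\uD800 to \uDBFF, 1,024 code points) and the second from the low-surrogates range (\uDC00 to \uDFFF, 1,024 code points). Surrogate pairs can therefore represent 1024 × 1024 = 1,048,576 characters. Adding the 65,536 code points of the BMP and subtracting the 2,048 surrogate code points, which cannot stand for characters on their own, gives exactly 1,112,064 usable code points.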
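The arithmetic above can be checked with a minimal sketch. The class name `SurrogateMath` and its two helper methods are hypothetical names chosen for illustration; only `Character.isSupplementaryCodePoint` and `Character.toChars` are standard library calls. The sketch assumes the usual UTF-16 rule: subtract 0x10000, then put the upper 10 bits in the high surrogate and the lower 10 bits in the low surrogate.

```java
import java.util.Arrays;

public class SurrogateMath {
    // Splits a supplementary code point (U+10000..U+10FFFF) into its UTF-16 surrogate pair.
    public static char[] toSurrogates(int codePoint) {
        if (!Character.isSupplementaryCodePoint(codePoint)) {
            throw new IllegalArgumentException("not a supplementary code point");
        }
        int offset = codePoint - 0x10000;                // 20-bit value
        char high = (char) (0xD800 + (offset >> 10));    // upper 10 bits
        char low = (char) (0xDC00 + (offset & 0x3FF));   // lower 10 bits
        return new char[] {high, low};
    }

    // Inverse operation: rebuilds the code point from a high/low surrogate pair.
    public static int toCodePoint(char high, char low) {
        return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00);
    }

    public static void main(String[] args) {
        char[] pair = toSurrogates(0x100001);
        System.out.println(Integer.toHexString(pair[0]) + " " + Integer.toHexString(pair[1]));
        // Cross-check against the library's own conversion.
        System.out.println(Arrays.equals(pair, Character.toChars(0x100001)));
        // Pairs + BMP code points - surrogate code points.
        System.out.println(1024 * 1024 + 65536 - 2048);
    }
}
```

In real code `Character.toChars` and `Character.toCodePoint` already do this; the hand-written version only makes the bit layout visible.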

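How is a supplementary character in UTF-16 recognized as one supplementary character rather than as two ordinary characters? By checking whether its first char lies in the high-surrogate range and its second char lies in the low-surrogate range. This also means that the 2,048 code points occupied by the high and low surrogates (0xD800 to 0xDFFF) cannot be assigned to any other character.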
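The recognition rule can be sketched as a small scanner over UTF-16 code units. `PairScan` and `codePoints` are hypothetical names for illustration; the checks themselves use the standard `Character.isHighSurrogate`, `Character.isLowSurrogate`, and `Character.toCodePoint`. As an assumption of this sketch, an unpaired surrogate is passed through as its own value rather than rejected.

```java
import java.util.ArrayList;
import java.util.List;

public class PairScan {
    // Decodes UTF-16 code units into code points.
    // A high surrogate followed by a low surrogate becomes one supplementary code point;
    // anything else (including an unpaired surrogate) is returned as a single value.
    public static List<Integer> codePoints(CharSequence text) {
        List<Integer> out = new ArrayList<>();
        int i = 0;
        while (i < text.length()) {
            char c = text.charAt(i);
            if (Character.isHighSurrogate(c)
                    && i + 1 < text.length()
                    && Character.isLowSurrogate(text.charAt(i + 1))) {
                out.add(Character.toCodePoint(c, text.charAt(i + 1)));
                i += 2; // consumed both halves of the pair
            } else {
                out.add((int) c);
                i += 1;
            }
        }
        return out;
    }

    public static void main(String[] args) {
        String s = "A" + new String(Character.toChars(0x100001)) + "4";
        for (int cp : codePoints(s)) {
            System.out.println(Integer.toHexString(cp)
                    + " supplementary=" + Character.isSupplementaryCodePoint(cp));
        }
    }
}
```

The standard library offers the same traversal through `String.codePointAt` and `Character.charCount`; the explicit loop is shown only to make the range checks visible.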

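Are any characters assigned to U+D800 through U+DFFF in Unicode itself? No, none. The range is reserved so that every character in the BMP corresponds one-to-one to a single-char UTF-16 code unit.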

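    import java.io.*;

    class TestSup {
        public static void main(String[] args) throws IOException {
            // Four surrogate values plus one ordinary character ('4').
            int[] codePoints = {0xd801, 0xd802, 0xdf00, 0xdf01, 0x34};
            String str = new String(codePoints, 0, 5);
            char[] ch = str.toCharArray();
            for (char c : ch) {
                // Each surrogate value prints as '?': no character is assigned to it in Unicode.
                System.out.print(c + "--" + Integer.toHexString(c) + " ");
            }

            /* Test whether the chars can be written to a file. */
            FileWriter out = new FileWriter("aa"); // encodes with the platform default charset
            out.write(ch);
            out.close();
            System.out.print("\n***********************\n");

            FileReader in = new FileReader("aa");
            int c;
            /*
             * Comparing the results: the non-surrogate character is written and read
             * back intact, but unpaired surrogate values cannot be encoded and are
             * replaced by 0x3f ('?'). Note that 0xd802 0xdf00 happen to form a valid
             * pair, which survives only if the default charset can encode it.
             */
            while ((c = in.read()) != -1) {
                System.out.print(Integer.toHexString(c) + " "); // 3f is the encoder's replacement byte
            }
            in.close();
            System.out.println(str);
        }
    }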
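The test above mixes two factors: unpaired surrogates and the platform default charset. A minimal sketch that separates them is given here, assuming an explicit UTF-8 charset and in-memory buffers instead of a file; `WriterRoundTrip` and `roundTrip` are hypothetical names for illustration. It suggests that the 0x3f replacement comes from the encoder meeting a char sequence it cannot encode, not from the surrogate range as such.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.Reader;
import java.io.Writer;
import java.nio.charset.StandardCharsets;

public class WriterRoundTrip {
    // Encodes text to UTF-8 through a Writer, then decodes it back through a Reader.
    public static String roundTrip(String text) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (Writer w = new OutputStreamWriter(buf, StandardCharsets.UTF_8)) {
            w.write(text);
        }
        StringBuilder sb = new StringBuilder();
        try (Reader r = new InputStreamReader(
                new ByteArrayInputStream(buf.toByteArray()), StandardCharsets.UTF_8)) {
            int c;
            while ((c = r.read()) != -1) {
                sb.append((char) c); // read() returns one UTF-16 code unit at a time
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        String paired = new String(Character.toChars(0x100001)); // valid surrogate pair
        String unpaired = "a\uD801b";                            // high surrogate with no partner
        System.out.println(roundTrip(paired).equals(paired));
        System.out.println(roundTrip(unpaired));
    }
}
```

A valid pair is encoded as one four-byte UTF-8 sequence and comes back as the same two chars, while the unpaired surrogate is replaced by the encoder.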

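Conclusion: the char-based round trip above fails because unpaired surrogates cannot be encoded, and because a Writer always encodes through a charset that may not cover supplementary characters. A dependable way to write supplementary characters to a text file and read them back is to encode them to bytes yourself and use byte streams. After reading, decide whether a value is a supplementary character according to the encoding in use. With UTF-16, check the high- and low-surrogate ranges; the example below uses UTF-32: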

    import java.io.*;

    public class Test {
        public static void main(String[] args) throws Exception {
            // System.setProperty("file.encoding", "utf-32");
            int[] codePoints = {0x100001, 0x100002}; // supplementary characters
            String s = new String(codePoints, 0, 2);

            System.out.println("s: " + s);
            System.out.println("s.length: " + s.length()); // 4: length() counts code units
            // Prints dbc0, the high surrogate, not simply the upper 16 bits of the code point.
            System.out.println("s.charAt(0): " + Integer.toHexString((int) s.charAt(0)));
            System.out.println("s.codePointAt(0):" + Integer.toHexString(s.codePointAt(0)));

            char[] ch = s.toCharArray(); // split by code unit

            // Big-endian UTF-32 without a BOM: 4 bytes per code point, matching readInt().
            // With UTF-16 the surrogate ranges would have to be checked instead.
            byte[] b = s.getBytes("UTF-32BE");

            FileOutputStream fos = new FileOutputStream("out2");
            fos.write(b, 0, b.length);
            fos.close();

            FileInputStream fis = new FileInputStream("out2");
            DataInputStream din = new DataInputStream(fis);
            try {
                while (true) {
                    // readInt() signals end of file with EOFException, never with -1.
                    int it = din.readInt();
                    if (Character.isSupplementaryCodePoint(it)) {
                        System.out.println("supplement");
                    }
                }
            } catch (EOFException e) {
                // End of file reached: nothing more to read.
            } finally {
                din.close();
            }
        }
    }
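For comparison, the UTF-16 case mentioned above can be sketched in the same byte-stream style. This is an illustration under stated assumptions, not part of the original test: the bytes are big-endian UTF-16 without a BOM, they are held in memory rather than in a file, and `Utf16StreamScan` and `supplementary` are hypothetical names. `DataInputStream.readChar` reads one big-endian 16-bit code unit, so each unit can be checked against the surrogate ranges.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

public class Utf16StreamScan {
    // Scans UTF-16BE bytes and returns the supplementary code points found in them.
    public static List<Integer> supplementary(byte[] utf16be) throws IOException {
        List<Integer> found = new ArrayList<>();
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(utf16be))) {
            boolean hasHigh = false; // true when the previous unit was a high surrogate
            char high = 0;
            while (true) {
                char c = in.readChar(); // throws EOFException at end of input
                if (hasHigh && Character.isLowSurrogate(c)) {
                    found.add(Character.toCodePoint(high, c));
                    hasHigh = false;
                } else {
                    // Remember a high surrogate; any other unit resets the state.
                    hasHigh = Character.isHighSurrogate(c);
                    high = c;
                }
            }
        } catch (EOFException end) {
            // End of input reached: the scan is complete.
        }
        return found;
    }

    public static void main(String[] args) throws IOException {
        String s = "A" + new String(new int[] {0x100001, 0x100002}, 0, 2) + "4";
        for (int cp : supplementary(s.getBytes(StandardCharsets.UTF_16BE))) {
            System.out.println("supplement: " + Integer.toHexString(cp));
        }
    }
}
```

Unpaired surrogates are simply skipped in this sketch; a stricter reader might report them as malformed input instead.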
