
Java I/O Extensions

2016-01-16 09:18


Tags: Java basics

NIO

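Java's NIO (New I/O) serves the same purpose as traditional I/O: input and output. NIO, however, handles I/O in a different way. It relies on memory-mapped files ("file" here in the sense of the Unix maxim "everything is a file"): NIO maps a file, or a region of a file, into memory (much like an operating system's virtual memory), so the file can then be accessed as if it were memory.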

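Channel and Buffer are the two core concepts of NIO:

Channel is NIO's counterpart to the traditional I/O streams; in NIO, all data must be transferred through a Channel. The biggest difference between a Channel and a traditional InputStream/OutputStream is that a Channel (specifically FileChannel) provides a map() method that maps a block of data directly into memory. If traditional I/O is stream-oriented, NIO is block-oriented.

Buffer can be thought of as a container that is essentially an array. It sits between the Channel and the program: everything written to a Channel must first be placed in a Buffer (Buffer → Channel), and everything read from a Channel must likewise first land in a Buffer (Channel → Buffer).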

Buffer

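In principle, java.nio.ByteBuffer works like an array: it holds multiple values of the same type.

Buffer itself is only an abstract class. Every primitive type except boolean has a corresponding Buffer class: CharBuffer, ShortBuffer, ByteBuffer, and so on.

Apart from ByteBuffer (which offers some extra methods), these Buffer classes all manage data in the same or a similar way; they differ only in the type of data they hold. None of them has a public constructor; a Buffer object is obtained through the following static method: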

// Allocates a new buffer.
static XxxBuffer allocate(int capacity);
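A minimal sketch of these factory methods, using only the JDK. It allocates buffers of two element types and, for comparison, also uses the static wrap() factory, which the text above does not mention: wrap() builds a buffer around an existing array instead of allocating a new one. The class AllocateDemo is a hypothetical name.

```java
import java.nio.CharBuffer;
import java.nio.IntBuffer;

// Hypothetical demo class: obtaining buffers without constructors.
class AllocateDemo {

    // allocate(): a new buffer of the given capacity, filled with zero values.
    static String allocated() {
        CharBuffer chars = CharBuffer.allocate(16);
        IntBuffer ints = IntBuffer.allocate(4);
        return chars.capacity() + "," + ints.capacity() + "," + ints.get(0);
    }

    // wrap(): the buffer is backed by the given array, so writes show up in the array.
    static char wrapped() {
        char[] backing = {'x', 'y'};
        CharBuffer view = CharBuffer.wrap(backing);
        view.put(0, 'z');      // absolute put into the buffer
        return backing[0];     // the backing array now holds 'z'
    }
}
```

Here allocated() returns "16,4,0" and wrapped() returns 'z', because a wrapped buffer and its array share storage.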

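ByteBuffer also has a subclass, MappedByteBuffer, which represents the result of a Channel mapping all or part of a disk file into memory. A MappedByteBuffer is normally returned by the map() method of a FileChannel.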

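Key concepts of a Buffer:

capacity: the maximum number of elements the Buffer can hold;

limit: the index of the first element that should not be read or written;

position: the index of the next element to be read or written;

mark: an optional saved position; reset() moves position straight back to the mark.

These values always satisfy 0 <= mark <= position <= limit <= capacity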
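The invariant can be observed directly. The sketch below (BufferMarkDemo and its methods are illustrative names) sets a mark, moves position past it and calls reset(). It also shows that moving position below the mark discards the mark, after which reset() throws InvalidMarkException, as the Buffer Javadoc specifies.

```java
import java.nio.ByteBuffer;
import java.nio.InvalidMarkException;

// Hypothetical demo of mark()/reset() and the mark <= position invariant.
class BufferMarkDemo {

    // Returns {position, limit, capacity} after mark() followed by reset().
    static int[] markAndReset() {
        ByteBuffer b = ByteBuffer.allocate(8);  // position=0, limit=8, capacity=8
        b.put((byte) 1).put((byte) 2);          // position=2
        b.mark();                               // mark=2
        b.put((byte) 3).put((byte) 4);          // position=4
        b.reset();                              // position goes back to the mark: 2
        return new int[] {b.position(), b.limit(), b.capacity()};
    }

    // Setting position below the mark discards the mark, so reset() must fail.
    static boolean markDiscarded() {
        ByteBuffer b = ByteBuffer.allocate(8);
        b.position(4);
        b.mark();                               // mark=4
        b.position(2);                          // 2 < mark, so the mark is dropped
        try {
            b.reset();
            return false;
        } catch (InvalidMarkException e) {
            return true;
        }
    }
}
```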

Common Buffer methods:

Method	Description
int capacity()	Returns this buffer’s capacity.
int remaining()	Returns the number of elements between the current position and the limit.
int limit()	Returns this buffer’s limit.
int position()	Returns this buffer’s position.
Buffer position(int newPosition)	Sets this buffer’s position.
Buffer reset()	Resets this buffer’s position to the previously-marked position.
Buffer clear()	Clears this buffer. (It does not actually erase the data; it sets position to 0 and limit to capacity, preparing the buffer for the next write.)
Buffer flip()	Flips this buffer. (It sets limit to the current position and position to 0, preparing the buffer for reading.)

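In addition to the methods of the Buffer base class, every Buffer subclass provides two important methods:

put(): puts data into the Buffer;

get(): takes data out of the Buffer.

put and get support both single-element access and bulk access (with an array argument). Access through put/get also comes in two flavors, relative and absolute:

Relative: reading/writing starts at the Buffer's current position, and position advances by the number of elements processed.

Absolute: reading/writing happens directly at a given index, and position is unchanged.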
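A small sketch of both access styles, plus bulk access through arrays (RelativeAbsoluteDemo is a hypothetical name). singleElement() returns "2:z": the absolute put/get at index 3 leave position at 2, where the two relative puts left it. bulk() returns "hey".

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical demo: relative vs absolute access, single vs bulk.
class RelativeAbsoluteDemo {

    // Returns "position:char" after mixing relative and absolute access.
    static String singleElement() {
        ByteBuffer b = ByteBuffer.allocate(4);
        b.put((byte) 'a').put((byte) 'b');  // relative: position 0 -> 2
        b.put(3, (byte) 'z');               // absolute: writes index 3, position stays 2
        char atIndex3 = (char) b.get(3);    // absolute read, position still 2
        return b.position() + ":" + atIndex3;
    }

    // Bulk relative put and get using arrays.
    static String bulk() {
        ByteBuffer b = ByteBuffer.allocate(8);
        b.put(new byte[] {'h', 'e', 'y'});  // position 0 -> 3
        b.flip();                           // limit=3, position=0
        byte[] out = new byte[b.remaining()];
        b.get(out);                         // position 0 -> 3
        return new String(out, StandardCharsets.US_ASCII);
    }
}
```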

import java.nio.Buffer;
import java.nio.ByteBuffer;
import org.junit.Test;

/**
 * @author jifang
 * @since 16/1/9 8:31 PM
 */
public class BufferTest {

    @Test
    public void client() {
        ByteBuffer buffer = ByteBuffer.allocate(64);
        displayBufferInfo(buffer, "init");

        buffer.put((byte) 'a');
        buffer.put((byte) 'b');
        buffer.put((byte) 'c');
        displayBufferInfo(buffer, "after put");

        buffer.flip();
        displayBufferInfo(buffer, "after flip");
        System.out.println((char) buffer.get());
        displayBufferInfo(buffer, "after a get");

        buffer.clear();
        displayBufferInfo(buffer, "after clear");
        // clear() only resets position and limit, so the data is still accessible
        System.out.println((char) buffer.get(2));
    }

    private void displayBufferInfo(Buffer buffer, String msg) {
        System.out.println("---------" + msg + "-----------");
        System.out.println("position: " + buffer.position());
        System.out.println("limit: " + buffer.limit());
        System.out.println("capacity: " + buffer.capacity());
    }
}

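A Buffer created with allocate() is an ordinary (heap) Buffer. ByteBuffer additionally provides an allocateDirect() method that creates a direct buffer (DirectByteBuffer). A DirectByteBuffer is more expensive to create than an ordinary Buffer, but I/O on it is generally more efficient, so DirectByteBuffer is best suited to long-lived buffers.

Only ByteBuffer provides allocateDirect(int capacity), so a direct buffer can only be created at the ByteBuffer level. If you need a different element type, you can convert the ByteBuffer into a view buffer of that type (for example with asIntBuffer() or asCharBuffer()).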
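A sketch of that conversion: allocate a direct ByteBuffer and view it as an IntBuffer via asIntBuffer(). The view shares the same memory, so ints written through it can be read back through the ByteBuffer (both use the default big-endian byte order). DirectViewDemo is a hypothetical name.

```java
import java.nio.ByteBuffer;
import java.nio.IntBuffer;

// Hypothetical demo: a direct ByteBuffer viewed as an IntBuffer.
class DirectViewDemo {

    // Returns {view capacity, int at byte 0, int at byte 4, 1 if direct}.
    static int[] demo() {
        ByteBuffer direct = ByteBuffer.allocateDirect(16); // direct (native) memory
        IntBuffer ints = direct.asIntBuffer();             // 16 / 4 = 4 int slots
        ints.put(42).put(7);                               // relative puts through the view
        // The view shares storage with the ByteBuffer, so its bytes changed too.
        return new int[] {
            ints.capacity(), direct.getInt(0), direct.getInt(4), direct.isDirect() ? 1 : 0
        };
    }
}
```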

Channel

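Used as above, Buffer is not very appealing (it is just an array, so why all the ceremony? ⊙﹏⊙b). The real power of Buffer lies in its combination with Channel: a block of memory can be mapped in directly from a Channel, with no need to get/put elements one at a time.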

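java.nio.channels.Channel is similar to a traditional stream object but differs from it in two ways:

A Channel can map part or all of a given file directly into a Buffer (this is what FileChannel.map() does);

A program cannot access the data in a Channel directly; it must always go through a Buffer as the intermediate layer.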

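Java provides implementations of the Channel interface such as FileChannel, DatagramChannel, Pipe.SinkChannel, Pipe.SourceChannel, SelectableChannel (the abstract base class of the selectable channels), SocketChannel, and ServerSocketChannel. Channels are not created directly through constructors, and different sources yield different Channels. A FileChannel is obtained from the getChannel() method of a traditional node stream (or of RandomAccessFile); for example, FileInputStream and FileOutputStream both return a FileChannel, and since Java 7 FileChannel.open(Path, OpenOption...) is also available. Pipe.SourceChannel and Pipe.SinkChannel, by contrast, are obtained from a Pipe created with Pipe.open() (via source() and sink()), and the network channels are created through static open() methods such as SocketChannel.open().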

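The three most commonly used Channel methods are map(), read(), and write():

MappedByteBuffer    map(FileChannel.MapMode mode, long position, long size)

map() (defined on FileChannel) maps part or all of the Channel's data into a ByteBuffer, while read() and write() come in a series of overloads: read() transfers data from the Channel into a Buffer, and write() transfers data from a Buffer into the Channel.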
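To see read() and write() working together, here is a minimal file-copy sketch. It assumes Java 7+ for FileChannel.open() and uses the usual read, flip, write, clear cycle; ChannelCopy is a hypothetical name. The inner loop is there because a single write() call is not guaranteed to drain the whole buffer.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical helper: copy a file with FileChannel.read()/write() and a ByteBuffer.
class ChannelCopy {

    // Returns the number of bytes copied from src to dst.
    static long copy(Path src, Path dst) throws IOException {
        try (FileChannel in = FileChannel.open(src, StandardOpenOption.READ);
             FileChannel out = FileChannel.open(dst, StandardOpenOption.WRITE,
                     StandardOpenOption.CREATE, StandardOpenOption.TRUNCATE_EXISTING)) {
            ByteBuffer buf = ByteBuffer.allocate(4096);
            long total = 0;
            while (in.read(buf) != -1) {      // Channel -> Buffer
                buf.flip();                   // switch the buffer to draining mode
                while (buf.hasRemaining()) {
                    total += out.write(buf);  // Buffer -> Channel
                }
                buf.clear();                  // make room for the next read
            }
            return total;
        }
    }
}
```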

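import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;
import org.junit.Test;

/**
 * @author jifang
 * @since 16/1/9 10:55 PM
 */
public class ChannelTest {
    private CharsetDecoder decoder = Charset.forName("UTF-8").newDecoder();

    @Test
    public void client() throws IOException {
        try (FileChannel inChannel = new FileInputStream("save.txt").getChannel();
             FileChannel outChannel = new FileOutputStream("attach.txt").getChannel()) {
            // Map the whole input file into memory (read-only)
            MappedByteBuffer buffer = inChannel.map(FileChannel.MapMode.READ_ONLY, 0, inChannel.size());
            displayBufferInfo(buffer, "init buffer");

            // Write the whole Buffer to the other file's Channel in one call
            outChannel.write(buffer);
            // write() moved position to limit; rewind() sets position back to 0 so the data can be read again
            buffer.rewind();
            // Decode into a CharBuffer and print it
            System.out.println(decoder.decode(buffer));
        }
    }

    // ...
}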
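The test above maps a file read-only. Mapping with FileChannel.MapMode.READ_WRITE instead lets the program modify the file as if it were an array in memory, which is the idea described at the start of the NIO section. A minimal sketch, assuming an ASCII file and Java 7+; MappedEdit is a hypothetical name.

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical demo: editing a file in place through a memory mapping.
class MappedEdit {

    // Upper-cases the first byte of an ASCII file directly in the mapped memory.
    static void capitalizeFirstByte(Path file) throws IOException {
        // READ_WRITE mappings require a channel opened for both reading and writing.
        try (FileChannel ch = FileChannel.open(file,
                StandardOpenOption.READ, StandardOpenOption.WRITE)) {
            MappedByteBuffer map = ch.map(FileChannel.MapMode.READ_WRITE, 0, ch.size());
            if (map.hasRemaining()) {
                char first = (char) map.get(0);
                map.put(0, (byte) Character.toUpperCase(first));
            }
            map.force(); // ask the OS to write the changes back to the file
        }
    }
}
```

For a file containing "hello", the file contains "Hello" afterwards; no explicit read or write call was needed.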

Charset

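Since Java 1.4, java.nio.charset.Charset has handled conversion between byte sequences and character sequences (strings). The class contains methods for creating decoders and encoders. Note that Charset is an immutable class.

Charset provides the static method availableCharsets() to obtain all character sets supported by the current JDK.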

import java.nio.charset.Charset;
import java.util.Map;
import java.util.SortedMap;
import org.junit.Test;

/**
 * @author jifang
 * @since 16/1/10 4:32 PM
 */
public class CharsetLearn {

    @Test
    public void testGetAllCharsets() {
        SortedMap<String, Charset> charsetMap = Charset.availableCharsets();
        for (Map.Entry<String, Charset> charset : charsetMap.entrySet()) {
            System.out.println(charset.getKey() + " aliases -> " + charset.getValue().aliases()
                    + " charset -> " + charset.getValue());
        }
    }
}

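Running the code above shows that each charset has several string aliases (for example, UTF-8 also has the aliases unicode-1-1-utf-8 and UTF8). Once you know a charset's name or alias, you can call Charset.forName() to obtain the corresponding Charset object: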

@Test
public void testGetCharset() {
    Charset utf8 = Charset.forName("UTF-8");
    Charset unicode11 = Charset.forName("unicode-1-1-utf-8");
    System.out.println(utf8.name());
    System.out.println(unicode11.name());
    System.out.println(unicode11 == utf8);
}

Since Java 7, the JDK also provides the utility class StandardCharsets, whose static fields represent the standard, commonly used charsets:

@Test
public void testStandardCharsets() {
    // Use the UTF_8 constant instead of looking the charset up by name
    Charset utf8 = StandardCharsets.UTF_8;
    Charset unicode11 = Charset.forName("unicode-1-1-utf-8");
    System.out.println(utf8.name());
    System.out.println(unicode11.name());
    System.out.println(unicode11 == utf8);
}

Once you have a Charset object, you can use its encode() and decode() methods to convert between CharBuffer and ByteBuffer:


Method	Purpose
ByteBuffer encode(CharBuffer cb)	Convenience method that encodes Unicode characters into bytes in this charset.
ByteBuffer encode(String str)	Convenience method that encodes a string into bytes in this charset.
CharBuffer decode(ByteBuffer bb)	Convenience method that decodes bytes in this charset into Unicode characters.
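A self-contained sketch of these convenience methods that works on a string in memory rather than a file (CharsetRoundTrip is a hypothetical name). For the two Chinese characters "你好" it returns "6:你好", since each takes 3 bytes in UTF-8.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo: Charset.encode()/decode() round trip.
class CharsetRoundTrip {

    // Returns "<encoded byte count>:<decoded text>".
    static String roundTrip(String text) {
        Charset utf8 = StandardCharsets.UTF_8;
        ByteBuffer bytes = utf8.encode(text);   // String -> bytes
        int encodedLength = bytes.remaining();
        CharBuffer chars = utf8.decode(bytes);  // bytes -> chars
        return encodedLength + ":" + chars;
    }
}
```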

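Alternatively, you can call the Charset's newDecoder() and newEncoder() methods to obtain a CharsetDecoder and a CharsetEncoder for more flexible decoding and encoding (the decoder provides decode() methods and the encoder provides encode() methods).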

@Test
public void testDecodeEncode() throws IOException {
    File inFile = new File("save.txt");
    try (FileChannel in = new FileInputStream(inFile).getChannel()) {
        MappedByteBuffer byteBuffer = in.map(FileChannel.MapMode.READ_ONLY, 0, in.size());
        // Charset utf8 = Charset.forName("UTF-8");
        Charset utf8 = StandardCharsets.UTF_8;

        // Decode
        // CharBuffer charBuffer = utf8.decode(byteBuffer);
        CharBuffer charBuffer = utf8.newDecoder().decode(byteBuffer);
        System.out.println(charBuffer);

        // Encode
        // ByteBuffer encoded = utf8.encode(charBuffer);
        ByteBuffer encoded = utf8.newEncoder().encode(charBuffer);
        // Size the array from the encoded buffer rather than assuming it equals the file length
        byte[] bytes = new byte[encoded.remaining()];
        encoded.get(bytes);
        System.out.println(Arrays.toString(bytes));
    }
}
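One practical reason to use a CharsetDecoder is error handling. Charset.decode() always replaces malformed input with a replacement character, whereas a decoder obtained from newDecoder() reports malformed input by default (REPORT is also set explicitly below for clarity). This sketch uses that behavior to check whether a byte array is valid UTF-8; StrictDecode and isValidUtf8 are hypothetical names.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical demo: strict decoding with a CharsetDecoder.
class StrictDecode {

    // Returns true when the bytes decode as UTF-8 without errors.
    static boolean isValidUtf8(byte[] data) {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            decoder.decode(ByteBuffer.wrap(data));
            return true;
        } catch (CharacterCodingException e) {
            // Thrown for malformed or unmappable input under REPORT
            return false;
        }
    }
}
```

For example, the two bytes 0xC3 0x28 are rejected, because 0xC3 starts a two-byte sequence and 0x28 is not a continuation byte.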

The String class also provides a getBytes(String charsetName) method that converts a string into a byte sequence using the given charset.
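A short sketch of the String side (StringBytes is a hypothetical name). getBytes(String) declares the checked UnsupportedEncodingException, while the getBytes(Charset) overload does not; new String(byte[], Charset) performs the reverse conversion. For "你好", the UTF-8 form has 6 bytes and the UTF-16BE form 4.

```java
import java.io.UnsupportedEncodingException;
import java.nio.charset.StandardCharsets;

// Hypothetical demo: String <-> byte[] conversion with explicit charsets.
class StringBytes {

    // Returns {UTF-8 length, UTF-16BE length} of the encoded string.
    static int[] encodedLengths(String s) throws UnsupportedEncodingException {
        byte[] utf8 = s.getBytes("UTF-8");                    // charset looked up by name
        byte[] utf16 = s.getBytes(StandardCharsets.UTF_16BE); // Charset overload
        return new int[] {utf8.length, utf16.length};
    }

    // Encoding and decoding with the same charset restores the original text.
    static String roundTrip(String s) {
        byte[] bytes = s.getBytes(StandardCharsets.UTF_8);
        return new String(bytes, StandardCharsets.UTF_8);
    }
}
```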

Monitoring File Changes with WatchService

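In earlier Java versions, a program that needed to monitor the file system could start a background thread that traversed the target directory at regular intervals and treated any difference from the previous traversal as a change. In NIO.2 (Java 7), however, Path provides register methods for listening to file-system changes: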

WatchKey    register(WatchService watcher, WatchEvent.Kind<?>... events);
WatchKey    register(WatchService watcher, WatchEvent.Kind<?>[] events, WatchEvent.Modifier... modifiers);

Strictly speaking, Path implements the Watchable interface, and register is a method declared by Watchable.

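WatchService represents a file-system watch service that listens for changes to the files in a Path directory. WatchService is an interface whose instances are created by a FileSystem; a WatchService is typically obtained like this: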

WatchService service = FileSystems.getDefault().newWatchService();

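Once registration through register is complete, the following WatchService methods can be called to retrieve file-change events for the watched directory:

Method	Description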
WatchKey poll()	Retrieves and removes the next watch key, or null if none are present.
WatchKey poll(long timeout, TimeUnit unit)	Retrieves and removes the next watch key, waiting if necessary up to the specified wait time if none are yet present.
WatchKey take()	Retrieves and removes next watch key, waiting if none are yet present.

After obtaining a WatchKey, you can call its methods to find out which events occurred and get the corresponding WatchEvent objects:

Method	Description
List<WatchEvent<?>> pollEvents()	Retrieves and removes all pending events for this watch key, returning a List of the events that were retrieved.
boolean reset()	Resets this watch key.

WatchEvent methods:

Method	Description
T context()	Returns the context for the event.
int count()	Returns the event count.
WatchEvent.Kind<T> kind()	Returns the event kind.
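The ChangeWatcher example that follows blocks indefinitely in take(). When a program should only wait for a bounded time, poll(long, TimeUnit) can be used instead; it returns null on timeout. A minimal sketch, assuming a caller-supplied action that changes the directory; WatchOnce and eventsAfter are hypothetical names. How quickly events arrive depends on the platform's WatchService implementation, hence the generous timeout in the usage.

```java
import java.io.IOException;
import java.nio.file.Path;
import java.nio.file.StandardWatchEventKinds;
import java.nio.file.WatchEvent;
import java.nio.file.WatchKey;
import java.nio.file.WatchService;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.TimeUnit;

// Hypothetical demo: wait a bounded time for ENTRY_CREATE events in one directory.
class WatchOnce {

    // Registers dir, runs trigger, then collects events for up to timeoutSeconds.
    static List<String> eventsAfter(Path dir, Runnable trigger, long timeoutSeconds)
            throws IOException, InterruptedException {
        try (WatchService service = dir.getFileSystem().newWatchService()) {
            dir.register(service, StandardWatchEventKinds.ENTRY_CREATE);
            trigger.run();
            List<String> seen = new ArrayList<>();
            WatchKey key = service.poll(timeoutSeconds, TimeUnit.SECONDS); // null on timeout
            if (key != null) {
                for (WatchEvent<?> event : key.pollEvents()) {
                    // context() is the file name relative to the watched directory
                    seen.add(event.kind().name() + " " + event.context());
                }
                key.reset(); // re-arm the key so later events would be queued again
            }
            return seen;
        }
    }
}
```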

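import java.io.IOException;
import java.nio.file.FileSystems;
import java.nio.file.Paths;
import java.nio.file.StandardWatchEventKinds;
import java.nio.file.WatchEvent;
import java.nio.file.WatchKey;
import java.nio.file.WatchService;

/**
 * @author jifang
 * @since 16/1/10 8:00 PM
 */
public class ChangeWatcher {

    public static void main(String[] args) {
        watch("/Users/jifang/");
    }

    public static void watch(String directory) {
        try {
            WatchService service = FileSystems.getDefault().newWatchService();
            Paths.get(directory).register(service,
                    StandardWatchEventKinds.ENTRY_CREATE,
                    StandardWatchEventKinds.ENTRY_DELETE,
                    StandardWatchEventKinds.ENTRY_MODIFY);
            while (true) {
                // Block until at least one key has been signalled
                WatchKey key = service.take();
                for (WatchEvent<?> event : key.pollEvents()) {
                    System.out.println("File " + event.context() + " triggered a " + event.kind() + " event!");
                }

                // Re-arm the key; stop if it is no longer valid (e.g. the directory was deleted)
                if (!key.reset()) {
                    break;
                }
            }
        } catch (IOException | InterruptedException e) {
            throw new RuntimeException(e);
        }
    }
}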

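With WatchService, changes to the files in a given directory can be monitored quite elegantly. What to do once a file changes depends on business requirements. For example, you could build a log analyzer that watches the log directory to see whether a log's size has changed; when it changes, the analyzer scans only the changed part, and if it finds an exception (for instance, keywords such as Exception or Timeout), it extracts that exception text and sends it to an administrator by email or SMS.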

Guava IO

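The I/O libraries most commonly used in everyday development are Apache commons-io and the I/O module of Google Guava. However, Apache commons-io is fairly old and updated slowly (at the time of writing, its latest release dated from 2012), whereas Guava is updated relatively often and has just released version 19.0, so only Guava's extensions to Java I/O are covered here.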

To use Guava, add the following dependency to pom.xml:

<dependency>
    <groupId>com.google.guava</groupId>
    <artifactId>guava</artifactId>
    <version>19.0</version>
</dependency>

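Recently, while writing a tool that scrapes images from web pages, I first implemented it with Java's URL.openConnection() and I/O stream operations; the code was verbose and not very fast (for similar code, see "Reading web page content with URL in Java"). With Guava, downloading a page takes just a few lines: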

public static String getHtml(String url) {
    if (StringUtils.isBlank(url)) {
        return null;
    }
    try {
        return Resources.toString(new URL(url), StandardCharsets.UTF_8);
    } catch (IOException e) {
        LOGGER.error("getHtml error url = {}", url, e);
        throw new RuntimeException(e);
    }
}

The code is much cleaner.

You can also use the Resources method readLines(URL url, Charset charset, LineProcessor<T> callback) to extract only specific content from a page:

public static List<String> processUrl(String url, final String regexp) {
    try {
        return Resources.readLines(new URL(url), StandardCharsets.UTF_8, new LineProcessor<List<String>>() {

            private Pattern pattern = Pattern.compile(regexp);
            private List<String> strings = new ArrayList<>();

            @Override
            public boolean processLine(String line) throws IOException {
                Matcher matcher = pattern.matcher(line);
                while (matcher.find()) {
                    strings.add(matcher.group());
                }
                return true;
            }

            @Override
            public List<String> getResult() {
                return strings;
            }
        });
    } catch (IOException e) {
        LOGGER.error("processUrl error, url = {}, regexp = {}", url, regexp, e);
        throw new RuntimeException(e);
    }
}

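As for performance, I recall a remark once made about the STL:

STL's performance may not be the best, but it is certainly not the worst!

I think the same holds for Guava. Three kinds of operations in Guava IO are used most often:

Simplification of traditional Java I/O operations;

Guava's support for sources and sinks;

Guava's Files and Resources utilities for files and resources.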

Java IO Simplification

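Guava uses InputStream/OutputStream and Readable/Appendable to represent Java's byte streams and character streams (Writer implements the Appendable interface, and Reader implements the Readable interface), and provides com.google.common.io.ByteStreams and com.google.common.io.CharStreams to support traditional I/O.

These two classes offer many static methods that simplify Java I/O, for example:

static long  copy(InputStream from, OutputStream to) (ByteStreams) / static long  copy(Readable from, Appendable to) (CharStreams)

static byte[]    toByteArray(InputStream in) (ByteStreams)

static int   read(InputStream in, byte[] b, int off, int len) (ByteStreams)

static ByteArrayDataInput    newDataInput(byte[] bytes, int start) (ByteStreams)

static String    toString(Readable r) (CharStreams)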

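/**
 * Read a file's contents with a single line of code
 *
 * @throws IOException
 */
@Test
public void getFileContent() throws IOException {
    // try-with-resources makes sure the reader is closed
    try (FileReader reader = new FileReader("save.txt")) {
        System.out.println(CharStreams.toString(reader));
    }
}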

For details on ByteStreams and CharStreams, see the Guava documentation.

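Guava Sources and Sinks

Guava introduces the concepts of sources and sinks so that you do not always have to deal with streams directly.

A source or sink is a resource that you know how to open a stream from, such as a File or a URL.

Sources are readable; sinks are writable.

Guava's sources are ByteSource and CharSource; its sinks are ByteSink and CharSink.

The benefit of sources and sinks is that they provide a common set of operations (for example, once you wrap a data source as a ByteSource, you get a set of byte-oriented methods no matter what the source originally was). Sources and sinks play a role similar to InputStream/OutputStream and Reader/Writer in Java I/O: as long as you can obtain one of them or a subclass, you can use the operations they provide, regardless of the underlying implementation.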

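import com.google.common.io.CharSink;
import com.google.common.io.CharSource;
import com.google.common.io.Files;
import com.google.common.io.Resources;
import java.io.File;
import java.io.IOException;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import org.junit.Test;

/**
 * @author jifang
 * @since 16/1/11 4:39 PM
 */
public class SourceSinkTest {

    @Test
    public void fileSinkSource() throws IOException {
        File file = new File("save.txt");
        CharSink sink = Files.asCharSink(file, StandardCharsets.UTF_8);
        // "- How are you?\n- I'm fine."
        sink.write("- 你好吗?\n- 我很好.");

        CharSource source = Files.asCharSource(file, StandardCharsets.UTF_8);
        System.out.println(source.read());
    }

    @Test
    public void netSource() throws IOException {
        CharSource source = Resources.asCharSource(new URL("http://www.sun.com"), StandardCharsets.UTF_8);
        System.out.println(source.readFirstLine());
    }
}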

Obtaining Sources and Sinks

Common ways to obtain byte sources and sinks:

Byte source	Byte sink
Files.asByteSource(File)	Files.asByteSink(File file, FileWriteMode... modes)
Resources.asByteSource(URL url)	-
ByteSource.wrap(byte[] b)	-
ByteSource.concat(ByteSource... sources)	-

Common ways to obtain character sources and sinks:

Character source	Character sink
Files.asCharSource(File file, Charset charset)	Files.asCharSink(File file, Charset charset, FileWriteMode... modes)
Resources.asCharSource(URL url, Charset charset)	-
CharSource.wrap(CharSequence charSequence)	-
CharSource.concat(CharSource... sources)	-
ByteSource.asCharSource(Charset charset)	ByteSink.asCharSink(Charset charset)

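Using Sources and Sinks

These four source and sink types provide common methods for reading and writing. Usage is similar to Java I/O but simpler and more convenient than working with Java I/O streams: for example, a CharSource can read all of its data at once with String read(), or copy all of it to a Writer or a sink in one call with long copyTo(CharSink/Appendable to).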

@Test
public void saveHtmlFileChar() throws IOException {
    CharSource source = Resources.asCharSource(new URL("http://www.google.com"), StandardCharsets.UTF_8);
    source.copyTo(Files.asCharSink(new File("save1.html"), StandardCharsets.UTF_8));
}

@Test
public void saveHtmlFileByte() throws IOException {
    ByteSource source = Resources.asByteSource(new URL("http://www.google.com"));
    //source.copyTo(new FileOutputStream("save2.html"));
    source.copyTo(Files.asByteSink(new File("save2.html")));
}

For other usage details, see the Guava documentation.

Files and Resources

Above we saw how Files and Resources turn a File or URL into a ByteSource or CharSource. These two classes also provide many other methods that simplify I/O; see the Guava documentation for details.

Common Resources methods

Resources method	Description
static void copy(URL from, OutputStream to)	Copies all bytes from a URL to an output stream.
static URL getResource(String resourceName)	Returns a URL pointing to resourceName if the resource is found using the context class loader.
static List<String> readLines(URL url, Charset charset)	Reads all of the lines from a URL.
static <T> T readLines(URL url, Charset charset, LineProcessor<T> callback)	Streams lines from a URL, stopping when our callback returns false, or we have read all of the lines.
static byte[] toByteArray(URL url)	Reads all bytes from a URL into a byte array.
static String toString(URL url, Charset charset)	Reads all characters from a URL into a String, using the given character set.

Common Files methods

Files method	Description
static void append(CharSequence from, File to, Charset charset)	Appends a character sequence (such as a string) to a file using the given character set.
static void copy(File from, Charset charset, Appendable to)	Copies all characters from a file to an appendable object, using the given character set.
static void copy(File from, File to)	Copies all the bytes from one file to another.
static void copy(File from, OutputStream to)	Copies all bytes from a file to an output stream.
static File createTempDir()	Atomically creates a new directory somewhere beneath the system’s temporary directory (as defined by the java.io.tmpdir system property), and returns its name.
static MappedByteBuffer map(File file, FileChannel.MapMode mode, long size)	Maps a file in to memory as per FileChannel.map(java.nio.channels.FileChannel.MapMode, long, long) using the requested FileChannel.MapMode.
static void move(File from, File to)	Moves a file from one path to another.
static <T> T readBytes(File file, ByteProcessor<T> processor)	Process the bytes of a file.
static String readFirstLine(File file, Charset charset)	Reads the first line from a file.
static List<String> readLines(File file, Charset charset)	Reads all of the lines from a file.
static <T> T readLines(File file, Charset charset, LineProcessor<T> callback)	Streams lines from a File, stopping when our callback returns false, or we have read all of the lines.
static byte[] toByteArray(File file)	Reads all bytes from a file into a byte array.
static String toString(File file, Charset charset)	Reads all characters from a file into a String, using the given character set.
static void touch(File file)	Creates an empty file or updates the last updated timestamp on the same as the unix command of the same name.
static void write(byte[] from, File to)	Overwrites a file with the contents of a byte array.
static void write(CharSequence from, File to, Charset charset)	Writes a character sequence (such as a string) to a file using the given character set.

References:
Google Guava official tutorial (Chinese translation)

Google Guava official documentation
