It's all about buffers: zero-copy, mmap and Java NIO
2017-10-23 00:00
417 查看
摘要: v
There are use cases where data need to be read from source to a sink without modification. In code this might look quite simple: for example in Java, you may read data from one
![](https://static.oschina.net/uploads/img/201710/23155305_AXF8.png)
JVM sends read() syscall.
OS context switches to kernel mode and reads data into the input socket buffer.
OS kernel then copies data into user buffer, and context switches back to user mode. read() returns.
JVM processes code logic and sends write() syscall.
OS context switches to kernel mode and copies data from user buffer to output socket buffer.
OS returns to user mode and logic in JVM continues.
This would be fine if latency and throughput aren’t your service’s concern or bottleneck, but it would be annoying if you do care, say for a static asset server. There are 4 context switches and 2 unnecessary copies for the above example.
With that, the diagram would be like this:
![](https://static.oschina.net/uploads/img/201710/23155306_aMnp.png)
You may say OS still has to make a copy of the data in kernel memory space. Yes but from OS’s perspective this is already zero-copy because there’s no data copied from kernel space to user space. The reason why kernel needs to make a copy is because general hardware DMA access expects consecutive memory space (and hence the buffer). However this is avoidable if the hardware supports scatter-n-gather:
![](https://static.oschina.net/uploads/img/201710/23155306_SmDm.png)
A lot of web servers do support zero-copy such as Tomcat and Apache. For example apache’s related doc can be found here but by default it’s off.
Note: Java’s NIO offers this through
![](https://static.oschina.net/uploads/img/201710/23155307_Vq1i.png)
Mmap allows code to map file to kernel memory and access that directly as if it were in the application user space, thus avoiding the unnecessary copy. As a tradeoff, that will still involve 4 context switches. But since OS maps certain chunk of file into memory, you get all benefits from OS virtual memory management - hot content can be intelligently cached efficiently, and all data are page-aligned thus no buffer copying is needed to write stuff back.
However, nothing comes for free - while mmap does avoid that extra copy, it doesn’t guarantee the code will always be faster - depending on the OS implementation, there may be quite a bit of setup and teardown overhead (since it needs to find the space and maintain it in the TLB and make sure to flush it after unmapping) and page fault gets much more expensive since kernel now needs to read from hardware (like disk) to update the memory space and TLB. Hence, if performance is this critical, benchmark is always needed as abusing mmap() may yield worse performance than simply doing the copy.
The corresponding class in Java is
This is used when
Used when
Used when
Getting started with new I/O (NIO)
Sep 10, 2016 in OS
There are use cases where data need to be read from source to a sink without modification. In code this might look quite simple: for example in Java, you may read data from one
InputStreamchunk by chunk into a small buffer (typically 8KB), and feed them into the
OutputStream, or even better, you could create a
PipedInputStream, which is basically just a util that maintains that buffer for you. However, if low latency is crucial to your software, this might be quite expensive from the OS perspective and I shall explain.
What happens under the hood
Well, here’s what happens when the above code is used:![](https://static.oschina.net/uploads/img/201710/23155305_AXF8.png)
JVM sends read() syscall.
OS context switches to kernel mode and reads data into the input socket buffer.
OS kernel then copies data into user buffer, and context switches back to user mode. read() returns.
JVM processes code logic and sends write() syscall.
OS context switches to kernel mode and copies data from user buffer to output socket buffer.
OS returns to user mode and logic in JVM continues.
This would be fine if latency and throughput aren’t your service’s concern or bottleneck, but it would be annoying if you do care, say for a static asset server. There are 4 context switches and 2 unnecessary copies for the above example.
OS-level zero copy for the rescue
Clearly in this use case, the copy from/to user space memory is totally unnecessary because we didn’t do anything other than dumping data to a different socket. Zero copy can thus be used here to save the 2 extra copies. The actual implementation doesn’t really have a standard and is up to the OS how to achieve that. Typically *nix systems will offersendfile(). Its man page can be found here. Some say some operating systems have broken versions of that with one of them being OSX link. Honestly with such low-level feature, I wouldn’t trust Apple’s BSD-like system so never tested there.
With that, the diagram would be like this:
![](https://static.oschina.net/uploads/img/201710/23155306_aMnp.png)
You may say OS still has to make a copy of the data in kernel memory space. Yes but from OS’s perspective this is already zero-copy because there’s no data copied from kernel space to user space. The reason why kernel needs to make a copy is because general hardware DMA access expects consecutive memory space (and hence the buffer). However this is avoidable if the hardware supports scatter-n-gather:
![](https://static.oschina.net/uploads/img/201710/23155306_SmDm.png)
A lot of web servers do support zero-copy such as Tomcat and Apache. For example apache’s related doc can be found here but by default it’s off.
Note: Java’s NIO offers this through
transferTo(doc).
mmap
The problem with the above zero-copy approach is that because there’s no user mode actually involved, code cannot do anything other than piping the stream. However, there’s a more expensive yet more useful approach - mmap, short for memory-map.![](https://static.oschina.net/uploads/img/201710/23155307_Vq1i.png)
Mmap allows code to map file to kernel memory and access that directly as if it were in the application user space, thus avoiding the unnecessary copy. As a tradeoff, that will still involve 4 context switches. But since OS maps certain chunk of file into memory, you get all benefits from OS virtual memory management - hot content can be intelligently cached efficiently, and all data are page-aligned thus no buffer copying is needed to write stuff back.
However, nothing comes for free - while mmap does avoid that extra copy, it doesn’t guarantee the code will always be faster - depending on the OS implementation, there may be quite a bit of setup and teardown overhead (since it needs to find the space and maintain it in the TLB and make sure to flush it after unmapping) and page fault gets much more expensive since kernel now needs to read from hardware (like disk) to update the memory space and TLB. Hence, if performance is this critical, benchmark is always needed as abusing mmap() may yield worse performance than simply doing the copy.
The corresponding class in Java is
MappedByteBufferfrom NIO package. It’s actually a variation of
DirectByteBufferthough there’s no direct relationship between classes. The actual usage is out of scope of this post.
NIO DirectByteBuffer
Java NIO introducesByteBufferwhich represents the buffer area used for channels. There are 3 main implementations of
ByteBuffer:
HeapByteBuffer
This is used when
ByteBuffer.allocate()is called. It’s called heap because it’s maintained in JVM’s heap space and hence you get all benefits like GC support and caching optimization. However, it’s not page aligned, which means if you need to talk to native code through JNI, JVM would have to make a copy to the aligned buffer space.
DirectByteBuffer
Used when
ByteBuffer.allocateDirect()is called. JVM will allocate memory space outside the heap space using
malloc(). Because it’s not managed by JVM, your memory space is page-aligned and not subject to GC, which makes it perfect candidate for working with native code (e.g. when writing OpenGL stuff). However, you are then “deteriorated” to C programmer as you’ll have to allocate and deallocate memory yourself to prevent memory leak.
MappedByteBuffer
Used when
FileChannel.map()is called. Similar to
DirectByteBufferthis is also outside of JVM heap. It essentially functions as a wrapper around OS mmap() system call in order for code to directly manipulate mapped physical memory data.
Conclusion
sendfile()and
mmap()offer efficient, low-latency low-level solutions to data manipulation across sockets. Again, no code should assume these are silver bullets as real world scenarios may be complex and it might not be worth the effort to switch code to them if this is not the true bottleneck. For software engineering to get the most ROI, in most cases, it’s better to “make it right” and then “make it fast”. Without the guardrails offered by JVM, it’s easy to make software much more vulnerable to crashing (I literally mean crashing, not exceptions) when it comes to complicated logic.
Quick Reference
Efficient data transfer through zero copy - It also covers sendfile() performance comparison.Getting started with new I/O (NIO)
相关文章推荐
- All you ever wanted to know about Workflow and how it relates to Java, Transactions and Concurrency
- java-nio之zero copy深入分析
- zz - It's All About the SynchronizationContext
- Physical and Logical Block Corruptions. All you wanted to know about it. (Doc ID 840978.1)
- Extra, Extra - Read All About It: Nearly All Binary Searches and Mergesorts are Broken
- Physical and Logical Block Corruptions. All you wanted to know about it. (文档 ID 840978.1)
- Lock-less and zero copy messaging scheme for telecommunication network applications
- How to copy all view private files and Derived objects between views
- atitit.人脸识别的应用场景and使用最佳实践 java .net php
- "Android SDK and AVD Manager" cannot be made visible because all of its children are in unavailable
- 使用drozer连接时提示:Could not find java. Please ensure that it is installed and on your path
- 【Java多线程】之五:wait, notify and notifyAll
- java.lang.Throwable and its descendant: Error and Exception
- Moving git repository and all its branches, tags to a new remote repository keeping commits history
- javadataAbout stack and heap in JAVA(2)
- Task cancellation in C# and things you should know about it
- JavaNIO之缓冲区(Buffers)
- If a class has a pointer member, it needs copy operations (copy constructor and copy assignment); sec10.4.4.1.
- All about oracle smallfile and bigfile tablespace
- Netty and Java NIO APIs