ZFS Architecture Description
2015-09-03 22:32
This page is designed to take you through a brief overview of the ZFS architecture. It is not intended as an introduction to ZFS. We assume that you already have some familiarity with common terms and definitions, as well as a general sense of file system architecture.
Traditionally, ZFS consists of three main components: ZPL (ZFS POSIX Layer), DMU (Data Management Unit), and SPA (Storage Pool Allocator) as indicated in the above image.
In this picture, you can see the three basic layers, though there are quite a few more elements in each. In addition, we show zvol consumers, as well as the management path, namely zfs(1M) and zpool(1M). You'll find a brief description of all these subsystems below. This is not intended to be an exhaustive overview of exactly how everything works. We hope that this summary tour is easy to follow. If not, feel free to post to http://java.net/projects/solaris-zfs/lists.
File System Consumers
These are the basic applications that interact with ZFS solely through the POSIX filesystem APIs. Virtually every application falls into this category. The system calls are passed through the generic OpenSolaris VFS layer to the ZPL.
Device Consumers
ZFS provides 'emulated volumes', also called zvols. These volumes are backed by storage from a storage pool, but appear as a normal device under /dev. This is not a typical use case, but there is a small set of cases where this capability is useful. A small number of applications interact directly with these devices, but the most common consumer is a kernel filesystem or target driver layered on top of the device.
Management GUI
A web-based ZFS GUI is available in Solaris 10 releases and on the ZFS storage appliance.
Management Consumers
These applications manipulate ZFS file systems or storage pools, including examining properties and dataset hierarchy. While there are some scattered exceptions (zoneadm, zoneadmd, fstyp), the two main applications are zpool(1M) and zfs(1M).
JNI
This library provides a Java interface to libzfs and is tailored specifically for the GUI. As such, it is geared primarily toward observability, as the GUI performs most actions through the CLI.
libzfs
This is the primary interface for management apps to interact with the ZFS kernel module. The library presents a unified, object-based mechanism for accessing and manipulating both storage pools and file systems. The underlying mechanism used to communicate with the kernel is ioctl(2) calls through /dev/zfs.
ZPL (ZFS POSIX Layer)
The ZPL is the primary interface for interacting with ZFS as a file system. It is a (relatively) thin layer that sits atop the DMU and presents a filesystem abstraction of files and directories. It is responsible for bridging the gap between the VFS interfaces and the underlying DMU interfaces. It is also responsible for enforcing ACL (Access Control List) rules as well as synchronous (O_DSYNC) semantics.
ZVOL (ZFS Emulated Volume)
ZFS includes the ability to present raw devices backed by space from a storage pool. These are known as 'zvols' within the source code, and are implemented by a single file in the ZFS source.
/dev/zfs
This device is the primary point of control for libzfs. While consumers could use the ioctl(2) interface directly, it is closely entwined with libzfs, and is not a public interface (not that libzfs is, either). It consists of a single file, which does some validation on the ioctl() parameters and then vectors the request to the appropriate place within ZFS.
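The validate-then-vector pattern described above can be sketched as a small dispatch table. This is only an illustration in Python, not the actual kernel code: the command names, handlers, and argument checks are hypothetical and do not correspond to real ZFS ioctl numbers.

```python
# Toy model of an ioctl entry point: validate the request, then dispatch it.
# Command names and handlers are hypothetical, not real ZFS ioctls.

def pool_stats(args):
    return {"pool": args["name"], "state": "ONLINE"}

def dataset_create(args):
    return {"created": args["name"]}

# Dispatch table: command -> (handler, required argument names)
IOCTL_VECTOR = {
    "POOL_STATS": (pool_stats, ("name",)),
    "DATASET_CREATE": (dataset_create, ("name",)),
}

def zfs_ioctl(cmd, args):
    """Validate the command and its arguments, then call the handler."""
    if cmd not in IOCTL_VECTOR:
        raise ValueError(f"unknown command: {cmd}")
    handler, required = IOCTL_VECTOR[cmd]
    missing = [key for key in required if key not in args]
    if missing:
        raise ValueError(f"missing arguments: {missing}")
    return handler(args)
```

Keeping validation in one place means each handler can assume well-formed input, which is one plausible reason for funnelling every request through a single entry point.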
DMU (Data Management Unit)
The DMU is responsible for presenting a transactional object model, built atop the flat address space presented by the SPA. Consumers interact with the DMU via objsets, objects, and transactions. An objset is a collection of objects, where each object is an arbitrary piece of storage from the SPA. Each transaction is a series of operations that must be committed to disk as a group; it is central to the on-disk consistency for ZFS.
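To make the "committed as a group" idea concrete, here is a minimal in-memory sketch. The class and method names (Objset, Transaction, write, commit) are invented for illustration and are not the DMU's real interfaces; the point is only that either every write in a transaction becomes visible or none does.

```python
import copy

class Objset:
    """A collection of objects; each object is modeled as a bytearray."""
    def __init__(self):
        self.objects = {}

class Transaction:
    """Collects writes and applies them to the objset as a single group."""
    def __init__(self, objset):
        self.objset = objset
        self.pending = []  # list of (object_id, offset, data)

    def write(self, object_id, offset, data):
        self.pending.append((object_id, offset, bytes(data)))

    def commit(self):
        # Stage changes on a copy so a failure leaves the objset untouched.
        staged = copy.deepcopy(self.objset.objects)
        for object_id, offset, data in self.pending:
            if offset < 0:
                raise ValueError("negative offset")
            buf = staged.setdefault(object_id, bytearray())
            end = offset + len(data)
            if len(buf) < end:
                buf.extend(b"\0" * (end - len(buf)))
            buf[offset:end] = data
        # The whole group becomes visible at once.
        self.objset.objects = staged
        self.pending = []
```

Staging on a copy and swapping it in at the end loosely mirrors the copy-on-write approach: the old state stays intact until the new one is complete.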
DSL (Dataset and Snapshot Layer)
The DSL aggregates DMU objsets into a hierarchical namespace, with inherited properties, as well as quota and reservation enforcement. It is also responsible for managing snapshots and clones of objsets.
ZAP (ZFS Attribute Processor)
The ZAP is built atop the DMU, and uses scalable hash algorithms to create arbitrary (name, object) associations within an objset. It is most commonly used to implement directories within the ZPL, but is also used extensively throughout the DSL, as well as a method of storing pool-wide properties. There are two very different ZAP algorithms, designed for different types of directories. The "micro zap" is used when the number of entries is relatively small and each entry is reasonably short. The "fat zap" is used for larger directories, or those with extremely long names.
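The choice between the two layouts can be illustrated with a toy directory that starts out "micro" and upgrades itself to "fat" once it outgrows the small form. The two limits below are made-up values for the sake of the example; the real thresholds and on-disk formats are defined in the ZFS source and are not reproduced here.

```python
# Illustrative limits only; the real thresholds live in the ZFS source.
MICRO_MAX_ENTRIES = 4
MICRO_MAX_NAME_LEN = 16

class Zap:
    """Maps names to object numbers, switching layout as the directory grows."""
    def __init__(self):
        self.kind = "micro"
        self.entries = []    # micro: flat list of (name, object) pairs
        self.buckets = None  # fat: hash table of name -> object

    def _upgrade(self):
        self.buckets = dict(self.entries)
        self.entries = None
        self.kind = "fat"

    def add(self, name, obj):
        if self.kind == "micro":
            too_many = len(self.entries) >= MICRO_MAX_ENTRIES
            too_long = len(name) > MICRO_MAX_NAME_LEN
            if too_many or too_long:
                self._upgrade()
            else:
                # Replace any existing entry with the same name.
                self.entries = [(n, o) for n, o in self.entries if n != name]
                self.entries.append((name, obj))
                return
        self.buckets[name] = obj

    def lookup(self, name):
        if self.kind == "micro":
            for entry_name, obj in self.entries:
                if entry_name == name:
                    return obj
            return None
        return self.buckets.get(name)
```

A linear scan is cheap for a handful of short entries, while a hash table pays off once the directory grows, which is the trade-off the two ZAP forms address.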
ZIL (ZFS Intent Log)
While ZFS provides always-consistent data on disk, it follows traditional file system semantics where the majority of data is not written to disk immediately; otherwise performance would be pathologically slow. But there are applications that require more stringent semantics where the data is guaranteed to be on disk by the time the read(2) or write(2) call returns. For those applications requiring this behavior (specified with O_DSYNC), the ZIL provides the necessary semantics using an efficient per-dataset transaction log that can be replayed in the event of a crash.
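A rough model of this behavior, with all names invented for illustration: ordinary writes stay in memory until the next group commit, while synchronous writes are also appended to a per-dataset log that survives a crash and is replayed on recovery.

```python
class Dataset:
    """Toy dataset with deferred commits and a per-dataset intent log."""
    def __init__(self):
        self.on_disk = {}     # state as of the last committed group
        self.in_memory = {}   # pending changes, lost on a crash
        self.intent_log = []  # records made durable before write() returns

    def write(self, name, data, sync=False):
        self.in_memory[name] = data
        if sync:
            # Modeled as being on stable storage by the time we return.
            self.intent_log.append((name, data))

    def commit_group(self):
        self.on_disk.update(self.in_memory)
        self.in_memory.clear()
        self.intent_log.clear()  # the logged records are now redundant

    def crash_and_recover(self):
        self.in_memory.clear()   # volatile state is gone
        for name, data in self.intent_log:
            self.on_disk[name] = data  # replay in logged order
        self.intent_log.clear()
```

In this sketch an asynchronous write issued after the last commit is lost in a crash, whereas a synchronous one is recovered from the log.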
Traversal
Traversal provides a safe, efficient, restartable method of walking all data within a live pool. It forms the basis of resilvering and scrubbing. It walks all metadata looking for blocks modified within a certain period of time. Thanks to the copy-on-write nature of ZFS, this has the benefit of quickly excluding large subtrees that have not been touched during an outage period. It is fundamentally a SPA facility, but has intimate knowledge of some DMU structures in order to handle snapshots, clones, and certain other characteristics of the on-disk format.
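The subtree pruning can be sketched with a tree in which every block records the transaction group in which it was written. Under copy-on-write, changing a block rewrites its ancestors too, so a parent is never older than its children; a parent older than the cutoff therefore rules out its whole subtree. The Block class and modified_since function are illustrative names, not ZFS structures.

```python
class Block:
    def __init__(self, name, birth, children=()):
        self.name = name
        self.birth = birth  # transaction group in which the block was written
        self.children = list(children)

def modified_since(block, min_txg, examined):
    """Return names of blocks written at or after min_txg.

    Every block whose header is inspected is appended to `examined`,
    so the caller can see how much of the tree was skipped.
    """
    examined.append(block.name)
    if block.birth < min_txg:
        # Copy-on-write: nothing beneath an old block can be newer.
        return []
    found = [block.name]
    for child in block.children:
        found.extend(modified_since(child, min_txg, examined))
    return found

old = Block("old", 5, [Block("old_leaf1", 3), Block("old_leaf2", 5)])
new = Block("new", 12, [Block("leaf_a", 12), Block("leaf_b", 7)])
root = Block("root", 12, [old, new])

examined = []
print(modified_since(root, 10, examined))  # ['root', 'new', 'leaf_a']
print(examined)  # ['root', 'old', 'new', 'leaf_a', 'leaf_b']
```

Neither leaf under "old" is ever inspected, which is the saving that makes resilvering after a short outage fast.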
ARC (Adaptive Replacement Cache)
ZFS uses a modified version of an Adaptive Replacement Cache to provide its primary caching needs. This cache is layered between the DMU and the SPA and so acts at the virtual block level. This allows filesystems to share their cached data with their snapshots and clones.
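The sharing effect follows from caching by block address rather than by file. The sketch below deliberately leaves out the adaptive replacement policy itself and uses a plain dictionary; the addresses, dataset layouts, and names are invented for illustration.

```python
class BlockCache:
    """Cache keyed by block address; the replacement policy is omitted."""
    def __init__(self, disk):
        self.disk = disk
        self.cached = {}
        self.hits = 0
        self.misses = 0

    def read(self, address):
        if address in self.cached:
            self.hits += 1
        else:
            self.misses += 1
            self.cached[address] = self.disk[address]
        return self.cached[address]

def read_file(cache, dataset, name):
    """Read a file by fetching each block address the dataset points to."""
    return b"".join(cache.read(address) for address in dataset[name])

# The snapshot and the live filesystem still share block 100;
# only the tail of the file was rewritten after the snapshot.
disk = {100: b"unchanged ", 200: b"old tail", 300: b"new tail"}
snapshot = {"file.txt": [100, 200]}
filesystem = {"file.txt": [100, 300]}

cache = BlockCache(disk)
snap_data = read_file(cache, snapshot, "file.txt")
live_data = read_file(cache, filesystem, "file.txt")
print(cache.hits, cache.misses)  # 1 3
```

Reading the snapshot warms block 100, so the later read through the live filesystem hits the cache for the shared block.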
Pool Configuration (SPA)
While the entire pool layer is often referred to as the SPA (Storage Pool Allocator), the configuration portion is really the public interface. It is responsible for gluing together the ZIO and vdev layers into a consistent pool object. It includes routines to create and destroy pools from their configuration information, as well as sync the data out to the vdevs at regular intervals.
ZIO (ZFS I/O Pipeline)
The ZIO pipeline is where all data must pass when going to or from the disk. It is responsible for translating DVAs (Data Virtual Addresses) into logical locations on a vdev, as well as checksumming and compressing data as necessary. It is implemented as a multi-stage pipeline, with a bit mask to control which stages get executed for each I/O.
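A bit-mask-controlled pipeline can be sketched in a few lines. The stage names and their order here are assumptions made for the example, and zlib's compression and CRC-32 stand in for whatever compression and checksum algorithms a pool is actually configured to use.

```python
import zlib

# One bit per stage; names and ordering are illustrative only.
STAGE_COMPRESS = 1 << 0
STAGE_CHECKSUM = 1 << 1
STAGE_WRITE = 1 << 2

def stage_compress(io):
    io["data"] = zlib.compress(io["data"])

def stage_checksum(io):
    # CRC-32 is a stand-in for a real block checksum.
    io["checksum"] = zlib.crc32(io["data"])

def stage_write(io):
    io["device"].append(io["data"])

PIPELINE = [
    (STAGE_COMPRESS, stage_compress),
    (STAGE_CHECKSUM, stage_checksum),
    (STAGE_WRITE, stage_write),
]

def zio_execute(io, stage_mask):
    """Run, in pipeline order, only the stages whose bit is set in the mask."""
    for bit, stage in PIPELINE:
        if stage_mask & bit:
            stage(io)
    return io
```

For example, `zio_execute(io, STAGE_COMPRESS | STAGE_CHECKSUM | STAGE_WRITE)` runs all three stages, while passing `STAGE_WRITE` alone writes the buffer untouched.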
VDEV (Virtual Devices)
The virtual device subsystem provides a unified method of arranging and accessing devices. Virtual devices form a tree, with a single root vdev and multiple interior (mirror and RAID-Z) and leaf (disk and file) vdevs. Each vdev is responsible for representing the available space, as well as laying out blocks on the physical disk.
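As a simplified picture of the tree, the sketch below models only root, mirror, and leaf vdevs and computes a rough usable capacity. The rules used (a mirror offers the size of its smallest child, the root sums its children) ignore RAID-Z, labels, and all metadata overhead, and the Vdev class is a hypothetical stand-in rather than the kernel structure.

```python
class Vdev:
    """Node in a simplified vdev tree (RAID-Z deliberately left out)."""
    def __init__(self, kind, size=0, children=()):
        self.kind = kind  # "root", "mirror", "disk", or "file"
        self.size = size  # only meaningful for leaf vdevs
        self.children = list(children)

    def capacity(self):
        if self.kind in ("disk", "file"):
            return self.size
        sizes = [child.capacity() for child in self.children]
        if self.kind == "mirror":
            return min(sizes)  # every child holds a full copy
        if self.kind == "root":
            return sum(sizes)  # data is spread across top-level vdevs
        raise ValueError(f"unsupported vdev kind: {self.kind}")

pool = Vdev("root", children=[
    Vdev("mirror", children=[Vdev("disk", 100), Vdev("disk", 80)]),
    Vdev("disk", 50),
])
print(pool.capacity())  # 130
```

Because each node answers for its own subtree, the same recursive call works for any arrangement of interior and leaf vdevs.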
LDI (Layered Driver Interface)
At the bottom of the stack, ZFS interacts with the underlying physical devices through LDI, the Layered Driver Interface, as well as the VFS interfaces (when dealing with files).