
Reading notes on Modern Operating Systems, 4th edition (English version), Section 4.3.4: LFS (the Log-structured File System)

2015-01-16 01:10

The one parameter that is not improving by leaps and bounds is disk seek time (except for solid-state disks, which have no
seek time).

One parameter that has not improved dramatically is disk seek time (solid-state disks excepted, since they have no seek time).

The combination of these factors means that a performance bottleneck is arising in many file systems. Research done at Berkeley attempted to alleviate this problem by designing a completely new kind of file system, LFS (the Log-structured File System).

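Together, these factors create a performance bottleneck in many file systems. Researchers at Berkeley tried to ease the problem by designing an entirely new file system, LFS.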

To make matters worse, in most file systems, writes are done in very small chunks. Small writes are highly inefficient, since a 50-μsec
disk write is often preceded by a 10-msec seek and a 4-msec rotational delay. With these parameters, disk efficiency drops to a fraction of 1%.

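Worse still, most file systems perform writes in very small chunks. Small writes are highly inefficient, because a 50-microsecond disk write is typically preceded by a 10-millisecond seek and a 4-millisecond rotational delay. With these figures, disk efficiency falls to well under 1%.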
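
The "fraction of 1%" figure can be checked with a little arithmetic. The sketch below uses only the three numbers quoted in the passage; the function name `disk_efficiency` is made up for illustration, and efficiency is taken here to mean the share of elapsed time spent actually transferring data.

```python
# Figures quoted in the passage above.
transfer_s = 50e-6   # 50-microsecond disk write
seek_s = 10e-3       # 10-millisecond seek
rotation_s = 4e-3    # 4-millisecond rotational delay


def disk_efficiency(transfer, seek, rotation):
    """Fraction of the total elapsed time spent transferring data."""
    return transfer / (seek + rotation + transfer)


efficiency = disk_efficiency(transfer_s, seek_s, rotation_s)
print(f"{efficiency:.2%}")  # prints 0.36%
```

Roughly 0.36% of the time goes to useful work, so the disk spends more than 99% of each small write positioning the head.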

To see where all the small writes come from, consider creating a new file on a UNIX system. To write this file, the i-node for the directory,
the directory block, the i-node for the file, and the file itself must all be written. While these writes can
be delayed, doing so exposes the file system to serious consistency problems if a crash occurs before the writes are done. For this reason, the i-node writes are generally done immediately.

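To see where all these small writes come from, consider creating a new file on a UNIX system. Writing the file to disk means writing the directory's i-node, the directory block, the file's i-node, and the file itself. These writes can be delayed, but doing so exposes the file system to serious consistency problems if a crash happens before they complete. For that reason, the i-node writes are generally done immediately.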

From this reasoning, the LFS designers decided to reimplement the UNIX file system in such a way as to achieve the full bandwidth of the disk, even in the face of a workload consisting in large part of small random writes. The basic idea is to structure the entire disk as a great big log.

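Given this, the LFS designers decided to reimplement the UNIX file system so that it could use the full bandwidth of the disk, even under a workload made up largely of small random writes. The basic idea is to structure the entire disk as one big log.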

PS: How should "achieve the full bandwidth of the disk" best be translated into Chinese?

Periodically, and when there is a special need for it, all the pending writes being buffered in memory are collected into a single segment and written
to the disk as a single contiguous segment at the end of the log. A single segment may thus contain i-nodes, directory blocks, and data blocks,
all mixed together. At the start of each segment is a segment summary, telling what can be found in the segment. If the average segment can be made to be about 1 MB, almost the full bandwidth of the disk can be utilized.

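Periodically, or whenever there is a special need, all pending writes buffered in memory are gathered into a single segment and written to disk as one contiguous segment at the end of the log. A segment may therefore contain i-nodes, directory blocks, and data blocks, all mixed together. Each segment begins with a segment summary that describes what the segment contains. If segments average about 1 MB, nearly the full bandwidth of the disk can be used.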
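
The buffering scheme described above can be sketched in a few lines. This is a toy in-memory model, not the actual LFS code: the names `Segment`, `LogWriter`, `write`, and `flush` are invented for illustration, the "disk" is just a Python list, and the flush trigger (about 1 MB of pending data) is an assumption based on the segment size mentioned in the text.

```python
from dataclasses import dataclass, field

SEGMENT_BYTES = 1 << 20  # about 1 MB, the segment size mentioned in the text


@dataclass
class Segment:
    # One summary entry per block: (kind, owner i-number, slot within the segment).
    summary: list = field(default_factory=list)
    blocks: list = field(default_factory=list)


class LogWriter:
    """Hypothetical writer that buffers writes and appends them as one segment."""

    def __init__(self):
        self.pending = []        # buffered writes: (kind, owner, data)
        self.pending_bytes = 0
        self.log = []            # segments in write order; the last is the end of the log

    def write(self, kind, owner, data):
        """Buffer one write in memory; flush once a segment's worth has accumulated."""
        self.pending.append((kind, owner, data))
        self.pending_bytes += len(data)
        if self.pending_bytes >= SEGMENT_BYTES:
            self.flush()

    def flush(self):
        """Write all pending blocks as a single contiguous segment at the end of the log."""
        if not self.pending:
            return
        segment = Segment()
        for slot, (kind, owner, data) in enumerate(self.pending):
            segment.summary.append((kind, owner, slot))
            segment.blocks.append(data)
        self.log.append(segment)
        self.pending = []
        self.pending_bytes = 0


# Creating a file touches four things; here they all land in one segment.
writer = LogWriter()
writer.write("dir-inode", 2, b"d" * 128)
writer.write("dir-block", 2, b"e" * 4096)
writer.write("file-inode", 7, b"i" * 128)
writer.write("file-data", 7, b"x" * 4096)
writer.flush()  # the "special need" case: force the segment out early
```

The four writes that a conventional file system would issue separately, each with its own seek, end up mixed together in one sequential write, and the summary records what each slot holds.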

In this design, i-nodes still exist and even have the same structure as in UNIX, but they are now scattered all over the log, instead of being at a fixed position on the disk. Nevertheless, when an i-node is located, locating the blocks is done in the usual way. Of course, finding an i-node is now much harder, since its address cannot simply be calculated from its i-number, as in UNIX. To make it possible to find i-nodes, an i-node map, indexed by i-number, is maintained. Entry i in this map points to i-node i on the disk. The map is kept on disk, but it is also cached, so the most heavily used parts will be in memory most of the time.

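In this design, i-nodes still exist and even keep the same structure as in UNIX, but they are scattered throughout the log rather than stored at a fixed position on the disk. Even so, once an i-node has been located, its blocks are found in the usual way. Finding an i-node is now much harder, of course, because its address can no longer be computed directly from its i-number as in UNIX. To make i-nodes findable, an i-node map indexed by i-number is maintained: entry i points to i-node i on the disk. The map is stored on disk but is also cached, so its most heavily used parts are in memory most of the time.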
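
The difference between the two lookup schemes can be illustrated as follows. Everything here is a simplified assumption: `INODE_SIZE` and `INODE_TABLE_START` are made-up constants, `InodeMap` is a hypothetical name, and an i-node's position in the log is modeled as a (segment number, slot) pair.

```python
INODE_SIZE = 128          # hypothetical i-node size in bytes
INODE_TABLE_START = 4096  # hypothetical byte offset of the fixed i-node table


def unix_inode_address(i_number):
    """Fixed-position layout: the address is pure arithmetic on the i-number."""
    return INODE_TABLE_START + i_number * INODE_SIZE


class InodeMap:
    """Log-structured layout: the position must be looked up, because it moves."""

    def __init__(self):
        self._where = {}  # i-number -> (segment number, slot within segment)

    def update(self, i_number, segment_no, slot):
        """Record where the newest copy of an i-node was written."""
        self._where[i_number] = (segment_no, slot)

    def lookup(self, i_number):
        """Return the current log position of an i-node."""
        return self._where[i_number]


imap = InodeMap()
imap.update(7, 0, 3)  # i-node 7 first written in segment 0, slot 3
imap.update(7, 5, 1)  # a later write puts a newer copy in segment 5, slot 1
print(unix_inode_address(7))  # prints 4992
print(imap.lookup(7))         # prints (5, 1)
```

Each time an i-node is rewritten its map entry changes, which is why the address cannot be derived from the i-number alone.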

To summarize what we have said so far, all writes are initially buffered in memory, and periodically all the buffered writes are written to the disk in a
single segment, at the end of the log. Opening a file now consists of using the map to locate the i-node for the file. Once the i-node has been located, the addresses of the
blocks can be found from it. All of the blocks will themselves be in segments, somewhere in the log.

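To summarize so far: all writes are first buffered in memory, and periodically the buffered writes are written to disk as a single segment at the end of the log. Opening a file now means using the i-node map to locate the file's i-node; once the i-node is found, it gives the addresses of the file's blocks. All of those blocks are themselves in segments somewhere in the log.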
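
The read path summarized above (map, then i-node, then blocks) might look like this in a toy model. The data layout is an assumption made for the example: the log is a list of segments, each segment is a list of slots, an i-node is a dict holding block addresses, and `read_file` is a hypothetical helper.

```python
# Toy log: log[segment_no][slot] holds either file data (bytes) or an i-node (dict).
log = [
    # Segment 0: first data block of file 7, plus the i-node written with it.
    [b"hello ", {"blocks": [(0, 0)]}],
    # Segment 1: second data block, plus a newer i-node covering both blocks.
    [b"world", {"blocks": [(0, 0), (1, 0)]}],
]

# The i-node map points at the newest copy of each i-node.
imap = {7: (1, 1)}


def read_file(i_number):
    """Locate the i-node through the map, then follow its block addresses."""
    segment_no, slot = imap[i_number]
    inode = log[segment_no][slot]
    return b"".join(log[seg][blk] for seg, blk in inode["blocks"])


print(read_file(7))  # prints b'hello world'
```

The stale i-node in segment 0 is still in the log, but nothing points to it any more, which is the kind of leftover the cleaner later reclaims.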

If disks were infinitely large, the above description would be the entire story. However, real disks are finite, so eventually the log will occupy the entire disk, at which time no new segments can be written to the log. Fortunately, many existing segments may have blocks that are no longer needed. For example, if a file is overwritten, its i-node will now point to the new blocks, but the old ones will still be occupying space in previously written segments.

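If disks were infinitely large, the description above would be the whole story. Real disks are finite, however, so the log eventually fills the entire disk, and at that point no new segments can be written. Fortunately, many existing segments contain blocks that are no longer needed. For example, when a file is overwritten, its i-node points to the new blocks, but the old blocks still occupy space in previously written segments.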

To deal with this problem, LFS has a cleaner thread that spends its time scanning the log circularly to compact it. It starts out by reading the summary of the first segment in the log to see which i-nodes and files are there. It then checks the current i-node map to see if the i-nodes are still current and file blocks are still in use. If not, that information is discarded. The i-nodes and blocks that are still in use go into memory to be written out in the next segment. The original segment is then marked as free, so that the log can use it for new data. In this manner, the cleaner moves along the log, removing old segments from the back and putting any live data into memory for rewriting in the next segment. Consequently, the disk is a big circular buffer, with the writer thread adding new segments to the front and the cleaner thread removing old ones from the back.

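To deal with this, LFS runs a cleaner thread that spends its time scanning the log circularly and compacting it. It begins by reading the summary of the first segment in the log to see which i-nodes and files it holds. It then checks the current i-node map to determine whether those i-nodes are still current and the file blocks still in use. Anything no longer in use is discarded. I-nodes and blocks that are still in use are brought into memory to be written out in the next segment. The original segment is then marked free so the log can reuse it for new data. In this way the cleaner moves along the log, removing old segments from the back and putting live data into memory to be rewritten in the next segment. The disk thus becomes a big circular buffer, with the writer thread adding new segments at the front and the cleaner thread removing old ones from the back.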
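
One cleaning pass can be sketched as below. This is a heavily simplified model under stated assumptions, not the real cleaner: `clean_oldest` is a hypothetical function, the table `current` stands in for the i-node map together with the i-nodes it points to, live blocks are rewritten immediately as their own segment instead of waiting for the next regular segment, and the log is a growing list rather than a true circular buffer.

```python
def clean_oldest(segments, current):
    """Compact the oldest non-free segment and return how many blocks were kept.

    segments: list of segments; each is a list of (i_number, file_block, data),
              or None once the segment has been freed.
    current:  i_number -> {file_block: (segment_no, slot)}, the live location
              of every block (a stand-in for the i-node map plus the i-nodes).
    """
    segment_no = next(n for n, seg in enumerate(segments) if seg is not None)
    live = []
    for slot, (i_number, file_block, data) in enumerate(segments[segment_no]):
        # A block is live only if its file still points at this exact location.
        if current.get(i_number, {}).get(file_block) == (segment_no, slot):
            live.append((i_number, file_block, data))
    segments[segment_no] = None  # mark the original segment as free
    if live:
        segments.append(live)  # rewrite the survivors at the end of the log
        new_segment_no = len(segments) - 1
        for slot, (i_number, file_block, _) in enumerate(live):
            # Bookkeeping: repoint each file at the new copy of its block.
            current[i_number][file_block] = (new_segment_no, slot)
    return len(live)


# File 7 was overwritten (its block now lives in segment 1); file 8 was not.
segments = [
    [(7, 0, b"old"), (8, 0, b"keep")],
    [(7, 0, b"new")],
]
current = {7: {0: (1, 0)}, 8: {0: (0, 1)}}
kept = clean_oldest(segments, current)
print(kept, segments[0], current[8][0])  # prints 1 None (2, 0)
```

The stale copy of file 7's block is dropped, file 8's block moves to the end of the log, and segment 0 becomes available for new data.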

The bookkeeping here is nontrivial, since when a file block is written back to a new segment, the i-node of the file (somewhere in the log) must be located, updated, and put into memory to be written out in the next segment. The i-node map must then be updated to point to the new copy. Nevertheless, it is possible to do the administration, and the performance results show that all this complexity is worthwhile. Measurements given in the papers cited above show that LFS outperforms UNIX by an order of magnitude on small writes, while having a performance that is as good as or better than UNIX for reads and large writes.

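The bookkeeping is nontrivial: when a file block is written back to a new segment, the file's i-node must be located, updated, and placed in memory to be written out in the next segment, and the i-node map must then be updated to point to the new copy. Still, the administration is feasible, and the performance results show that the added complexity is worthwhile. The measurements in the cited papers show LFS outperforming UNIX by an order of magnitude on small writes, while performing as well as or better than UNIX on reads and large writes.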


PS: The next post will cover the LFS paper, to look more closely at the structure of LFS.
