[Article Translation] Software Controls Cache Memory to Speed CPUs

Letting the operating system control cache memory management saves power too



A new process for managing the fast-access memory inside a CPU has led to as much as a twofold speedup and to energy-use reductions of up to 72 percent. According to its designers, realizing such stunning gains requires a big shift in what part of the computer controls this crucial memory: Right now that control is hard-wired into the CPU’s circuitry, but the substantial speedup came when the designers let the operating system handle things instead.



The CPU uses high-speed internal memory caches as a kind of digital staging area. Caches are a CPU’s workbench, whether they’re holding onto instructions a CPU may need soon or data it may need to crunch. And from smartphones to servers, nearly every CPU today manages the flow of bits in and out of its caches using algorithms built into its own circuits.

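The CPU uses its high-speed internal cache as a kind of staging area for data. The cache is the CPU's workbench, whether it holds instructions the CPU will need soon or data the CPU may have to process. From smartphones to servers, nearly every CPU manages its cache with logic built into its own circuitry.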

But, say two MIT researchers, as computers and portable devices accumulate more and more memory and CPU cores, it makes less and less sense to leave cache management entirely up to the CPU. Instead, they say, it might be better to let the operating system share the burden.


In itself, this idea is not completely new. Some of IBM’s Cell processors, as well as Sony’s PlayStation 3—which runs on Cell technology—allow their applications and OS kernels to fiddle with low-level CPU memory management. What’s new about the MIT technology, called Jigsaw, is its middle-ground approach, which enables software to configure some on-chip memory caches but without requiring so much control that programming becomes a memory-management nightmare.

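This idea is not, in fact, entirely new. Some IBM Cell processors, and Sony's PlayStation 3, which uses Cell technology, let applications and OS kernels manipulate low-level CPU memory management. MIT's approach, Jigsaw, takes a middle path: software configures only part of the on-chip cache and does not have to take over all of the memory management work.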

“If you go back six or seven years, you’ll see that everybody was complaining that they launched the PlayStation 3 and nobody could program it well,” says Daniel Sanchez, an assistant professor at MIT’s Computer Science and Artificial Intelligence Laboratory and one of the inventors of Jigsaw.

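“If you go back six or seven years, you will find everyone complaining that once the PlayStation 3 launched, nobody could program it well,” says Daniel Sanchez, an assistant professor at MIT's Computer Science and Artificial Intelligence Laboratory and one of Jigsaw's authors.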

Today, CPU hardware typically controls all the on-chip caches. So those caches must be designed to handle every conceivable job, from pure floating-point number crunching (which places a small burden on caches) to intensive searches and queries of a computer’s memory banks (which can stretch their limits). Moreover, CPUs have no higher-level knowledge of the kinds of jobs they’re doing. This means a self-contained numerical simulation with complex equations but little need for memory access would run with exactly the same cache resources as would a graph search, a memory-hogging hunt for relationships between stored data.


So Sanchez and his graduate student Nathan Beckmann thought, Why not let the OS trim the cache size for pure computation and swell its ranks for graph search?

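With this in mind, Sanchez and his graduate student Nathan Beckmann asked: why not have the OS shrink the cache allotted to pure computation and hand the space saved to memory-hungry graph search?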

The first step, they say, would be to give perhaps 1 percent of the CPU’s footprint to a simple piece of hardware that could monitor in real time the cache activity in each core. Hardware cache monitors would give Jigsaw the independent oversight it would need to play air traffic controller with the CPU’s caches.
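The article does not say what the hardware monitor records. As a rough software model only, the sketch below assumes the monitor's job is to estimate a per-core miss curve: how many misses a job would take at each possible cache size. It uses LRU stack distances, a standard textbook technique that is an assumption here, not something the article attributes to Jigsaw. The function name `miss_curve` and the toy traces are hypothetical.

```python
def miss_curve(trace, max_lines):
    """Estimate misses for an LRU cache of 0..max_lines lines.

    trace is a sequence of accessed line addresses. The result is a list
    where entry s is the miss count a fully associative LRU cache with
    s lines would incur on that trace.
    """
    stack = []  # LRU stack, most recently used address at the end
    hits_at_depth = [0] * (max_lines + 1)  # hits that need exactly d lines
    for addr in trace:
        if addr in stack:
            # Depth 1 means the address was the most recently used one.
            depth = len(stack) - stack.index(addr)
            stack.remove(addr)
            if depth <= max_lines:
                hits_at_depth[depth] += 1
        stack.append(addr)

    curve = []
    hits = 0
    for size in range(max_lines + 1):
        hits += hits_at_depth[size]  # a cache of `size` lines catches these
        curve.append(len(trace) - hits)
    return curve


# Hypothetical traces: a tight compute loop versus a scan over three lines.
compute_trace = [7, 7, 7, 7, 8, 8]
graph_trace = [1, 2, 3, 1, 2, 3]
print(miss_curve(compute_trace, 4))
print(miss_curve(graph_trace, 4))
```

In this toy model the compute loop stops benefiting after one line, while the scan gains nothing until it has three. That difference is the kind of information an OS would need before resizing caches per job.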


Second, Sanchez and Beckmann say, the OS’s kernel needs at most a few thousand more lines of code. That’s not much of an addition, considering that Linux’s kernel in 2012 weighed in with 15 million lines and Apple’s and Microsoft’s kernels unofficially contained tens of millions more than that.

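Next, Sanchez and Beckmann say, the OS kernel needs at most a few thousand additional lines of code. That is not much of an addition, considering that the Linux kernel had 15 million lines of code in 2012, and that Apple's and Microsoft's kernels, by unofficial estimates, contain tens of millions more than that.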

One of Jigsaw’s more prominent features is a software module, to be folded in with the OS, that the researchers call Peekahead. This module was adapted from the Lookahead Cache, developed more than a decade ago by Beijing computer scientists. Peekahead computes the best configuration of CPU caches based on the upcoming jobs it expects the cores to do in the coming clock cycles.

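One of Jigsaw's prominent features is a software module called Peekahead, which is built into the OS. The module was adapted from the Lookahead Cache, a technique developed more than a decade ago by computer scientists in Beijing. Peekahead computes the best configuration of the CPU's caches based on the jobs it expects the cores to run in the coming clock cycles.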
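The article describes Peekahead only at a high level, so the following is not its implementation. It is a simplified sketch of a Lookahead-style greedy allocation, under the assumption that each job comes with a miss curve (misses as a function of cache units granted). At every step it looks ahead over all possible grant sizes, picks the job and grant with the largest miss reduction per unit, and repeats. The function name, the units, and the sample curves are all hypothetical.

```python
def lookahead_partition(miss_curves, total_units):
    """Split total_units of cache among jobs by greedy marginal utility.

    miss_curves[j][s] is the assumed miss count of job j when it holds
    s cache units. Returns the number of units granted to each job.
    """
    alloc = [0] * len(miss_curves)
    remaining = total_units
    while remaining > 0:
        best = None  # (miss reduction per unit, job index, units to grant)
        for job, curve in enumerate(miss_curves):
            have = alloc[job]
            max_extra = min(remaining, len(curve) - 1 - have)
            # Look ahead over every grant size, not just one more unit.
            for extra in range(1, max_extra + 1):
                gain = (curve[have] - curve[have + extra]) / extra
                if best is None or gain > best[0]:
                    best = (gain, job, extra)
        if best is None or best[0] <= 0:
            break  # no job would miss less with more cache
        _, job, extra = best
        alloc[job] += extra
        remaining -= extra
    return alloc


# Hypothetical miss curves for a compute-bound job and a graph search.
compute_curve = [6, 2, 2, 2, 2]
graph_curve = [6, 6, 6, 3, 3]
print(lookahead_partition([compute_curve, graph_curve], 4))
```

With these made-up curves the compute job receives one unit and the graph search three. A greedy pass that considered only one extra unit at a time would stall, because the graph search shows no benefit until it has three units. Looking ahead across grant sizes is what gets past that plateau.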

“When you let software be in charge, you have to be careful of your overhead,” Sanchez says. A poorly designed cache management system, he says, might trim the cache to its optimum size and do it again every fraction of a second. But doing so taxes the CPU. And what’s the point of a CPU efficiency algorithm that requires extraordinary amounts of CPU time? “The exact solution is really expensive. So we have to come up with a quick way of getting the job done so that the overhead doesn’t negate the gains you get,” he says.
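Sanchez's point about overhead can be made concrete with a little arithmetic. The sketch below assumes a deliberately simple model that is not taken from the article: the OS reconfigures the cache once per interval, the tuned cache speeds up the work in that interval by some raw factor, and each reconfiguration costs a fixed amount of CPU time. All numbers are invented for illustration.

```python
def net_speedup(interval_ms, raw_speedup, reconfig_cost_ms):
    """Speedup that remains after paying for one reconfiguration per interval.

    Work that took interval_ms on the untuned cache takes
    interval_ms / raw_speedup on the tuned one, plus the time spent
    deciding on and applying the new configuration.
    """
    tuned_ms = interval_ms / raw_speedup + reconfig_cost_ms
    return interval_ms / tuned_ms


# Hypothetical numbers: a 10 ms interval and a 1.25x raw gain.
for cost_ms in (0.1, 2.0, 4.0):
    print(cost_ms, round(net_speedup(10.0, 1.25, cost_ms), 3))
```

Under these assumptions a 0.1 ms reconfiguration keeps most of the gain, a 2 ms one cancels it exactly, and a 4 ms one makes the system slower than doing nothing. This is why a cheap approximate solver can beat an exact but expensive one.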


Linley Gwennap of the Linley Group, a semiconductor consulting firm based in Mountain View, Calif., says he’s impressed with Jigsaw but cautions that it’s not quite ready for chip-fab prime time. “The problem is generally that a scheme that’s effective on one processor may not be effective on another processor with a different hardware design,” he says. “Every time the processor changes, you have to redo your software, which customers generally don’t like.”

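Linley Gwennap of the Linley Group, a semiconductor consulting firm in Mountain View, Calif., is impressed with Jigsaw but cautions that it is not yet ready for production chips. “The problem is that a scheme that works on one processor may not work on another processor with a different hardware design,” he says. “Every time the processor changes, you have to redo your software, which customers generally don't like.”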

Sanchez counters that software applications and utilities would remain unaffected by Jigsaw. “Only the operating system code needs to be aware of that intimate knowledge of the hardware, like the topology of the different portions of the cache,” he says.


Jason Mars, an assistant professor of computer science at the University of Michigan, says Jigsaw works well as a proof of concept, which he says chipmakers might adapt as they see fit.

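Jason Mars of the University of Michigan (not the singer of "I'm Yours"), an outside observer, says Jigsaw works well as a proof of concept, and that chipmakers can decide for themselves whether to adopt it.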

“The crisp novelty in this work has to do with the codesign between hardware and software,” Mars says. “Much of the prior work was biased in one direction. More was expected to be done in hardware, and there was a little bit less flexibility. Jigsaw really...builds a holistic system that spans both the hardware and the software.”

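“The novelty of this work lies in the codesign of hardware and software,” Mars says. “Much of the prior work was biased in one direction. More was expected of the hardware, and that approach lacked flexibility. Jigsaw instead takes a holistic view and builds a system that spans both the hardware and the software.”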

