Better Linux Disk Caching & Performance with vm.dirty_ratio & vm.dirty_background_ratio
by BOB PLANKERS on DECEMBER 22, 2013
in BEST PRACTICES, CLOUD, SYSTEM ADMINISTRATION, VIRTUALIZATION
This is post #16 in my December 2013 series about Linux Virtual Machine Performance Tuning. For more, please see the tag “Linux VM Performance Tuning.”
In previous posts on vm.swappiness and using RAM disks we talked about how the memory on a Linux guest is used for the OS itself (the kernel, buffers, etc.), applications, and also for file cache.
File caching is an important performance improvement, and read caching is a clear win in most cases, balanced against applications using the RAM directly. Write caching is trickier. The Linux kernel stages disk writes into cache and, over time, asynchronously flushes them to disk. This has the nice effect of speeding up disk I/O, but it is risky: when data isn’t written to disk there is an increased chance of losing it.
There is also the chance that a lot of I/O will overwhelm the cache. Ever written a lot of data to disk all at once, and seen large pauses on the system while it tries to deal with all that data? Those pauses are a result of the cache deciding that there’s too much data to be written asynchronously (as a non-blocking background operation, letting the application process continue) and switching to writing synchronously (blocking, making the process wait until the I/O is committed to disk). Of course, a filesystem also has to preserve write order, so when it starts writing synchronously it first has to destage the cache. Hence the long pause.
The nice thing is that these are controllable options, and based on your workloads & data you can decide how you want to set them up. Let’s take a look:
$ sysctl -a | grep dirty
vm.dirty_background_ratio = 10
vm.dirty_background_bytes = 0
vm.dirty_ratio = 20
vm.dirty_bytes = 0
vm.dirty_writeback_centisecs = 500
vm.dirty_expire_centisecs = 3000
vm.dirty_background_ratio is the percentage of system memory that can be filled with “dirty” pages — memory pages that still need to be written to disk — before the pdflush/flush/kdmflush background processes kick in to write them to disk. My example is 10%, so if my virtual server has 32 GB of memory that’s 3.2 GB of data that can be sitting in RAM before something is done.
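To make that arithmetic concrete, here’s a small shell sketch that reproduces the 32 GB example above (the memory size and ratio are hard-coded for illustration; on a real system you’d read MemTotal from /proc/meminfo):

```shell
# Rough threshold at which background flushing begins, for a 32 GB guest
# with vm.dirty_background_ratio = 10. Figures are illustrative.
mem_bytes=$((32 * 1024 * 1024 * 1024))
ratio=10
threshold=$((mem_bytes * ratio / 100))
echo "background flushing starts near ${threshold} bytes (~3.2 GB)"
```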
vm.dirty_ratio is the absolute maximum amount of system memory that can be filled with dirty pages before everything must get committed to disk. When the system gets to this point all new I/O blocks until dirty pages have been written to disk. This is often the source of long I/O pauses, but it is a safeguard against too much data being cached unsafely in memory.
vm.dirty_background_bytes and vm.dirty_bytes are another way to specify these parameters. If you set the _bytes version the _ratio version will become 0, and vice-versa.
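For example, if you’d rather cap background flushing at a fixed size instead of a percentage of RAM, you could set the _bytes variant in /etc/sysctl.conf (the 256 MB figure here is purely illustrative, not a recommendation):

```
# /etc/sysctl.conf fragment (illustrative): start background flushing
# once 256 MB of dirty pages accumulate. Setting the _bytes form
# zeroes out the corresponding _ratio form, and vice-versa.
vm.dirty_background_bytes = 268435456
```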
vm.dirty_expire_centisecs is how long something can be in cache before it needs to be written. In this case it’s 30 seconds. When the pdflush/flush/kdmflush processes kick in they will check to see how old a dirty page is, and if it’s older than this value it’ll be written asynchronously to disk. Since holding a dirty page in memory is unsafe, this is also a safeguard against data loss.
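The centisecs unit trips people up: it’s hundredths of a second, so the default of 3000 works out like this (a trivial sketch):

```shell
# vm.dirty_expire_centisecs is in hundredths of a second.
expire_cs=3000
echo "$((expire_cs / 100)) seconds"   # 3000 centiseconds = 30 seconds
```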
vm.dirty_writeback_centisecs is how often the pdflush/flush/kdmflush processes wake up and check to see if work needs to be done.
You can also see statistics on the page cache in /proc/vmstat:
$ cat /proc/vmstat | egrep "dirty|writeback"
nr_dirty 878
nr_writeback 0
nr_writeback_temp 0
In my case I have 878 dirty pages waiting to be written to disk.
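Note that those counters are in pages, not bytes. On most x86 systems a page is 4 KiB, so you can turn the count into an approximate byte figure (a sketch; runs on any Linux box):

```shell
# Convert the nr_dirty page count into an approximate byte figure.
page_size=$(getconf PAGESIZE)   # typically 4096 on x86
nr_dirty=$(awk '/^nr_dirty /{print $2}' /proc/vmstat)
echo "$((nr_dirty * page_size)) bytes of dirty pages waiting to be written"
```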
Approach 1: Decreasing the Cache
As with most things in the computer world, how you adjust these depends on what you’re trying to do. In many cases we have fast disk subsystems with their own big, battery-backed NVRAM caches, so keeping things in the OS page cache is risky. Let’s try to send I/O to the array in a more timely fashion and reduce the chance our local OS will, to borrow a phrase from the service industry, be “in the weeds.”
To do this we lower vm.dirty_background_ratio and vm.dirty_ratio by adding new numbers to /etc/sysctl.conf and reloading with “sysctl -p”:
vm.dirty_background_ratio = 5
vm.dirty_ratio = 10
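The reload takes effect immediately; you can confirm the live values without a reboot by reading them straight out of /proc, since the files under /proc/sys/vm mirror the sysctl names (a quick check):

```shell
# Read back the live values; these should match what sysctl -p loaded.
cat /proc/sys/vm/dirty_background_ratio
cat /proc/sys/vm/dirty_ratio
```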
This is a typical approach on virtual machines, as well as Linux-based hypervisors. I wouldn’t suggest setting these parameters to zero, as some background I/O is nice to decouple application performance from short periods of higher latency on your disk array & SAN (“spikes”).
Approach 2: Increasing the Cache
There are scenarios where raising the cache dramatically has positive effects on performance. These situations are where the data contained on a Linux guest isn’t critical and can be lost, and usually where an application is writing to the same files repeatedly or in repeatable bursts. In theory, by allowing more dirty pages to exist in memory you’ll rewrite the same blocks over and over in cache, and just need to do one write every so often to the actual disk. To do this we raise the parameters:
vm.dirty_background_ratio = 50
vm.dirty_ratio = 80
Sometimes folks also increase the vm.dirty_expire_centisecs parameter to allow more time in cache. Beyond the increased risk of data loss, you also run the risk of long I/O pauses if that cache gets full and needs to destage, because on large VMs there will be a lot of data in cache.
Approach 3: Both Ways
There are also scenarios where a system has to deal with infrequent, bursty traffic to slow disk (batch jobs at the top of the hour, at midnight, writing to an SD card on a Raspberry Pi, etc.). In that case an approach might be to allow all that write I/O to be deposited in the cache so that the background flush operations can deal with it asynchronously over time:
vm.dirty_background_ratio = 5
vm.dirty_ratio = 80
Here the background processes will start writing right away when dirty data hits that 5% ceiling, but the system won’t force synchronous I/O until it gets to 80% full. From there you just size your system RAM and vm.dirty_ratio to be able to consume all the written data. Again, there are tradeoffs with data consistency on disk, which translates into risk to data. Buy a UPS and make sure you can destage the cache before the UPS runs out of power. :)
No matter the route you choose you should always be gathering hard data to support your changes and help you determine if you are improving things or making them worse. In this case you can get data from many different places, including the application itself, /proc/vmstat, /proc/meminfo, iostat, vmstat, and many of the things in /proc/sys/vm. Good luck!
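For instance, /proc/meminfo exposes the same dirty-page information in human-readable kilobytes; sampling it before, during, and after a change gives you a quick read on how fast the cache fills and drains (a sketch):

```shell
# Snapshot the dirty and writeback figures; pair this with `watch` or
# a loop while your workload runs to see the cache fill and destage.
grep -E '^(Dirty|Writeback):' /proc/meminfo
```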