
Linux Novice Notes #111: Memory


Original notes by Winthcloud.

Contents
Memory subsystem components
Memory improvements
Viewing system calls
Strategies for using memory
Tuning page allocation
Tuning overcommit
Slab cache
ARP cache
Page cache
Tuning strategy (from the official Red Hat 6 Performance Tuning Guide)
Tuning related to interprocess communication

Memory subsystem components
slab allocator
buddy system
kswapd
pdflush
mmu

Virtualized environments
PA --> HA --> MA
Virtual machine translation: PA --> HA

GuestOS, OS
Shadow PT

Memory improvements

Hugetlbfs
Check whether hugetlbfs is enabled
cat /proc/meminfo | grep Huge

Enable huge pages (persistent across reboots)
/etc/sysctl.conf
add vm.nr_hugepages = n

Enable immediately
sysctl -w vm.nr_hugepages=n

Configure hugetlbfs if needed by the application
Create a mount point and mount hugetlbfs
mkdir /hugepages
mount -t hugetlbfs none /hugepages
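A minimal sketch of the whole sequence (the page count 128 and the mount point /hugepages are arbitrary example values):

# reserve 128 huge pages now, and persist the setting across reboots
sysctl -w vm.nr_hugepages=128
echo "vm.nr_hugepages = 128" >> /etc/sysctl.conf

# mount hugetlbfs so applications can map huge-page-backed memory
mkdir -p /hugepages
mount -t hugetlbfs none /hugepages

# verify the reservation
grep Huge /proc/meminfo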

Viewing system calls
Trace every system call made by a program
strace -o /tmp/strace.out -p PID
grep mmap /tmp/strace.out

Summarize system calls
strace -c -p PID or
strace -c COMMAND
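For example (the PID 1234 and the traced command are placeholders):

# trace a running process and look for mmap calls
strace -o /tmp/strace.out -p 1234
grep mmap /tmp/strace.out

# summarize the system calls made by a command
strace -c ls /etc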

Strategies for using memory
Reduce overhead for tiny memory objects
Slab cache
cat /proc/slabinfo

Reduce or defer service time for slower subsystems
Filesystem metadata: buffer cache (slab cache)
Disk IO: page cache
Interprocess communications: shared memory
Network IO: buffer cache, arp cache, connection tracking

Use the buffer cache to cache filesystem metadata
Use the page cache to cache disk I/O
Use shared memory (shm) for interprocess communication
Use the buffer cache, ARP cache, and connection tracking to improve network I/O performance

Considerations when tuning memory
How should pages be reclaimed to avoid pressure?
Larger writes are usually more efficient due to re-sorting

Tuning page allocation
Set using

vm.min_free_kbytes
Tuning vm.min_free_kbytes is only necessary when an application regularly
needs to allocate a large block of memory and then frees that same memory

It may well be the case that the system has too little disk bandwidth, too
little CPU power, or too little memory to handle its load.

Consequences
Reduces service time for demand paging
Memory is not available for other use
Can cause pressure on ZONE_NORMAL

Exhausting memory can crash the system
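A hedged example; the value 65536 (64 MB) is only an illustration, and the right number depends on total RAM and the workload:

# view the current reserve
cat /proc/sys/vm/min_free_kbytes

# raise it at runtime
sysctl -w vm.min_free_kbytes=65536

# persist it
echo "vm.min_free_kbytes = 65536" >> /etc/sysctl.conf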

Tuning overcommit
Set using

cat /proc/sys/vm/overcommit_memory
vm.overcommit_memory
0 = heuristic overcommit
1 = always overcommit
2 = strict accounting: commit limit is swap plus a percentage of RAM (the ratio may be > 100)

vm.overcommit_ratio
Specifies the percentage of physical memory counted toward the commit
limit when vm.overcommit_memory is set to 2

View Committed_AS in /proc/meminfo
An estimate of how much RAM is required to avoid an out of memory (OOM)
condition for the current workload on a system

OOM
Out Of Memory
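Example commands (the ratio of 80 is illustrative only):

# check the current policy and the estimated commitment
cat /proc/sys/vm/overcommit_memory
grep -e CommitLimit -e Committed_AS /proc/meminfo

# switch to strict accounting: commit limit = swap + 80% of RAM
sysctl -w vm.overcommit_memory=2
sysctl -w vm.overcommit_ratio=80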

Slab cache
Tiny kernel objects are stored in slab
Extra overhead of tracking is better than using 1 page/object
Example: filesystem metadata (dentry and inode caches)

Monitoring
/proc/slabinfo
slabtop
vmstat -m

Tuning a particular slab cache
echo "cache_name limit batchcount shared" > /proc/slabinfo
limit the maximum number of objects that will be cached for each CPU

batchcount the maximum number of global cache objects that will be
transferred to the per-CPU cache when it becomes empty

shared the sharing behavior for Symmetric MultiProcessing (SMP) systems
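For example (the numbers are illustrative, and writing tunables to /proc/slabinfo typically only works when the kernel uses the SLAB allocator):

# watch the largest caches interactively, or dump raw statistics
slabtop
cat /proc/slabinfo
vmstat -m

# tune the dentry cache: limit 256, batchcount 64, shared 8
echo "dentry 256 64 8" > /proc/slabinfo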

ARP cache
ARP entries map protocol (IP) addresses to hardware (MAC) addresses
cached in /proc/net/arp
By default, the cache is limited to 512 entries as a soft limit
and 1024 entries as a hard limit

Garbage collection removes stale or older entries

Insufficient ARP cache leads to
Intermittent timeouts between hosts
ARP thrashing

Too much ARP cache puts pressure on ZONE_NORMAL
List entries
ip neighbor list
Flush cache
ip neighbor flush dev ethX

Tuning ARP cache
Minimum number of entries; below this the garbage collector leaves the ARP table alone
net.ipv4.neigh.default.gc_thresh1
default 128

Soft upper limit
net.ipv4.neigh.default.gc_thresh2
default 512
Becomes hard limit after 5 seconds

Hard upper limit
net.ipv4.neigh.default.gc_thresh3
default 1024

Garbage collection frequency in seconds
net.ipv4.neigh.default.gc_interval
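Example sysctl settings for a network with many directly reachable hosts (the values are illustrative only):

sysctl -w net.ipv4.neigh.default.gc_thresh1=512
sysctl -w net.ipv4.neigh.default.gc_thresh2=2048
sysctl -w net.ipv4.neigh.default.gc_thresh3=4096
sysctl -w net.ipv4.neigh.default.gc_interval=60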

Page cache
A large percentage of paging activity is due to I/O requests
File reads: each page of file read from disk into memory
These pages form the page cache

Page cache is always checked for IO requests
Directory reads
Reading and writing regular files
Reading and writing via block device files, DISK IO
Accessing memory mapped files, mmap
Accessing swapped out pages

Pages in the page cache are associated with file data

Tuning page cache
View page cache allocation in /proc/meminfo
Tune length/size of memory
vm.lowmem_reserve_ratio
vm.vfs_cache_pressure

Tune arrival/completion rate
vm.page-cluster
vm.zone_reclaim_mode
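To see how much memory the page cache is using and how much of it is dirty or being written back:

grep -E '^(Cached|Dirty|Writeback):' /proc/meminfo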

vm.lowmem_reserve_ratio
For some specialised workloads on highmem machines it is dangerous for the
kernel to allow process memory to be allocated from the "lowmem" zone

The Linux page allocator has a mechanism which prevents allocations which could
use highmem from using too much lowmem

The 'lowmem_reserve_ratio' tunable determines how aggressive the kernel is
in defending these lower zones

If you have a machine which uses highmem or ISA DMA and your applications
are using mlock(), or if you are running with no swap, then you probably
should change the lowmem_reserve_ratio setting
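To inspect the current ratios (the file holds one value per protected lower zone; a smaller ratio means a larger reserve):

cat /proc/sys/vm/lowmem_reserve_ratio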

vfs_cache_pressure
Controls the tendency of the kernel to reclaim the memory which is used for
caching of directory and inode objects

At the default value of vfs_cache_pressure=100 the kernel will attempt to
reclaim dentries and inodes at a "fair" rate with respect to pagecache and
swapcache reclaim

Decreasing vfs_cache_pressure causes the kernel to prefer to retain dentry
and inode caches

When vfs_cache_pressure=0, the kernel will never reclaim dentries and
inodes due to memory pressure and this can easily lead to out-of-memory
conditions

Increasing vfs_cache_pressure beyond 100 causes the kernel to prefer to
reclaim dentries and inodes.
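For example, to keep dentry and inode caches around longer (50 is an illustrative value; avoid 0 for the reason above):

cat /proc/sys/vm/vfs_cache_pressure
sysctl -w vm.vfs_cache_pressure=50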

page-cluster
page-cluster controls the number of pages which are written to swap in a
single attempt

It is a logarithmic value: setting it to zero means "1 page", setting it to
1 means "2 pages", setting it to 2 means "4 pages", etc

The default value is three (eight pages at a time)
There may be some small benefits in tuning this to a different value if
your workload is swap-intensive
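For example, to swap 16 pages (2^4) per attempt:

sysctl -w vm.page-cluster=4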

zone_reclaim_mode
Zone_reclaim_mode allows someone to set more or less aggressive approaches
to reclaim memory when a zone runs out of memory

If it is set to zero then no zone reclaim occurs

Allocations will be satisfied from other zones/nodes in the system

The value is a bitmask ORed together from:
1 = Zone reclaim on
2 = Zone reclaim writes dirty pages out
4 = Zone reclaim swaps pages
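Example settings built from the bitmask above:

# reclaim in the local zone and allow writing out dirty pages (1 | 2 = 3)
sysctl -w vm.zone_reclaim_mode=3

# turn zone reclaim off; allocations fall back to other zones/nodes
sysctl -w vm.zone_reclaim_mode=0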

Anonymous pages
Anonymous pages can be another large consumer of memory

Are not associated with a file, but instead contain:
Program data - arrays, heap allocations, etc
Anonymous memory regions
Dirty memory mapped process private pages
IPC shared memory regions pages

View summary usage
grep Anon /proc/meminfo
cat /proc/PID/statm
Anonymous pages = RSS - Shared

Anonymous pages are eligible for swap
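For example (1234 is a placeholder PID; /proc/PID/statm reports values in pages):

grep Anon /proc/meminfo

# statm fields: size resident shared text lib data dt
cat /proc/1234/statm

# anonymous pages ~= resident (field 2) minus shared (field 3)
awk '{print $2 - $3}' /proc/1234/statm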

Tuning strategy
Hardware tuning: hardware selection
Software tuning: kernel tuning via /proc and /sys
Application tuning

Kernel tuning areas
1. Process management, CPU
2. Memory tuning
3. I/O tuning
4. Filesystems
5. Network subsystem

Tuning approach
1. Check the performance metrics and locate the bottleneck
2. Tune

Red Hat provides an official document, the Red Hat 6 Performance Tuning Guide, which can be found with a search.

Tuning related to interprocess communication
ipcs (interprocess communication facilities)

IPC management commands
ipcs
ipcrm

shared memory
kernel.shmmni
Specifies the maximum number of shared memory segments
system-wide, default = 4096

kernel.shmall
Specifies the total amount of shared memory, in pages, that
can be used at one time on the system, default=2097152

This should be at least kernel.shmmax/PAGE_SIZE

kernel.shmmax
Specifies the maximum size of a shared memory segment that
can be created
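A worked example, assuming a 4 KB page size and a desired 8 GB maximum segment (all numbers are illustrative):

getconf PAGE_SIZE                      # usually 4096

sysctl -w kernel.shmmax=8589934592     # 8 GB, largest single segment
sysctl -w kernel.shmall=2097152        # at least shmmax / PAGE_SIZE pages
sysctl -w kernel.shmmni=4096

# list existing shared memory segments
ipcs -m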

messages
kernel.msgmnb
Specifies the maximum number of bytes in a single message
queue, default = 16384

kernel.msgmni
Specifies the maximum number of message queue identifiers,
default=16

kernel.msgmax
Specifies the maximum size of a message that can be passed
between processes

This memory cannot be swapped, default=8192
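Example commands (the limits shown are illustrative):

# show current IPC limits and any existing message queues
ipcs -l
ipcs -q

sysctl -w kernel.msgmnb=65536
sysctl -w kernel.msgmax=65536
sysctl -w kernel.msgmni=1024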