Performance Analysis Tools

2009-07-23 09:30

CPU performance analysis tools:

vmstat

ps

sar

time

c

pstree

c

Memory performance analysis tools:

vmstat

strace

top

ipcs

ipcrm

cat /proc/meminfo

cat /proc/slabinfo

cat /proc/
/maps

I/O performance analysis tools:

vmstat

iostat

repquota

quotacheck

Network performance analysis tools:

ifconfig

ethereal

tethereal

iptraf

iwconfig

nfsstat

mrtg

ntop

netstat

cat /proc/sys/net

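Linux Performance Tuning Tools

Once the tools and commands above have identified an application's performance bottleneck, the following tools and commands can be used to tune performance.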

CPU performance tuning tools:

nice / renice

sysctl

Memory performance tuning tools:

swapon

ulimit

sysctl

I/O performance tuning tools:

edquota

quotaon

sysctl

boot line: elevator=

Network performance tuning tools:

ifconfig

iwconfig

sysctl

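CPU performance tuning

When a system's CPU idle time or wait time stays below 5%, we can consider its CPU resources exhausted, and CPU performance should be tuned.

CPU tuning method:

Edit the files under /proc/sys/kernel/ to change kernel parameters.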

# cd /proc/sys/kernel/

# ls /proc/sys/kernel/
acct                   hotplug      panic                   real-root-dev
cad_pid                modprobe     panic_on_oops           sem
cap-bound              msgmax       pid_max                 shmall
core_pattern           msgmnb       powersave-nap           shmmax
core_uses_pid          msgmni       print-fatal-signals     shmmni
ctrl-alt-del           ngroups_max  printk                  suid_dumpable
domainname             osrelease    printk_ratelimit        sysrq
exec-shield            ostype       printk_ratelimit_burst  tainted
exec-shield-randomize  overflowgid  pty                     threads-max
hostname               overflowuid  random                  version

The parameters you are most likely to need to change are pid_max and threads-max, for example:

# sysctl kernel.threads-max
kernel.threads-max = 8192

# sysctl -w kernel.threads-max=10000
kernel.threads-max = 10000

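Memory performance tuning

A system needs memory tuning when its memory resources show the following symptoms:

pages are frequently swapped in and out;

there is a shortage of inactive pages.

For example, if vmstat shows very low cache usage in the memory columns together with fairly high si or so values in the swap columns, suspect a memory performance problem.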
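The si/so check described above can be sketched as a small shell filter. This is only an illustration: the function name swap_activity is hypothetical, and the column positions (si = 7, so = 8) assume the vmstat layout shown in this article.

```shell
#!/bin/sh
# Hypothetical helper: reads `vmstat <interval> <count>` output on stdin and
# reports how many samples show swap-in or swap-out above a limit (default 0).
# Assumes si is column 7 and so is column 8, as in the vmstat output of this article.
swap_activity() {
    awk -v limit="${1:-0}" '
        $1 ~ /^[0-9]+$/ {              # data rows start with a number (the r column)
            n++
            if (n == 1) next           # first row is the average since boot, skip it
            if ($7 > limit || $8 > limit) busy++
        }
        END {
            if (busy > 0) {
                print "swapping in " busy " of " (n - 1) " samples"
            } else {
                print "no swapping"
            }
        }'
}

# Usage on a live system (not run here):
#   vmstat 5 5 | swap_activity
```

Skipping the first data row matters because that row averages everything since boot and can hide or exaggerate current swap activity.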

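Memory tuning methods:

1. Stop non-essential service processes.

See the CPU performance tuning section for the related method.

2. Change the system parameters under /proc/sys/vm/.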

# ls /proc/sys/vm/
block_dump                 laptop_mode            nr_pdflush_threads
dirty_background_ratio     legacy_va_layout       overcommit_memory
dirty_expire_centisecs     lower_zone_protection  overcommit_ratio
dirty_ratio                max_map_count          page-cluster
dirty_writeback_centisecs  min_free_kbytes        swappiness
hugetlb_shm_group          nr_hugepages           vfs_cache_pressure

# sysctl vm.min_free_kbytes
vm.min_free_kbytes = 1024

# sysctl -w vm.min_free_kbytes=2508
vm.min_free_kbytes = 2508

# cat /etc/sysctl.conf

…

vm.min_free_kbytes=2058

…

3. Configure the swap partition to be equal to, or twice the size of, physical memory.

# free
             total       used       free     shared    buffers     cached
Mem:        987656     970240      17416          0      63324     742400
-/+ buffers/cache:     164516     823140
Swap:      1998840     150272    1848568

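I/O performance tuning

A system has an I/O performance problem when:

it spends more than 50% of its time waiting for I/O;

the average queue length of a device is greater than 5.

Commands such as vmstat show the CPU wa (I/O wait) time, which indicates whether the system has an I/O performance problem.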
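As a rough illustration of the wa check, the sketch below averages the wa column of vmstat output so it can be compared with the 50% guideline above. The function name avg_iowait is hypothetical, and wa is assumed to be column 16, as in the vmstat output shown in this article.

```shell
#!/bin/sh
# Hypothetical helper: reads vmstat output on stdin and prints the average
# wa (I/O wait) percentage over all samples except the first.
# Assumes wa is column 16 of each data row.
avg_iowait() {
    awk '
        $1 ~ /^[0-9]+$/ && NF >= 16 {  # data rows only, headers are skipped
            n++
            if (n == 1) next           # first row is the average since boot
            sum += $16
        }
        END {
            if (n > 1) { printf "%.2f\n", sum / (n - 1) }
        }'
}

# Usage on a live system (not run here):
#   vmstat 5 5 | avg_iowait
```

Fed the five-sample vmstat output from the CPU analysis part of this article, it averages the wa values 45, 33, 30 and 39 and prints 36.75, which is high but below the 50% guideline.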

I/O tuning methods:

1. Change the I/O scheduling algorithm.

Linux has four known I/O schedulers:

deadline - Deadline I/O scheduler

as - Anticipatory I/O scheduler

cfq - Complete Fair Queuing scheduler

noop - Noop I/O scheduler

The scheduler is selected with the elevator boot parameter, which can be set by editing /etc/yaboot.conf.

# vi /etc/yaboot.conf

image=/vmlinuz-2.6.9-11.EL
label=linux
read-only
initrd=/initrd-2.6.9-11.EL.img
root=/dev/VolGroup00/LogVol00
append="elevator=cfq rhgb quiet"

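2. Tune the file system.

There are a few generally accepted guidelines for file system tuning:

Spread the I/O load as evenly as possible across all available disks.

Choose a suitable file system. The Linux kernel supports reiserfs, ext2, ext3, jfs, xfs and other file systems.

# mkfs -t reiserfs -j /dev/sdc1

A file system can still be tuned with commands after it has been created:

tune2fs (ext2/ext3)

reiserfstune (reiserfs)

jfs_tune (jfs)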

3. Add the noatime and nodiratime options when mounting file systems.

# vi /etc/fstab

…

/dev/sdb1 /backup reiserfs acl,user_xattr,noatime,nodiratime 1 1

4. Tune the block device read-ahead by increasing the RA value.

[root@overflowuid ~]# blockdev --report
RO    RA   SSZ   BSZ   StartSec      Size   Device
…
rw   256   512  4096          0  71096640   /dev/sdb
rw   256   512  4096         32  71094240   /dev/sdb1

[root@overflowuid ~]# blockdev --setra 2048 /dev/sdb1

[root@overflowuid ~]# blockdev --report
RO    RA   SSZ   BSZ   StartSec      Size   Device
…
rw  2048   512  4096          0  71096640   /dev/sdb
rw  2048   512  4096         32  71094240   /dev/sdb1
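To make the RA numbers easier to read: blockdev expresses read-ahead in 512-byte sectors, so the values above can be converted to KiB with simple arithmetic. The helper name ra_to_kib is hypothetical and exists only for this sketch.

```shell
#!/bin/sh
# Hypothetical helper: convert a blockdev RA value (512-byte sectors) to KiB.
ra_to_kib() {
    echo $(( $1 * 512 / 1024 ))
}

ra_to_kib 256    # prints 128  (the original setting, 128 KiB)
ra_to_kib 2048   # prints 1024 (the tuned setting, 1 MiB)
```

In other words, the change shown above raises read-ahead from 128 KiB to 1 MiB per device.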

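Network performance tuning

A system has a network performance problem when:

the throughput of a network interface is lower than expected;

large numbers of packets are dropped;

large numbers of collisions occur.

Network tuning methods:

1. Adjust the network card parameters.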

# ethtool eth0
Settings for eth0:
        Supported ports: [ TP ]
        Supported link modes:   10baseT/Half 10baseT/Full
                                100baseT/Half 100baseT/Full
                                1000baseT/Full
        Supports auto-negotiation: Yes
        Advertised link modes:  10baseT/Half 10baseT/Full
                                100baseT/Half 100baseT/Full
                                1000baseT/Full
        Advertised auto-negotiation: Yes
        Speed: 100Mb/s
        Duplex: Half
        Port: Twisted Pair
        PHYAD: 0
        Transceiver: internal
        Auto-negotiation: on
        Supports Wake-on: d
        Wake-on: d
        Current message level: 0x00000007 (7)
        Link detected: yes

# ethtool -s eth0 duplex full

# ifconfig eth0 mtu 9000 up

2. Increase the network buffers and the packet queue.

# cat /proc/sys/net/ipv4/tcp_mem
196608 262144 393216

# cat /proc/sys/net/core/rmem_default
135168

# cat /proc/sys/net/core/rmem_max
131071

# cat /proc/sys/net/core/wmem_default
135168

# cat /proc/sys/net/core/wmem_max
131071

# cat /proc/sys/net/core/optmem_max
20480

# cat /proc/sys/net/core/netdev_max_backlog
300

# sysctl net.core.rmem_max
net.core.rmem_max = 131071

# sysctl -w net.core.rmem_max=135168
net.core.rmem_max = 135168

3. Tune for web serving.

# sysctl net.ipv4.tcp_tw_reuse
net.ipv4.tcp_tw_reuse = 0

# sysctl -w net.ipv4.tcp_tw_reuse=1
net.ipv4.tcp_tw_reuse = 1

# sysctl net.ipv4.tcp_tw_recycle
net.ipv4.tcp_tw_recycle = 0

# sysctl -w net.ipv4.tcp_tw_recycle=1
net.ipv4.tcp_tw_recycle = 1

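1. CPU usage analysis

The vmstat command gives summary information. It takes two arguments: (1) the number of seconds to monitor the system for each line of output, and (2) the number of reports to produce. If the number of reports is not given, vmstat keeps running until you press <Control-C>.
The first line of data returned by vmstat is the average since the system was booted. Each following line is the average over the preceding sampling period (5 seconds in the example below).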

$ vmstat 5 5
procs                      memory      swap          io     system         cpu
 r  b   swpd   free   buff   cache   si   so    bi    bo   in    cs us sy id wa
 1  1      4 172944 248936 5954084    0    0    53    33   86   206  7  5 85  3
 0  1      4 172932 248936 5954140    0    0  1561    50 1135  5024  4  6 45 45
 2  1      4 172844 248936 5954296    0    0  2132   144 1465  6612 26  8 33 33
 1  1      4 172716 248936 5954704    0    0  3140   194 1845  9565 26  8 36 30
 1  1      4 172364 248936 5955512    0    0  1963  1062 1329  6709 24  8 29 39

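r      number of processes waiting to run
b      number of processes in uninterruptible sleep

swpd   amount of virtual memory (swap) in use
free   free memory
buff   memory used as buffers
cache  memory used as cache

si     memory swapped in from disk per second
so     memory swapped out to disk per second

bi     blocks received from block devices per second
bo     blocks sent to block devices per second

cs     number of context switches per interval, that is, how many times the kernel switched the running process
in     number of interrupts per interval. Extremely high cs or in values usually indicate a faulty or misbehaving hardware device

us     user time. A large value means the machine is busy computing
sy     system time. A large value means processes are making many system calls or performing I/O
id     idle time
wa     time spent waiting for I/O

A rough rule of thumb is that a system should spend about 50% of its non-idle time in user space and the other 50% in system time. The overall idle percentage should also not be 0.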

mpstat is used to examine SMP (symmetric multiprocessing) systems. The -P option selects a specific processor to report on.
[tapeback@xlback bin]$ mpstat 1 5
Linux 2.6.9-22.ELsmp (xlback.rrl.com)   09/20/2006

05:45:16 PM  CPU   %user   %nice %system %iowait    %irq   %soft   %idle    intr/s
05:45:17 PM  all    0.00    0.00    0.00    0.00    0.00    0.00  100.00   1029.00
05:45:18 PM  all    0.50    0.00    0.00    0.00    0.00    0.00   99.50   1032.00
05:45:19 PM  all    4.02    0.00    0.00    0.00    0.00    0.00   95.98   1009.80
05:45:20 PM  all    0.00    0.00    0.00    0.00    0.00    0.00  100.00   1017.00
05:45:21 PM  all    0.00    0.00    0.00    0.00    0.00    0.00  100.00   1019.00
Average:     all    0.90    0.00    0.00    0.00    0.00    0.00   99.10   1021.31

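The uptime command reports load averages. The load average includes processes waiting for disk and network I/O, so it is not a pure measure of CPU usage.
% uptime
14:05:05 up 112 days, 22:37, 5 users, load average: 1.84, 1.81, 1.33
The three values are the system's load averages over the last 1, 5 and 15 minutes.
As a rough rule of thumb, a Linux system is busy once the load average reaches 3 and does not cope well with load averages above 6.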
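Because the load average counts processes across all CPUs, one common way to read it is relative to the number of processors. The sketch below is only an illustration of that arithmetic: the function name load_per_cpu is hypothetical, and the CPU count of 2 in the example call is an assumption, not a value taken from the uptime output above.

```shell
#!/bin/sh
# Hypothetical helper: divide a load average by the number of CPUs.
#   $1 = load average (for example the 1-minute value)
#   $2 = number of CPUs
load_per_cpu() {
    awk -v load="$1" -v cpus="$2" 'BEGIN { printf "%.2f\n", load / cpus }'
}

load_per_cpu 1.84 2   # prints 0.92 (assuming a 2-CPU machine)

# Usage on a live Linux system (not run here):
#   load_per_cpu "$(cut -d' ' -f1 /proc/loadavg)" "$(getconf _NPROCESSORS_ONLN)"
```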

[tapeback@xlback bin]$ ps -aux
Warning: bad syntax, perhaps a bogus '-'? See /usr/share/doc/procps-3.2.3/FAQ
USER       PID %CPU %MEM   VSZ  RSS TTY      STAT START   TIME COMMAND
root         1  0.0  0.0  4748  548 ?        S    Aug22   0:00 init [2]
root         2  0.0  0.0     0    0 ?        S    Aug22   0:00 [migration/0]
root         3  0.0  0.0     0    0 ?        SN   Aug22   0:00 [ksoftirqd/0]
root         4  0.0  0.0     0    0 ?        S    Aug22   0:00 [migration/1]
root         5  0.0  0.0     0    0 ?        SN   Aug22   0:00 [ksoftirqd/1]
root         6  0.0  0.0     0    0 ?        S<   Aug22   0:00 [events/0]
root         7  0.0  0.0     0    0 ?        S<   Aug22   0:00 [events/1]
root         8  0.0  0.0     0    0 ?        S<   Aug22   0:00 [khelper]
root         9  0.0  0.0     0    0 ?        S<   Aug22   0:00 [kacpid]
root        42  0.0  0.0     0    0 ?        S<   Aug22   0:00 [kblockd/0]
root        43  0.0  0.0     0    0 ?        S<   Aug22   0:00 [kblockd/1]
root        59  0.0  0.0     0    0 ?        S<   Aug22   0:00 [aio/0]
root        60  0.0  0.0     0    0 ?        S<   Aug22   0:00 [aio/1]
root        44  0.0  0.0     0    0 ?        S    Aug22   0:00 [khubd]
root        58  0.0  0.0     0    0 ?        S    Aug22   0:03 [kswapd0]
root       133  0.0  0.0     0    0 ?        S    Aug22   0:00 [kseriod]
root       204  0.0  0.0     0    0 ?        S    Aug22   0:00 [scsi_eh_0]
root       217  0.0  0.0     0    0 ?        S    Aug22   0:00 [scsi_eh_1]
root       218  0.0  0.0     0    0 ?        S    Aug22   0:00 [ahc_dv_0]
root       242  0.0  0.0     0    0 ?        S    Aug22   0:00 [scsi_eh_2]
root       243  0.0  0.0     0    0 ?        S    Aug22   0:00 [ahc_dv_1]
root       250  0.0  0.0     0    0 ?        S    Aug22   0:06 [kjournald]

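USER     user name of the process owner
PID      process ID
%CPU     percentage of CPU time the process is using
%MEM     percentage of real memory the process is using
VSZ      virtual size of the process
RSS      resident set size (number of pages in memory)
TTY      ID of the controlling terminal
STAT     current process state:
         R = runnable
         D = waiting for disk (or short-term wait)
         S = sleeping (< 20 seconds)
         T = traced or stopped
         Z = zombie
         Additional flags:
         W = process is swapped out
         < = process has higher than normal priority
         N = process has lower than normal priority
         L = some pages are locked in memory
START    time the process was started
TIME     CPU time the process has consumed
COMMAND  command name and arguments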

[tapeback@xlback bin]$ top
top - 17:47:49 up 29 days, 17:27,  1 user,  load average: 0.17, 0.43, 0.36
Tasks:  92 total,   1 running,  85 sleeping,   0 stopped,   6 zombie
Cpu(s):  0.0% us,  0.2% sy,  0.0% ni, 99.8% id,  0.0% wa,  0.0% hi,  0.0% si
Mem:   2056208k total,  1977416k used,    78792k free,    60544k buffers
Swap:  2097144k total,   668544k used,  1428600k free,  1597712k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
19807 tapeback  16   0  6144 1000  768 R  0.3  0.0   0:00.07 top
    1 root      16   0  4748  548  456 S  0.0  0.0   0:00.53 init
    2 root      RT   0     0    0    0 S  0.0  0.0   0:00.31 migration/0
    3 root      34  19     0    0    0 S  0.0  0.0   0:00.00 ksoftirqd/0
    4 root      RT   0     0    0    0 S  0.0  0.0   0:00.50 migration/1
    5 root      34  19     0    0    0 S  0.0  0.0   0:00.02 ksoftirqd/1
    6 root       5 -10     0    0    0 S  0.0  0.0   0:00.03 events/0
    7 root       5 -10     0    0    0 S  0.0  0.0   0:00.04 events/1
    8 root       8 -10     0    0    0 S  0.0  0.0   0:00.00 khelper

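Line 1 shows, in order, the current time, the time since boot, the number of logged-in users, and the load averages.

Line 2 summarizes the processes: total, running, sleeping, stopped, and zombie.

Line 3 shows the CPU state: user, system, nice, and idle time, followed by I/O wait, hardware interrupt, and software interrupt time.

Line 4 shows the memory state: total, used, and free memory, and memory used as buffers.

Line 5 shows the swap state: total, used, and free swap, followed by the amount of memory used as cache.

PID      process ID
USER     user name of the process owner
PR       scheduling priority
NI       nice value
VIRT     virtual memory size
RES      resident (non-swapped) memory size
SHR      shared memory size
S        process state
%CPU     percentage of CPU time the process is using
%MEM     percentage of real memory the process is using
TIME+    CPU time the process has consumed
COMMAND  command name or command line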

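The syntax of sar is:
sar [-options] [interval [count]]

Here interval is the time between two samples and count is the number of samples. The options related to CPU analysis are:

-c                      report process creation activity
-e hh:mm:ss             set the end time of the report
-I {irq|SUM|ALL|XALL}   report statistics for the given interrupts
-P {cpu|ALL}            report per-processor statistics for the given CPU or for all CPUs
-q                      report the number of tasks in the run queue at sampling time and the system load averages
-u                      report CPU utilization: the percentage of time spent in user mode, in system mode, waiting for I/O, and idle
-w                      report the number of context switches per second
-o filename             save the results to a file
-f filename             read the data from the given file. If -f is not specified, the data is read from the standard data file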

sar -c 2 -q 2 -u -w
Linux 2.6.9-22.ELsmp (xlback.rrl.com)   09/20/2006

07:23:48 PM    proc/s
07:23:50 PM      0.00

07:23:48 PM   cswch/s
07:23:50 PM    325.87

07:23:48 PM  CPU   %user   %nice %system %iowait   %idle
07:23:50 PM  all    0.00    0.25    0.00    0.00   99.75

07:23:48 PM  runq-sz plist-sz  ldavg-1  ldavg-5 ldavg-15
07:23:50 PM        0      113     0.05     0.20     0.14

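Meaning of the CPU-related output

Values derived from /proc/stat (field, meaning, source counter):
proc/s    processes created per second during the interval           processes
cswch/s   context switches per second during the interval            ctxt
intr/s    interrupts received by the CPU per second in the interval  intr

Values derived from /proc/loadavg (field, meaning, source value):
runq-sz   tasks in the run queue at sampling time, excluding the sampling process itself   procs_running-1
plist-sz  number of tasks in the system at sampling time             nr_threads
ldavg-1   system load average over the last 1 minute                 lavg_1
ldavg-5   system load average over the last 5 minutes                lavg_5
ldavg-15  system load average over the last 15 minutes               lavg_15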

sar 1 10
Linux 2.6.9-22.ELsmp (xxx)   09/20/2006

06:54:52 PM  CPU   %user   %nice %system %iowait   %idle
06:54:53 PM  all    0.50    0.00    0.00    0.00   99.50
06:54:54 PM  all    1.00    0.00    0.50    0.00   98.50
06:54:55 PM  all    0.00    0.00    0.50    0.00   99.50
06:54:56 PM  all    0.00    0.00    0.00    0.00  100.00
06:54:57 PM  all    5.97    0.00    0.50    0.50   93.03
06:54:58 PM  all    0.00    0.00    0.50    0.00   99.50

06:54:58 PM  CPU   %user   %nice %system %iowait   %idle
06:54:59 PM  all    0.50    0.00    0.00    0.00   99.50
06:55:00 PM  all    1.00    0.00    0.00    0.00   99.00
06:55:01 PM  all    0.00    0.00    0.00    0.00  100.00
06:55:02 PM  all    0.00    0.00    0.00    0.00  100.00
Average:     all    0.90    0.00    0.20    0.05   98.85

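user    user-mode CPU time during the interval (%), excluding niced processes            user/total*100
nice    CPU time used by niced processes (nice value above 0) during the interval (%)    nice/total*100
sys     kernel time during the interval (%)                                              (system+irq+softirq)/total*100
iowait  time the CPU was idle while waiting for disk I/O during the interval (%)         iowait/total*100
idle    time the CPU was idle for any reason other than waiting for disk I/O (%)         idle/total*100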

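2. Linux memory management
Like UNIX, Linux manages memory in units of pages. On current PC hardware the page size is 4 KB. When processes need memory, the kernel allocates virtual pages to them, and each virtual page is mapped to real storage, either RAM or swap space on disk. Linux uses a "page table" to keep track of the mapping between these virtual pages and real pages. Linux uses swap space to extend the effective size of RAM so that it can give processes the memory they ask for. Because processes expect their virtual pages to be mapped to real memory, Linux is constantly busy moving pages back and forth between RAM and swap. This activity is called paging.

Memory usage analysis
Memory activity can basically be quantified with three numbers: the total amount of active virtual memory, the swapping rate, and the paging rate. The first number indicates the total demand for memory, and the other two indicate what proportion of that memory is actively in use. The goal is to reduce memory activity or add memory until the paging rate stays at an acceptable level.
Use the free command to see how much memory and swap are currently in use. With the -t flag it also prints a total line for memory plus swap.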
free -t
             total       used       free     shared    buffers     cached
Mem:       2056208    1977736      78472          0      60552    1598180
-/+ buffers/cache:     319004    1737204
Swap:      2097144     668544    1428600
Total:     4153352    2646280    1507072
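The "-/+ buffers/cache" row can be reproduced from the Mem row: buffers and cache are subtracted from used and added to free. The following sketch only restates that arithmetic with the numbers above. The function names are hypothetical.

```shell
#!/bin/sh
# Hypothetical helpers that recompute the "-/+ buffers/cache" row of free.
# All arguments are in KB, as printed by free.

# used minus buffers minus cached = memory actually held by applications
mem_used_by_apps() {
    echo $(( $1 - $2 - $3 ))
}

# free plus buffers plus cached = memory that could be made available
mem_available() {
    echo $(( $1 + $2 + $3 ))
}

mem_used_by_apps 1977736 60552 1598180   # prints 319004
mem_available    78472   60552 1598180   # prints 1737204
```

Both results match the second row of the free output, which is why a nearly full Mem row does not by itself mean the system is short of memory.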

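Use the swapon command (swapon -s) to find out exactly which files and partitions are being used as swap space.

The procinfo command displays the contents of files under /proc in a more readable format.
procinfo -n5 refreshes the output continuously at 5-second intervals.
Some of the information printed by procinfo overlaps with the output of free, uptime and vmstat. In addition, procinfo reports the kernel version, memory paging, disk accesses, and IRQ assignments. Use procinfo -a to see more of the /proc file system, including the kernel boot parameters, loadable kernel modules, character devices, and file systems.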

Disk I/O analysis
Use the iostat command to monitor disk performance.
$ iostat
Linux 2.6.9-22.ELsmp (xxx)   09/20/2006

avg-cpu:  %user   %nice    %sys %iowait   %idle
           1.79    0.05    0.29    2.96   94.91

Device:            tps   Blk_read/s   Blk_wrtn/s     Blk_read     Blk_wrtn
sda               0.61         1.12        15.81      2886882     40672114
sdb              60.69       643.80       267.79   1656348001    688974012
sdc               0.00         0.00         0.00         8527            0

The avg-cpu columns (user, nice, sys, iowait, idle) have the same meaning as the corresponding columns of the sar output described earlier.

tps         number of I/O transfers per second
Blk_read/s  blocks read per second
Blk_wrtn/s  blocks written per second
Blk_read    total blocks read
Blk_wrtn    total blocks written
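The block counts are easier to interpret as throughput. On 2.4 and later kernels an iostat block is a 512-byte sector, so the rates above can be converted to KiB/s as in this sketch. The function name blk_to_kib is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: convert an iostat blocks-per-second value to KiB/s,
# assuming 512-byte blocks (true for 2.4 and later kernels).
blk_to_kib() {
    awk -v blocks="$1" 'BEGIN { printf "%.2f\n", blocks * 512 / 1024 }'
}

blk_to_kib 643.80   # sdb reads:  prints 321.90
blk_to_kib 267.79   # sdb writes: prints 133.90
```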

In summary:
CPU: vmstat, mpstat -P, uptime, ps -aux
Memory: free -t, swapon -s, procinfo, top
Disk: iostat

Also: the sar tool
