Installing Nagios Monitoring on CentOS Linux

2015-11-26 13:46

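1. Introduction to Nagios
Nagios is a free, open-source network monitoring tool. It can monitor the status of Windows, Linux, and Unix hosts as well as network devices such as switches and routers. Its job is to monitor hosts and services, but Nagios itself contains no checks: all monitoring and detection is done by plugins.

Once started, Nagios periodically calls the plugins to check server status. It also maintains a queue: every status result returned by a plugin enters the queue, Nagios reads from the head of the queue, processes each result, and shows the resulting status in the web interface.

Nagios provides many plugins that make it easy to monitor a wide range of services. After installation, all the bundled plugins are in the libexec directory under the Nagios home directory. For example, check_disk checks disk space and check_load checks CPU load. Run ./check_xxx -h on any plugin to see its usage and purpose.

Checking state on a remote host, such as disk usage or a service on a particular port, requires the NRPE service. NRPE consists of two parts: (1) the check_nrpe plugin, which lives on the monitoring host; (2) the NRPE daemon, which runs on the remote Linux host (usually the monitored machine).

Nagios defines four monitoring states that represent different severity levels. Apart from OK, which means everything is normal and needs no attention, every state calls for attention.

State      Code       Color
Normal     OK         Green
Warning    WARNING    Yellow
Critical   CRITICAL   Red
Unknown    UNKNOWN    Dark yellow (orange)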
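
As a rough illustration of how a plugin reports one of these four states: by the usual plugin convention the exit code carries the state (0 = OK, 1 = WARNING, 2 = CRITICAL, 3 = UNKNOWN) and the first line of output is the text shown in the web interface. The sketch below is not one of the shipped plugins; the function name check_percent_used and its three arguments are made up for this example.

```shell
# Hypothetical helper, not part of nagios-plugins.
# Usage: check_percent_used USED WARN CRIT   (all whole-number percentages)
check_percent_used() {
    # Anything that is not a plain non-negative integer yields UNKNOWN (3).
    for n in "$1" "$2" "$3"; do
        case "$n" in
            ''|*[!0-9]*)
                echo "UNKNOWN - expected three integers: USED WARN CRIT"
                return 3 ;;
        esac
    done
    if [ "$1" -ge "$3" ]; then
        echo "CRITICAL - ${1}% used"
        return 2
    elif [ "$1" -ge "$2" ]; then
        echo "WARNING - ${1}% used"
        return 1
    fi
    echo "OK - ${1}% used"
    return 0
}

# In a real plugin script the last lines would be something like:
#   check_percent_used "$used" 80 90
#   exit $?
```

Nagios only looks at the exit code to decide the state, so a plugin can be written in any language as long as it follows this convention.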

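2. Deploying the Nagios monitoring platform
Preparation before installation:
1) Add firewall rules
vim /etc/sysconfig/iptables
# web access to the monitoring console
-A INPUT -m state --state NEW -m tcp -p tcp --dport 80 -j ACCEPT
# NRPE communication port
-A INPUT -m state --state NEW -m tcp -p tcp --dport 5666 -j ACCEPT

:wq   # save and quit
/etc/init.d/iptables restart   # restart the firewall to apply the rules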

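2) Disable SELinux
vim /etc/selinux/config
# comment out the next two lines
#SELINUX=enforcing
#SELINUXTYPE=targeted
# add this line
SELINUX=disabled
:wq!   # save and quit
setenforce 0   # takes effect immediately; the change in the config file becomes permanent after a reboot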

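3) Monitoring environment:
Role                Operating system     IP address      Software
Monitoring server   CentOS 6.7 x86_64    192.168.17.10   Apache, php, nagios, nagios-plugins
Monitored client    CentOS 6.7 x86_64    192.168.17.20   nagios-plugins, nrpe
Monitored client    Windows 7            192.168.17.1    NSClient++

The LAN has two hosts, one Linux and one Windows. The goal is to set up a Nagios monitoring server that monitors both.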

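The following steps are performed on the Nagios monitoring server (192.168.17.10):
1) The installation uses yum and needs the EPEL extra repository
yum install -y epel-release

2) Install the LAMP environment with yum (MySQL is optional; deploy according to your actual environment. The author recommends installing from source)
yum install -y httpd php php-mysql mysql mysql-server mysql-devel php-gd libjpeg libjpeg-devel libpng libpng-devel

3) Install the Nagios packages (Nagios plugins and NRPE)
yum install -y nagios nagios-plugins nagios-plugins-all nrpe nagios-plugins-nrpe

4) Set up access control for the Nagios web interface (using Apache's htpasswd tool)
htpasswd -c /etc/nagios/passwd nagiosadmin   # then enter the password twice (the author uses nagiosadmin)

5) Start the services
service httpd start; service nagios start

6) Open http://ip/nagios in a browser (here http://192.168.17.10/nagios)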

In addition, the default main configuration file of Nagios is /etc/nagios/nagios.cfg. It references several object and template files; a cfg_file line that starts with # is not enabled.
# command definitions
cfg_file=/etc/nagios/objects/commands.cfg
# contacts and contact groups
cfg_file=/etc/nagios/objects/contacts.cfg
# time periods used for monitoring
cfg_file=/etc/nagios/objects/timeperiods.cfg
# templates for hosts and services
cfg_file=/etc/nagios/objects/templates.cfg

# Definitions for monitoring the local (Linux) host
cfg_file=/etc/nagios/objects/localhost.cfg

# Definitions for monitoring a Windows machine (Windows template file, disabled)
#cfg_file=/etc/nagios/objects/windows.cfg

# Definitions for monitoring a router/switch (switch template file, disabled)
#cfg_file=/etc/nagios/objects/switch.cfg

# Definitions for monitoring a network printer (printer template file, disabled)
#cfg_file=/etc/nagios/objects/printer.cfg

To check the Nagios configuration files for errors, use the following command:
nagios -v /etc/nagios/nagios.cfg
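
Because a broken configuration keeps Nagios from starting, it can be convenient to tie the check and the restart together. A minimal sketch, assuming the nagios binary and the service command are on the PATH as in the yum installation above; the function name restart_if_valid is hypothetical.

```shell
# Restart Nagios only when the configuration check passes.
# Usage: restart_if_valid [path-to-nagios.cfg]
restart_if_valid() {
    cfg=${1:-/etc/nagios/nagios.cfg}
    if nagios -v "$cfg" >/dev/null 2>&1; then
        service nagios restart
    else
        echo "config check failed for $cfg; not restarting" >&2
        return 1
    fi
}
```

When the check fails, running nagios -v by hand shows the detailed error messages that the function suppresses.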

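3. Configuring the monitored hosts

1) Configure the Linux client
The Linux client needs the Nagios plugins and NRPE installed, and TCP port 5666 must be open in its firewall
vim /etc/sysconfig/iptables   # edit the firewall configuration
-A INPUT -m state --state NEW -m tcp -p tcp --dport 5666 -j ACCEPT
/etc/init.d/iptables restart   # restart the firewall to apply the rule

The packages needed on the Linux client are nagios-plugins, nagios-plugins-all, nrpe, and nagios-plugins-nrpe
(1) Install the Nagios components (on 192.168.17.20)
yum install -y nagios-plugins nagios-plugins-all nrpe nagios-plugins-nrpe

(2) Edit the nrpe.cfg configuration file
vim /etc/nagios/nrpe.cfg
Find "allowed_hosts=127.0.0.1" and change it to "allowed_hosts=127.0.0.1,192.168.17.10"
## that is, add the IP of the monitoring server
Find "dont_blame_nrpe=0" and change it to "dont_blame_nrpe=1" (this setting controls whether the server may pass arguments to NRPE commands)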
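
The allowed_hosts change can also be scripted when several clients need the same edit. The sketch below is only one way to do it and is not taken from the NRPE package: the function name add_allowed_host is made up, and it assumes the file contains a single allowed_hosts= line with at least one address.

```shell
# Append an address to allowed_hosts in an nrpe.cfg-style file, at most once.
# Usage: add_allowed_host FILE IP
add_allowed_host() {
    cfg=$1
    ip=$2
    # Current value with all whitespace removed, e.g. "127.0.0.1".
    current=$(sed -n 's/^allowed_hosts=//p' "$cfg" | tr -d '[:space:]')
    case ",$current," in
        *",$ip,"*) return 0 ;;   # already listed, nothing to do
    esac
    tmp=$(mktemp) || return 1
    # Drop trailing blanks on the line, then append ",IP".
    sed "s/^\(allowed_hosts=.*[^[:space:]]\)[[:space:]]*\$/\1,$ip/" "$cfg" > "$tmp" &&
        cat "$tmp" > "$cfg"
    rm -f "$tmp"
}

# Example (on the client): add_allowed_host /etc/nagios/nrpe.cfg 192.168.17.10
```

Writing through a temporary file and copying the result back keeps the ownership and permissions of the original nrpe.cfg.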

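2) Configure the Windows client
The Windows client needs NSClient++, which can be downloaded from http://www.nsclient.org/. After downloading, simply run the installer.
When choosing the installation type, pick whatever suits your needs; I chose the "Typical" installation. During installation there is a configuration step with these main settings:
Allowed hosts: the hosts that are allowed to connect. Add the IP of the monitoring server (192.168.17.10) here; this can also be changed in the configuration file after installation.
Password: the password used for communication.
Modules to load: the modules to load. Tick the ones you actually need.

After installation, NSClient++ runs as a service. Run services.msc to open the Services console and check that NSClient++ is running. It listens on TCP port 12489.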

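4. Configuring the monitored clients on the monitoring server

1) Set up the Linux client

(1) To monitor the Linux host (192.168.17.20) from the monitoring server, we can adapt a template that already exists on the system. Put the configuration file in the /etc/nagios/conf.d/ directory and name it host type + IP address, for example linux192.168.17.20.cfg

Edit it as follows:
vim /etc/nagios/conf.d/linux192.168.17.20.cfg
# Define a host for the 192.168.17.20 machine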

define host{
use linux-server
host_name 192.168.17.20
alias 17.20
address 192.168.17.20
}

# Define a service to "ping" the local machine

define service{
use local-service
host_name 192.168.17.20
service_description PING
check_command check_ping!100.0,20%!500.0,60%
max_check_attempts 5 ; alert only after 5 failed checks
normal_check_interval 1 ; check interval in minutes (the original note gives 3 minutes as the default)
}

# Define a service to check the disk space of the root partition
# on the local machine. Warning if < 20% free, critical if
# < 10% free space on partition.

define service{
use local-service
host_name 192.168.17.20
service_description Root Partition
check_command check_local_disk!20%!10%!/
max_check_attempts 5
normal_check_interval 1
}

# Define a service to check the number of currently logged in
# users on the local machine. Warning if > 20 users, critical
# if > 50 users.

define service{
use local-service
host_name 192.168.17.20
service_description Current Users
check_command check_local_users!20!50
max_check_attempts 5
normal_check_interval 1
}

# Define a service to check the number of currently running procs
# on the local machine. Warning if > 250 processes, critical if
# > 400 processes.

define service{
use local-service
host_name 192.168.17.20
service_description Total Processes
check_command check_local_procs!250!400!RSZDT
max_check_attempts 5
normal_check_interval 1
}

# Define a service to check the load on the local machine.

define service{
use local-service
host_name 192.168.17.20
service_description Current Load
check_command check_local_load!5.0,4.0,3.0!10.0,6.0,4.0
max_check_attempts 5
normal_check_interval 1
}

# Define a service to check the swap usage of the local machine.
# Critical if less than 10% of swap is free, warning if less than 20% is free

define service{
use local-service
host_name 192.168.17.20
service_description Swap Usage
check_command check_local_swap!20!10
max_check_attempts 5
normal_check_interval 1
}

# Define a service to check SSH on the local machine.
# Disable notifications for this service by default, as not all users may have SSH enabled.

define service{
use local-service
host_name 192.168.17.20
service_description SSH
check_command check_ssh
notifications_enabled 0
max_check_attempts 5
normal_check_interval 1
}

# Define a service to check HTTP on the local machine.
# Disable notifications for this service by default, as not all users may have HTTP enabled.

define service{
use local-service
host_name 192.168.17.20
service_description HTTP
check_command check_http
notifications_enabled 0
max_check_attempts 5
normal_check_interval 1
}

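Of the services defined here, checks such as disk space (check_local_disk) and load (check_local_load) depend on state that can only be read on the client itself, so they have to go through NRPE: the corresponding command must be defined in the client's configuration file (/etc/nagios/nrpe.cfg), and if it is not there you need to write it yourself. Note that the stock check_local_* commands in commands.cfg run the plugin on the monitoring server, so to report the client's own values a service should call check_nrpe with a command name defined on the client, as in the memory example that follows.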

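(2) Custom monitoring items
The default Nagios templates do not monitor memory, so this has to be defined by hand. The following uses a custom check through NRPE to monitor memory usage on the remote server.

a. On the monitored client
Download the memory check script
cd /usr/lib64/nagios/plugins/   # use the plugin directory that matches your system
wget https://raw.githubusercontent.com/justintime/nagios-plugins/master/check_mem/check_mem.pl   # download the script
mv check_mem.pl check_mem
chmod +x check_mem

Test that the script works with the following command
./check_mem -f -w 30 -c 20   # WARNING when free memory is at 30%, CRITICAL at 20%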

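b. On the monitoring server
vim /etc/nagios/objects/commands.cfg   # edit the Nagios command configuration file and append the check_nrpe command used for the memory check

define command{
command_name check_nrpe
command_line /usr/lib64/nagios/plugins/check_nrpe -H $HOSTADDRESS$ -c $ARG1$
}

This form also works (define check_nrpe only once, using one form or the other):
define command{
command_name check_nrpe
command_line $USER1$/check_nrpe -H $HOSTADDRESS$ -c $ARG1$
}

Then continue editing the configuration file of the Linux host set up earlier:
vim /etc/nagios/conf.d/linux192.168.17.20.cfg   # edit the file and add the service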

define service{
use local-service
host_name 192.168.17.20
service_description Check RAM
check_command check_nrpe!check_mem
notifications_enabled 0
max_check_attempts 5
normal_check_interval 1
}

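Restart the Nagios service
/etc/init.d/nagios restart

c. On the monitored client
vim /etc/nagios/nrpe.cfg   # add the check_mem command
command[check_mem]=/usr/lib64/nagios/plugins/check_mem -f -w 20 -c 10

Restart the NRPE service
/etc/init.d/nrpe restart
On the monitoring server, you can also run check_nrpe by hand to confirm that it returns the memory status:
/usr/lib64/nagios/plugins/check_nrpe -H 192.168.17.20 -c check_mem
The newly configured host and services now appear in the monitoring console.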
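
The name passed to check_nrpe with -c on the server has to match the name inside command[...] in the client's nrpe.cfg exactly, and a one-letter typo is easy to miss. The sketch below is a small check to run on the client; the function name nrpe_command_defined is made up for this example.

```shell
# Succeeds if FILE defines command[NAME]=..., fails otherwise.
# Usage: nrpe_command_defined FILE NAME
nrpe_command_defined() {
    grep -q "^command\[$2\]=" "$1"
}

# Example (on the client):
#   nrpe_command_defined /etc/nagios/nrpe.cfg check_mem ||
#       echo "check_mem is not defined in nrpe.cfg"
```

A name that is not defined on the client makes check_nrpe report an error instead of the plugin output, so this check is worth running before restarting NRPE.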

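2) Set up the Windows client

To monitor the Windows host (192.168.17.1) from the monitoring server, again adapt a template that already exists on the system (the Windows template). Put the configuration file in the /etc/nagios/conf.d/ directory and name it host type + IP address, for example windows192.168.17.1.cfg. The windows.cfg entry in /etc/nagios/nagios.cfg also has to be enabled.
Find "#cfg_file=/etc/nagios/objects/windows.cfg" and change it to: cfg_file=/etc/nagios/objects/windows.cfg

vim /etc/nagios/conf.d/windows192.168.17.1.cfg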

define host{
use windows-server
host_name 192.168.17.1
alias My Windows Server
address 192.168.17.1
}

define service{
use generic-service
host_name 192.168.17.1
service_description NSClient++ Version
check_command check_nt!CLIENTVERSION
}

define service{
use generic-service
host_name 192.168.17.1
service_description Uptime
check_command check_nt!UPTIME
}

define service{
use generic-service
host_name 192.168.17.1
service_description CPU Load
check_command check_nt!CPULOAD!-l 5,80,90
}

define service{
use generic-service
host_name 192.168.17.1
service_description Memory Usage
check_command check_nt!MEMUSE!-w 80 -c 90
}

define service{
use generic-service
host_name 192.168.17.1
service_description C:\ Drive Space
check_command check_nt!USEDDISKSPACE!-l c -w 80 -c 90
}

define service{
use generic-service
host_name 192.168.17.1
service_description W3SVC
check_command check_nt!SERVICESTATE!-d SHOWALL -l W3SVC
}

define service{
use generic-service
host_name 192.168.17.1
service_description Explorer
check_command check_nt!PROCSTATE!-d SHOWALL -l Explorer.exe
}

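In this template, the main things to change are host_name and address.

The check_nt command in the /etc/nagios/objects/commands.cfg configuration file also needs to be modified.
Find:
define command{
command_name check_nt
command_line $USER1$/check_nt -H $HOSTADDRESS$ -p 12489 -v $ARG1$ $ARG2$
}

Change it to:
define command{
command_name check_nt
command_line $USER1$/check_nt -H $HOSTADDRESS$ -p 12489 -s frAQBc8Wsa1xVPfv -v $ARG1$ $ARG2$
}
That is, add -s password to enable password authentication. The password can be changed on the client.

After saving the configuration file, restart the Nagios service.
/etc/init.d/nagios restart

The newly added Windows client now appears in the Nagios console.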

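5. Configuring email alerts

Nagios can raise an alert when a configured threshold is crossed, and this can be used to send email or SMS messages to the administrator.

1) Check whether the sendmail service is installed on the server; if not, install it
yum install -y sendmail
/etc/init.d/sendmail start   # start the sendmail service

2) Send a test email. Format: mail -s "subject" address
echo "from balich nagios server" | mail -s "from balich" balich@qq.com

3) Configure alerts
Edit the contacts configuration file and append the following
vim /etc/nagios/objects/contacts.cfg

define contact{
contact_name balich ; contact name
use generic-contact
alias balich Admin
email balich@qq.com ; email address
}

define contactgroup{
contactgroup_name balichs
alias balich Administrators
members balich
}

Then edit the configuration file of the host that should send alerts, for example linux192.168.17.20.cfg, and enable notifications for the services that need them.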

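define service{
use local-service
host_name 192.168.17.20
service_description HTTP
check_command check_http
notifications_enabled 1 ; enable notifications: 1 = on, 0 = off
notification_interval 5
max_check_attempts 5
normal_check_interval 1
contact_groups balichs ; contact group to notify
notification_period 24x7 ; time period during which notifications are sent
notification_options w,u,c,r ; states that trigger a notification
}
notifications_enabled: whether notifications are enabled for this object. 1 enables them, 0 disables them. There is also a global switch, enable_notifications, in the main configuration file (nagios.cfg), which applies to all hosts and services.
contact_groups: the contact groups that receive the notifications.
notification_interval: the minimum interval between repeated notifications while a problem persists. The default interval is 60 minutes. If the value is 0, no repeat notifications are sent.
notification_period: the time period during which notifications are sent. I use 24x7 for very important hosts (services) and working hours for ordinary ones. Outside the defined period no notification is sent, whatever happens.
notification_options: the states for which notifications are sent. For a service: w = WARNING, u = UNKNOWN, c = CRITICAL, r = recovery (back to OK), f = flapping. For a host: d = DOWN, u = UNREACHABLE, r = recovery, f = flapping. In both cases n = send no notifications.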
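
To make the notification_options letters concrete, the sketch below maps a service state to its letter and tests whether it appears in a given option list. It only illustrates the mapping described above and is not how Nagios implements it internally; the function name state_in_options is made up.

```shell
# Usage: state_in_options OPTIONS STATE
#   OPTIONS  comma-separated letters as in notification_options, e.g. w,u,c,r
#   STATE    one of OK, WARNING, CRITICAL, UNKNOWN (service states)
# Returns 0 if the state is covered, 1 if not, 2 for an unrecognized state.
state_in_options() {
    case "$2" in
        WARNING)  letter=w ;;
        UNKNOWN)  letter=u ;;
        CRITICAL) letter=c ;;
        OK)       letter=r ;;   # a return to OK is reported as a recovery
        *)        return 2 ;;
    esac
    case ",$1," in
        *",$letter,"*) return 0 ;;
        *)             return 1 ;;
    esac
}

# Example: with w,u,c,r a CRITICAL result is covered;
# with c,r a WARNING result is not.
```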

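Only the web service is configured here; set up other services as needed.
Restart the Nagios service, then stop the web service on the monitored host to test the notification.
/etc/init.d/nagios restart

Then check whether the email alert works.
Content of the alert email: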
***** Nagios *****

Notification Type: PROBLEM

Service: check_http
Host: 17.20
Address: 192.168.17.20
State: CRITICAL

Date/Time: Wed Oct 14 12:17:10 CST 2015

Additional Info:

connect to address 192.168.17.20 and port 80: Connection refused

This completes the Nagios monitoring installation.

This article comes from the "balich" blog; please keep this attribution: http://balich.blog.51cto.com/6641781/1717058
