
Shell Script Examples: System Monitoring

2014-09-06 17:31

http://www.jbxue.com/jb/shell/

11.

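#!/bin/bash
# Poll /tmp/test.log once a second and run logrotate once it reaches 1024 bytes.
a="/tmp/test.log"
while [[ -f "$a" ]]
do
    sleep 1
    # stat -c %s prints the size in bytes (GNU stat); cutting ls output on spaces breaks on padded columns
    size=$(stat -c %s "$a")
    echo "$size"
    if [ "$size" -ge 1024 ]
    then
        logrotate /etc/logrotate.conf
    fi
done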

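a="/mnt/fileserver/daq/check"
# If the marker file on the NFS mount is missing, assume the NFS server is down and send mail.
if [ ! -f "$a" ]
then
    echo "nfs server is down" | mail -v -s "nfs" createyuan@sohu.com createyuan1@163.com 403185951@qq.com
fi

http://blog.csdn.net/gujing001/article/details/7110589  Common methods for handling strings in shell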


10.

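Please enter the host to connect to

#!/bin/bash
#written by wubo
#blog:blog.csdn.net/wbls615117
while :
do
    echo "Please choose an operation:"
    select var in "edit file" "view ip" "delete file" "change directory" "exit" "view directory"
    do
        break
    done
    case $var in
    "edit file")
        echo -n "please input edit file:"
        read -r file
        vim "$file"
        echo 'File edited successfully'
        ;;
    "view ip")
        echo -n "please input device name:"
        read -r file
        # left unquoted so that empty input shows all interfaces
        ifconfig $file
        echo 'IP address displayed successfully'
        ;;
    "delete file")
        echo -n "please input delete file:"
        read -r file
        # -- stops option parsing, so a name starting with "-" is not taken as a flag
        rm -rf -- "$file"
        echo 'File deleted successfully'
        ;;
    "change directory")
        echo -n "please input change directory:"
        read -r file
        cd "$file"
        echo "Current directory: $(pwd)"
        ;;
    "view directory")
        echo -n "please input a directory:"
        read -r file
        # left unquoted so that empty input lists the current directory
        ls $file
        echo "Directory listed successfully"
        ;;
    "exit")
        # print the message before break; after break it would never run
        echo 'Exited successfully'
        break
        ;;
    *)
        # any invalid selection also leaves the menu
        echo 'Exited successfully'
        break
        ;;
    esac
done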


9.

#!/bin/bash
IP=`ifconfig eth0 | grep "inet addr" | cut -f 2 -d ":" | cut -f 1 -d " "`
tomcat_dir="/opt/apache-tomcat-7.0.8"
mysql_dir="/usr/local/mysql/bin/mysqld_safe"
vsftp_dir="/usr/sbin/vsftpd"
ssh_dir="/usr/sbin/sshd"
for dir in $tomcat_dir $mysql_dir $vsftp_dir  $ssh_dir
do
process_count=$(ps -ef | grep "$dir" | grep -v grep | wc -l)
for service in tomcat mysql vsftp ssh
do
echo "$dir" |grep -q "$service"
if [ $? -eq 0 ]
then
if [ $process_count -eq 0 ]
then
echo "$service is down at $(date +%Y%m%d%H:%M:%S)" >>/usr/monitor/process/process_$(date +%Y%m%d).log
echo "$service is down at $(date +%Y%m%d%H:%M:%S)" | mail -s "Server $IP: $service service down alert" XXXX@qq.com
else
echo "$service is running at $(date +%Y%m%d%H:%M:%S)" >>/usr/monitor/process/process_$(date +%Y%m%d).log
fi
else
continue
fi
done
done


8. Monitoring a log for specific content

# cat cpu_bug_monitor.sh
#!/bin/bash
# -q: only test for a match, do not print the matching lines
if grep -q "BUG: soft lockup - CPU#" /var/log/messages ; then
    # grep -c counts the matching lines directly
    counter=$(grep -c "BUG: soft lockup - CPU#" /var/log/messages)
    echo "$(date) ## CPU BUG: $counter times" | mutt -s "CPU BUG" 1397710****@139.com
    echo "$(date) ## CPU BUG: $counter times" >> /tmp/CPU_BUG_STATUS
else
    echo "$(date) ## Check CPU BUG normal" >> /tmp/CPU_BUG_STATUS
fi

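As soon as "BUG: soft lockup - CPU#" appears in the log, the script counts the occurrences and mails the count to 1397710****@139.com. In both cases the check time and result are appended to /tmp/CPU_BUG_STATUS.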
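
The counting step can be tried without touching /var/log/messages. The following is a minimal sketch; the helper name count_matches and the sample log lines are made up for illustration.

```shell
#!/bin/bash
# Hypothetical helper: count the lines of a file ($2) that contain a fixed string ($1).
count_matches() {
    # -F: treat the pattern as a literal string; -c: print the number of matching lines.
    # grep exits non-zero when the count is 0, so "|| true" keeps the exit status clean.
    grep -Fc -- "$1" "$2" || true
}

# Demo on a throwaway file with invented log lines.
demo_log=$(mktemp)
printf '%s\n' \
    'kernel: BUG: soft lockup - CPU#1 stuck for 67s!' \
    'kernel: eth0: link up' \
    'kernel: BUG: soft lockup - CPU#3 stuck for 61s!' > "$demo_log"
count_matches "BUG: soft lockup - CPU#" "$demo_log"
rm -f "$demo_log"
```

On the sample file this prints 2, because two of the three lines contain the string.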


7. Base64 implemented in bash

base64Table=(A B C D E F G H I J K L M N O P Q R S T U V W X Y Z a b c d e f g h i j k l m n o p q r s t u v w x y z 0 1 2 3 4 5 6 7 8 9 + /);

function str2binary() {
idx=0;
for((i=0; i<${#str}; i++)); do
dividend=$(printf "%d" "'${str:i:1}");
for((j=0;j<8;j++)); do
let idx=8*i+7-j;
let bin[$idx]=$dividend%2;
let dividend=dividend/2;
done;
done;
let idx=${#str}*8;
for((i=0; i<appendEqualCnt*2; i++)); do
let bin[$idx]=0;
let idx++;
done;
}
function calcBase64() {
for((i=0; i<${#bin[*]}/6; i++)); do
sum=0;
for((j=0; j<6; j++)); do
let idx=i*6+j;
let n=6-1-j;
let sum=sum+${bin[$idx]}*2**n;
done;
echo -n ${base64Table[$sum]};
done
}

declare -a bin
function base64Encode() {
read -p "please enter ASCII string:" str;
let appendZero=${#str}*8%6;
let bits=${#str}*8;
appendEqualCnt=0;
if [[ $appendZero -ne 0 ]]; then
let appendEqualCnt=(6-$appendZero)/2;
fi
str2binary;
calcBase64;
if [[ $appendEqualCnt -eq 2 ]]; then
echo -n "==";
elif [[ $appendEqualCnt -eq 1 ]]; then
echo -n "=";
fi
echo;
}

# Run the encoder; without this call the script only defines functions.
base64Encode
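
For comparison, here is a minimal sketch of the same encoding done with a different technique: instead of building a bit array, it packs each group of up to three bytes into a 24-bit integer and extracts four 6-bit indices with shifts. The names b64_alphabet and b64_encode are hypothetical, and like the script above it assumes plain ASCII input.

```shell
#!/bin/bash
# Sketch: Base64-encode the ASCII string in $1 using shell arithmetic (bash only).
b64_alphabet='ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/'

b64_encode() {
    local str=$1 out='' i n b0 b1 b2 group
    for ((i = 0; i < ${#str}; i += 3)); do
        n=$(( ${#str} - i ))                 # bytes left, including this group
        printf -v b0 '%d' "'${str:i:1}"      # "'c" makes printf print the character code
        b1=0; b2=0
        (( n > 1 )) && printf -v b1 '%d' "'${str:i+1:1}"
        (( n > 2 )) && printf -v b2 '%d' "'${str:i+2:1}"
        group=$(( (b0 << 16) | (b1 << 8) | b2 ))   # 24-bit group, missing bytes are zero
        out+=${b64_alphabet:$(( (group >> 18) & 63 )):1}
        out+=${b64_alphabet:$(( (group >> 12) & 63 )):1}
        # A group with one or two bytes is padded with "=" in the unused positions.
        if (( n > 1 )); then out+=${b64_alphabet:$(( (group >> 6) & 63 )):1}; else out+='='; fi
        if (( n > 2 )); then out+=${b64_alphabet:$(( group & 63 )):1}; else out+='='; fi
    done
    printf '%s\n' "$out"
}

b64_encode "Man"   # prints TWFu
b64_encode "Ma"    # prints TWE=
b64_encode "M"     # prints TQ==
```

Taking the input as an argument rather than from an interactive prompt makes it easy to compare the result with the output of the coreutils base64 command.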


6. Color code table

[root@250-shiyan prog]# cat color
## the test text
T='samples'
echo
echo "        default   40m       41m       42m      43m        44m       45m       46m      47m"
## FG is the foreground color, BG is the background color
for FGs in '    m' '   1m' '  30m' '1;30m' '  31m' '1;31m' '  32m' '1;32m' '  33m' '1;33m' '  34m' '1;34m' '  35m' '1;35m' '  36m' '1;36m' '  37m' '1;37m'
do
FG=$(echo $FGs|tr -d ' ')
echo -en " $FGs \033[$FG $T "
for BG in 40m 41m 42m 43m 44m 45m 46m 47m;
do
echo -en " \033[$FG\033[$BG $T \033[0m"
done
echo
done
echo
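
Once a combination has been picked from the table, it can be applied to a single message. A minimal sketch follows; the helper name colorize is hypothetical, and the codes are the same foreground and background numbers the table prints.

```shell
#!/bin/bash
# Hypothetical helper: print text in the given color codes, then reset the terminal attributes.
colorize() {
    codes=$1
    shift
    # \033[<codes>m switches the attributes on, \033[0m resets them.
    printf '\033[%sm%s\033[0m\n' "$codes" "$*"
}

colorize '1;31' "bold red text"
colorize '30;43' "black on yellow"
```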


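5. Two ways to batch-check whether URLs are reachable: for and while


Practice points
1. On error, write the message to a file and print it on the console at the same time.
2. Read parameters from a file in a loop, log both successes and failures to a file, and send mail on failure.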
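
The first practice point relies on tee -a, which appends to a file and copies the same line to standard output. A minimal sketch, with the hypothetical names log_file and log_error and a temporary file standing in for web.status:

```shell
#!/bin/bash
# Hypothetical names for illustration; a temp file stands in for the real status log.
log_file=$(mktemp)

log_error() {
    # tee -a appends the line to the file and also writes it to stdout (the console).
    echo "$(date '+%Y-%m-%d %H:%M:%S') ERROR:$*" | tee -a "$log_file"
}

log_error "server.list NOT exists!"
echo "--- content of the log file ---"
cat "$log_file"
```

The same timestamped line appears twice: once from tee on the console and once when the file is printed.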

####Input 1:
[root@250-shiyan prog]# cat web
#!/bin/bash
monitor_dir=/tmp/monitor/
if [ ! -d $monitor_dir ]
then
mkdir $monitor_dir
fi

cd $monitor_dir
web_stat_log=web.status

if [ ! -f $web_stat_log ]
then
touch $web_stat_log
fi

server_list_file=server.list

if [ ! -f $server_list_file ]
then
echo "`date '+%Y-%m-%d %H:%M:%S'` ERROR:$server_list_file NOT exists!" |tee -a $web_stat_log
exit 1
fi

for website in `cat $server_list_file`
do
url="http://$website"
server_status_code=`curl -o /dev/null -s -m 10 --connect-timeout 10 -w %{http_code} "$url"`
if [ "$server_status_code" = "200" ]
then
echo "`date '+%Y-%m-%d %H:%M:%S'` visit $website status code 200 OK" >>$web_stat_log
else
echo "`date '+%Y-%m-%d %H:%M:%S'` visit $website error!!! server can't connect at 10s or stop response at 10 s, send alerm sms ..." >>$web_stat_log
#  echo "!app alarm @136xxxxxxxx server:$website can't connect at 10s or stop response at 10s ..." | nc smsserver port &
fi
done

exit 0

####Input 2:
[root@250-shiyan prog]# cat server.list
www.1.com
www.2.com
www.3.com
www.4.com
www.1.net
www.2.net
www.3.org
www.4.org
www.5.cn

####Output:
[root@250-shiyan prog]# cat /tmp/monitor/web.status
2015-02-10 14:50:00 ERROR:server.list NOT exists!
2015-02-10 14:50:10 visit www.1.com error!!! server can't connect at 10s or stop response at 10 s, send alerm sms ...
2015-02-10 14:50:11 visit www.2.com error!!! server can't connect at 10s or stop response at 10 s, send alerm sms ...
2015-02-10 14:50:12 visit www.3.com error!!! server can't connect at 10s or stop response at 10 s, send alerm sms ...
2015-02-10 14:50:12 visit www.4.com error!!! server can't connect at 10s or stop response at 10 s, send alerm sms ...
2015-02-10 14:50:14 visit www.1.net error!!! server can't connect at 10s or stop response at 10 s, send alerm sms ...
2015-02-10 14:50:20 visit www.2.net error!!! server can't connect at 10s or stop response at 10 s, send alerm sms ...
2015-02-10 14:50:21 visit www.3.org status code 200 OK
2015-02-10 14:50:22 visit www.4.org status code 200 OK
2015-02-10 14:50:28 visit www.5.cn status code 200 OK

####Another way to write it
####Input:
[root@250-shiyan prog]# cat web1
#!/bin/bash
while read URL
do
echo `date`
result=$(curl -o /dev/null -s -m 10 --connect-timeout 10 -w '%{http_code}' "$URL")
if [[ "$result" = "200" ]]
then
echo "$URL is ok"
else
echo "$URL is err"
#/usr/sbin/sendmail -t << EOF
#From:SD-Detect
#To:13918888888@139.com,13800000000@139.com
#Subject:Detected $URL
#------------------------------
#${URL} is err!!
#------------------------------
#EOF
fi
done < /root/sh/prog/server.list

[root@250-shiyan prog]# bash web1
Tue Feb 10 15:03:32 CST 2015
www.1.com is err
Tue Feb 10 15:03:33 CST 2015
www.2.com is err
Tue Feb 10 15:03:34 CST 2015
www.3.com is err
Tue Feb 10 15:03:40 CST 2015
www.4.com is err
Tue Feb 10 15:03:41 CST 2015
www.1.net is err
Tue Feb 10 15:03:41 CST 2015
www.2.net is err
Tue Feb 10 15:03:42 CST 2015
www.3.org is ok
Tue Feb 10 15:03:43 CST 2015
www.4.org is ok
Tue Feb 10 15:03:44 CST 2015
www.5.cn is ok


4.

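Practice points:
1. Assign the fields of a formatted input file to read's three variables.
2. AND/OR conditions in awk.
3. If the error log has content, store it in a variable and pass it to the mail and SMS functions for notification.
4. On the first run Curl_Out.txt and Curl_Out_1.txt do not exist yet, so the two sed commands would fail; comment them out for the first run and enable them afterwards, so that each later run clears the old content and records afresh. (Truncating with ": > Curl_Out.txt" would avoid the problem, since it creates the file if it is missing.)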
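
Points 1 and 2 can be tried in isolation. The sketch below uses made-up sample data, a hypothetical helper is_healthy, and a shortened list of accepted status codes; it only illustrates how read splits fields and how awk combines && and ||.

```shell
#!/bin/bash
# Sample data invented for illustration: "ip port url" per line, with a comment and a blank line.
list=$(mktemp)
cat > "$list" <<'EOF'
# ip port url
192.168.2.2 80 192.168.2.2

192.168.2.84 8080 192.168.2.84/monitor
EOF

# read splits on whitespace: first word -> Ip, second -> Port, everything else -> URL.
sed -e '/^#/d;/^$/d' "$list" | while read -r Ip Port URL
do
    echo "ip=$Ip port=$Port url=$URL"
done

# A record "time:size:code" is healthy only if it is fast AND non-empty AND has an accepted code.
is_healthy() {
    printf '%s\n' "$1" |
        awk -F":" '{ if (($1*1000 < 8000) && ($2 > 0) && ($3=="200" || $3=="301")) print "ok"; else print "bad" }'
}

is_healthy "0.044:1576:200"   # prints ok
is_healthy "0.004:0:302"      # prints bad (size is 0 and the code is not accepted)
rm -f "$list"
```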

####Input 1:
[root@250-shiyan monitor]# cat aa
#!/bin/bash
smail() {
mail -s "$1" createyuan@sohu.com <<EOF
$1
$2
====
report time: `date +"%F %T"`
shell script: `echo $0`
current user: `whoami`
====
EOF
}

ssms() {
/usr/local/feixin/fetion --mobile=150000000 --pwd=******** --to=13810000000 --msg-gb="fx $1"
}

cd /tmp/monitor
File=server.list
#sed -i /.*/d Curl_Out.txt
#sed -i /.*/d Curl_Out_1.txt

sed -e '/^#/d;/^$/d' ${File} | while read Ip Port URL
do
/usr/bin/curl --connect-timeout 8 --max-time 12 -o /dev/null -s -w %{time_total}:%{size_download}:%{http_code} http://${URL} -x ${Ip}:${Port} >> Curl_Out.txt
echo ":${Ip}:${URL}" >> Curl_Out.txt
done

awk -F":" '{if(($1*1000<8000)&&($2>0)&&($3=="200"||$3=="301"||$3=="302"||$3=="401")) {} else {print $0 >> "Curl_Out_1.txt"}}' Curl_Out.txt

if [ -s Curl_Out_1.txt ]
then
Warning="$(cat Curl_Out_1.txt)"
#ssms "${Warning}"
# Quote the variable so that all failed records reach smail as a single argument ($2)
smail CURL_Monitor "${Warning}"
fi
Input 2:
[root@250-shiyan monitor]# cat server.list
192.168.2.2 80 192.168.2.2
192.168.2.84 8080 192.168.2.84/monitor
192.168.2.222 80 192.168.2.222
192.168.2.225 80 192.168.2.225
[root@250-shiyan monitor]# ls
aa  server.list

Output 1: written to files
[root@250-shiyan monitor]# bash aa
[root@250-shiyan monitor]# ls
aa  Curl_Out_1.txt  Curl_Out.txt  server.list
[root@250-shiyan monitor]# cat Curl_Out.txt
0.044:1576:200:192.168.2.2:192.168.2.2
0.004:0:302:192.168.2.84:192.168.2.84/monitor
0.050:1563:200:192.168.2.222:192.168.2.222
0.027:1550:200:192.168.2.225:192.168.2.225
[root@250-shiyan monitor]# cat Curl_Out_1.txt
0.004:0:302:192.168.2.84:192.168.2.84/monitor
Output 2: sent by mail; the content is as follows
CURL_Monitor
0.004:0:302:192.168.2.84:192.168.2.84/monitor
====
report time: 2015-02-10 15:58:43
shell script: aa
current user: root
====


3.

2. Monitoring disk usage and sending mail

####Step 1: install the mail client, set the mail address, write the script
[root@250-shiyan ~]# vi disk
#!/bin/bash
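yum install -y mailx    # on CentOS/RHEL the mail command comes from the mailx package; -y avoids the prompt when run from cron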
mailaddr=createyuan1@126.com
smtpserver=smtp.126.com
user=createyuan1
passwd=*******
cat <<EOF >/etc/mail.rc
set from=$mailaddr
set smtp=$smtpserver
set smtp-auth=login
set smtp-auth-user=$user
set smtp-auth-password=$passwd
EOF

space=`df|sed -n '/\/$/p'|gawk '{print $5}'|sed 's/%//'`
if [ $space -ge 10 ]
then
echo "disk is $space" >/tmp/test
mail -v -s "testse" createyuan@sohu.com < /tmp/test
fi

####Step 2: run it from cron
[root@250-shiyan ~]# crontab -e
no crontab for root - using an empty one
crontab: installing new crontab
[root@250-shiyan ~]# crontab -l
1 * * * * bash /root/disk

[root@250-shiyan prog]# cat disk1
#!/bin/bash
while sleep 5
do
    for i in $(df -h | sed -n '/\/$/p' | awk '{print $5}' | sed 's/%//g')
    do
        echo "$i"
        if [ "$i" -ge 10 ]
        then
            echo "the disk is $i"
        fi
    done
done
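
The usage figure can also be read with df -P, which keeps each filesystem on one line so that the fifth field is always the percentage. A minimal sketch, assuming a hypothetical helper disk_usage_pct and an illustrative threshold of 90:

```shell
#!/bin/bash
# Hypothetical helper: print the usage percentage (digits only) of the filesystem that holds a path.
disk_usage_pct() {
    # df -P uses the POSIX output format; line 2 is the filesystem, field 5 its "Capacity"/"Use%".
    df -P "$1" | awk 'NR==2 { sub(/%/, "", $5); print $5 }'
}

threshold=90   # assumed value, for illustration only
used=$(disk_usage_pct /)
if [ "$used" -ge "$threshold" ]; then
    echo "root filesystem is ${used}% full"
else
    echo "root filesystem usage is ${used}%, below ${threshold}%"
fi
```

Passing the path to df avoids having to pick the root line out of the full listing with sed.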


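1. Example

With a 139 mailbox you can also get free SMS notifications on your phone.
Note: mail sent directly from the system is easily blocked; you can have mail connect to a third-party SMTP server to send it.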

#!/bin/bash
for URL in http://www.abc.com http://www.88888.cn
do
# Get the HTTP response code
HTTP_CODE=`curl -o /dev/null -s -w "%{http_code}" "${URL}"`
# If the server responds normally, it should return code 200
if [ "$HTTP_CODE" = 200 ]
then
echo "$URL is OK" | /bin/mail -s "Http Check" qq@163.com
# else
# /usr/local/fetion/fetion --mobile=1356440xxxx --pwd 123456 --to=1885151xxxx --msg-utf8="$URL is ERROR; error code is $HTTP_CODE"
fi
done

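The core is curl -o /dev/null -s -m 10 --connect-timeout 10 -w %{http_code} "$url": check whether the returned status code is 200, and raise an alarm if no 200 comes back within 10 s.

The -o option redirects everything that is downloaded to /dev/null, -s silences curl's own output, and -w defines a custom output format, here just the status code. -m 10 caps the whole transfer at 10 seconds and --connect-timeout 10 caps the connection phase.

Combined with mail and SMS, this command is enough to monitor page availability. Deployed on machines across the country, the program can monitor the availability of a CDN network.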
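
The command and the decision can be separated into two small functions. This is a minimal sketch with hypothetical helper names (http_code, is_up); the curl flags are the ones discussed above, and the real request is left commented out because it needs network access.

```shell
#!/bin/bash
# Hypothetical helper: print only the HTTP status code for a URL.
http_code() {
    # -o /dev/null discards the body, -s silences progress output, -m caps the total time,
    # -w prints the status code; curl prints 000 when no HTTP response was received.
    curl -o /dev/null -s -m 10 --connect-timeout 10 -w '%{http_code}' "$1"
}

# Hypothetical helper: treat only 200 as healthy, as the scripts in this article do.
is_up() {
    [ "$1" = "200" ]
}

# Usage against a real site (commented out, needs network access):
# code=$(http_code "http://www.jbxue.com/index.php")
# if is_up "$code"; then echo "OK"; else echo "ALERT: got $code"; fi

# Offline demonstration of the decision step.
for code in 200 404 000; do
    if is_up "$code"; then echo "$code -> OK"; else echo "$code -> ALERT"; fi
done
```

The loop prints OK for 200 and ALERT for 404 and 000; in a real check the ALERT branch is where the mail or SMS command would go.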

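In this form curl outputs only the server's response status, not the content; 200 is normal and anything else is not. A simple command looks like this: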

[coomix@localhost ~]$ echo `curl -o /dev/null -s -m 10 --connect-timeout 10 -w %{http_code} "http://www.jbxue.com/index.php"`
200
[coomix@localhost ~]$ echo `curl -o /dev/null -s -m 10 --connect-timeout 10 -w %{http_code} "http://www.jbxue.com/index5.php"`
404