
A Test of HA

2011-07-26 14:17




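After I finished the configuration, disconnecting the heartbeat NIC did not make the application fail over, and for a while I thought my configuration was at fault. But once I switched vnet3 to bridge directly to the physical NIC, the problem went away. Most likely packets were not being delivered properly between the two nodes over vnet3.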

Prerequisites:
1. Environment setup
2. Hostnames, yum, ssh

1. Install heartbeat.
# yum install -y 'heartbeat*'     # Run this twice; otherwise some of the packages turn out not to be installed.

# rpm -qa | grep heartbeat
heartbeat-gui-2.1.3-3.el5.centos
heartbeat-2.1.3-3.el5.centos
heartbeat-stonith-2.1.3-3.el5.centos
heartbeat-devel-2.1.3-3.el5.centos
heartbeat-ldirectord-2.1.3-3.el5.centos
heartbeat-pils-2.1.3-3.el5.centos

Copy the relevant configuration files:
# cp /usr/share/doc/heartbeat-2.1.3/ha.cf /etc/ha.d/     # ha.cf: the main HA configuration file
# cp /usr/share/doc/heartbeat-2.1.3/haresources /etc/ha.d/  # haresources: the resource file
# cp /usr/share/doc/heartbeat-2.1.3/authkeys /etc/ha.d/   # authkeys: authentication between HA nodes

# yum install -y httpd

# vim /etc/ha.d/ha.cf
debugfile /var/log/ha-debug
logfile /var/log/ha-log
logfacility     local0
keepalive 2
deadtime 30
warntime 10
initdead 120
udpport 694
ucast eth1 1.1.1.2         # heartbeat link: the peer's IP, reached via eth1
auto_failback on
node    ha1
node    ha2
ping 172.16.1.1  172.16.1.11 # the gateway and the other node's IP
respawn hacluster /usr/lib/heartbeat/ipfail
deadping 30
apiauth ipfail uid=hacluster
use_logd yes
conn_logd_time 60
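
The timing directives above depend on one another: warntime should fall between keepalive and deadtime, and the sample ha.cf shipped with heartbeat suggests setting initdead to at least twice deadtime. Below is a minimal sketch of a sanity check for that relationship. The function name check_timers is made up for illustration, and it assumes the values are plain seconds with no "ms" suffix.

```shell
# check_timers FILE: hypothetical helper, not part of heartbeat.
# Succeeds when keepalive < warntime < deadtime and initdead >= 2 * deadtime.
# Assumes every value is a plain number of seconds.
check_timers() {
  awk '
    $1 == "keepalive" { k = $2 }
    $1 == "warntime"  { w = $2 }
    $1 == "deadtime"  { d = $2 }
    $1 == "initdead"  { i = $2 }
    END {
      if (k + 0 < w + 0 && w + 0 < d + 0 && i + 0 >= 2 * d) exit 0
      exit 1
    }' "$1"
}
```

It could be run as: check_timers /etc/ha.d/ha.cf && echo "timers look consistent". The values used in this post (2, 10, 30, 120) pass the check.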

# cat authkeys         # defines the authentication keys
auth 1
1 crc
================
heartbeat[8404]: 2011/07/26_05:02:48 ERROR: Bad permissions on keyfile [/etc/ha.d/authkeys], 600 recommended.
heartbeat[8404]: 2011/07/26_05:02:48 ERROR: Authentication configuration error.
heartbeat[8404]: 2011/07/26_05:02:48 ERROR: Configuration error, heartbeat not started.

# chmod 600 /etc/ha.d/authkeys
=================
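
The crc method used above only detects corrupted packets and does not authenticate the peer, so it is really only suitable for a trusted, dedicated heartbeat link. As a sketch of an alternative, the helper below writes an authkeys file that uses the sha1 method with a random key, and creates it with mode 600 so that the permission error shown above does not occur. The names make_authkeys and authkeys_mode_ok are invented for this example, and stat -c is assumed to be GNU stat.

```shell
# make_authkeys FILE: hypothetical helper, writes a sha1 authkeys file.
make_authkeys() {
  # 20 random bytes rendered as 40 hex characters
  key=$(head -c 20 /dev/urandom | od -An -tx1 | tr -d ' \n')
  # Restrictive umask so a newly created file is never world-readable
  ( umask 077 && printf 'auth 1\n1 sha1 %s\n' "$key" > "$1" )
  # Also covers the case where the file already existed with looser permissions
  chmod 600 "$1"
}

# authkeys_mode_ok FILE: succeeds when the file mode is exactly 600 (GNU stat assumed).
authkeys_mode_ok() {
  [ "$(stat -c %a "$1")" = "600" ]
}
```

The same file has to be copied to both nodes, for example with scp, since the key must match on ha1 and ha2.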
# cat /etc/ha.d/haresources       # configure the HA resources
ha1     IPaddr::172.16.1.100/24/eth0:0 httpd
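
To make the fields of that haresources line explicit: the first word is the preferred node, IPaddr:: is followed by address/prefix/interface, and the remaining word is an init script to start after the address is up. The sketch below just splits such a line into labelled parts. parse_haresource is a made-up name, and it only handles the simple one-IP, one-service form used in this post.

```shell
# parse_haresource LINE: hypothetical helper that labels the fields of a
# simple haresources entry of the form: node IPaddr::ip/prefix/iface service
parse_haresource() {
  printf '%s\n' "$1" | awk '{
    split($2, r, "::")      # r[1] = "IPaddr", r[2] = "ip/prefix/iface"
    split(r[2], p, "/")     # p[1] = ip, p[2] = prefix length, p[3] = interface alias
    printf "node=%s ip=%s prefix=%s iface=%s service=%s\n", $1, p[1], p[2], p[3], $3
  }'
}
```

For the line above, parse_haresource 'ha1     IPaddr::172.16.1.100/24/eth0:0 httpd' prints node=ha1 ip=172.16.1.100 prefix=24 iface=eth0:0 service=httpd.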

# /etc/init.d/heartbeat start
logd is already running
Starting High-Availability services:
2011/07/26_05:05:15 INFO:  Resource is stopped
[  OK  ]

# The configurations of ha1 and ha2 differ only in the ucast value and the pinged IPs.
#++++++++++++++++++++++++++++++++++++++++++++++++++++++
#
#++++++++++++++++++++++++++++++++++++++++++++++++++++++
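The following is the log from disconnecting the heartbeat link and then plugging it back in:
# The heartbeat link is disconnected on one side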
heartbeat[7043]: 2011/07/26_13:53:40 WARN: node ha2.example.com: is dead
heartbeat[7043]: 2011/07/26_13:53:40 info: Dead node ha2.example.com gave up resources.
heartbeat[7043]: 2011/07/26_13:53:40 info: Link ha2.example.com:eth1 dead.
ipfail[7069]: 2011/07/26_13:53:40 info: Status update: Node ha2.example.com now has status dead
ipfail[7069]: 2011/07/26_13:53:42 info: NS: We are still alive!
ipfail[7069]: 2011/07/26_13:53:42 info: Link Status update: Link ha2.example.com/eth1 now has status dead
ipfail[7069]: 2011/07/26_13:53:44 info: Asking other side for ping node count.
ipfail[7069]: 2011/07/26_13:53:44 info: Checking remote count of ping nodes.
At this point, check the IP addresses on both nodes with ip addr: the VIP shows up on both machines. Split brain!
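
The manual check with ip addr can be scripted. The sketch below assumes the one-line output format of ip -o -4 addr show and that passwordless ssh between the nodes is already set up (it is listed in the prerequisites). has_vip and count_vip_holders are names invented here.

```shell
# has_vip VIP: hypothetical helper. Reads `ip -o -4 addr show` output on stdin
# and succeeds when VIP is configured on some interface.
has_vip() {
  grep -qF "inet $1/"
}

# count_vip_holders VIP HOST...: prints how many of the given hosts hold the VIP.
# A result greater than 1 means split brain. Assumes ssh access to every host.
count_vip_holders() {
  vip=$1
  shift
  n=0
  for host in "$@"; do
    if ssh "$host" ip -o -4 addr show | has_vip "$vip"; then
      n=$((n + 1))
    fi
  done
  echo "$n"
}
```

With the addresses from this post, count_vip_holders 172.16.1.100 ha1 ha2 would be expected to print 1 in normal operation and 2 while the heartbeat link is cut.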

# The second node comes back to life
heartbeat[7043]: 2011/07/26_13:56:09 CRIT: Cluster node ha2.example.com returning after partition.
heartbeat[7043]: 2011/07/26_13:56:09 info: For information on cluster partitions, See URL: http://linux-ha.org/SplitBrain
heartbeat[7043]: 2011/07/26_13:56:09 WARN: Deadtime value may be too small.
heartbeat[7043]: 2011/07/26_13:56:09 info: See FAQ for information on tuning deadtime.
heartbeat[7043]: 2011/07/26_13:56:09 info: URL: http://linux-ha.org/FAQ#heavy_load
heartbeat[7043]: 2011/07/26_13:56:09 info: Link ha2.example.com:eth1 up.
heartbeat[7043]: 2011/07/26_13:56:09 WARN: Late heartbeat: Node ha2.example.com: interval 104930 ms
ipfail[7069]: 2011/07/26_13:56:09 info: Link Status update: Link ha2.example.com/eth1 now has status up
heartbeat[7043]: 2011/07/26_13:56:09 info: Status update for node ha2.example.com: status active
ipfail[7069]: 2011/07/26_13:56:09 info: Status update: Node ha2.example.com now has status active
harc[7916]:     2011/07/26_13:56:09 info: Running /etc/ha.d/rc.d/status status
heartbeat[7043]: 2011/07/26_13:56:12 info: Heartbeat shutdown in progress. (7043)
# Node 2's heartbeat NIC is seen to be alive again, so heartbeat restarts.
heartbeat[7932]: 2011/07/26_13:56:13 info: Giving up all HA resources.
ResourceManager[7945]:  2011/07/26_13:56:13 info: Releasing resource group: ha1.example.com IPaddr::172.16.1.100/24/eth0:0 httpd
ResourceManager[7945]:  2011/07/26_13:56:13 info: Running /etc/init.d/httpd  stop
# The resource manager stops the application that was running
ResourceManager[7945]:  2011/07/26_13:56:13 info: Running /etc/ha.d/resource.d/IPaddr 172.16.1.100/24/eth0:0 stop
IPaddr[8037]:   2011/07/26_13:56:13 INFO: ifconfig eth0:0 down
IPaddr[8008]:   2011/07/26_13:56:13 INFO:  Success
# The corresponding VIP is taken down as well
ResourceManager[8067]:  2011/07/26_13:56:13 info: Releasing resource group: ha2.example.com IPaddr::172.16.1.101/24/eth0:1 vsftpd
# Release the ftp service that originally belonged to ha2.example.com
ResourceManager[8067]:  2011/07/26_13:56:13 info: Running /etc/init.d/vsftpd  stop
ResourceManager[8067]:  2011/07/26_13:56:14 info: Running /etc/ha.d/resource.d/IPaddr 172.16.1.101/24/eth0:1 stop
IPaddr[8161]:   2011/07/26_13:56:14 INFO: ifconfig eth0:1 down
# Stop the service, bring down the interface alias.
IPaddr[8132]:   2011/07/26_13:56:14 INFO:  Success
heartbeat[7932]: 2011/07/26_13:56:14 info: All HA resources relinquished.
heartbeat[7043]: 2011/07/26_13:56:16 info: killing /usr/lib/heartbeat/ipfail process group 7069 with signal 15
heartbeat[7043]: 2011/07/26_13:56:17 info: Received shutdown notice from 'ha2.example.com'.
heartbeat[7043]: 2011/07/26_13:56:17 info: Resource takeover cancelled - shutdown in progress.
heartbeat[7043]: 2011/07/26_13:56:19 info: killing HBFIFO process 7045 with signal 15
heartbeat[7043]: 2011/07/26_13:56:19 info: killing HBWRITE process 7046 with signal 15
heartbeat[7043]: 2011/07/26_13:56:19 info: killing HBREAD process 7047 with signal 15
heartbeat[7043]: 2011/07/26_13:56:19 info: killing HBWRITE process 7048 with signal 15
heartbeat[7043]: 2011/07/26_13:56:19 info: killing HBREAD process 7049 with signal 15
heartbeat[7043]: 2011/07/26_13:56:19 info: Core process 7049 exited. 5 remaining
heartbeat[7043]: 2011/07/26_13:56:19 info: Core process 7047 exited. 4 remaining
heartbeat[7043]: 2011/07/26_13:56:19 info: Core process 7046 exited. 3 remaining
heartbeat[7043]: 2011/07/26_13:56:19 info: Core process 7048 exited. 2 remaining
heartbeat[7043]: 2011/07/26_13:56:19 info: Core process 7045 exited. 1 remaining
heartbeat[7043]: 2011/07/26_13:56:19 info: ha1.example.com Heartbeat shutdown complete.
# The heartbeat service is shut down
heartbeat[7043]: 2011/07/26_13:56:19 info: Heartbeat restart triggered.
heartbeat[7043]: 2011/07/26_13:56:19 info: Restarting heartbeat.
heartbeat[7043]: 2011/07/26_13:56:19 info: Performing heartbeat restart exec.
heartbeat[7043]: 2011/07/26_13:56:30 info: Version 2 support: false
heartbeat[7043]: 2011/07/26_13:56:30 WARN: Logging daemon is disabled --enabling logging daemon is recommended
heartbeat[7043]: 2011/07/26_13:56:30 info: **************************
heartbeat[7043]: 2011/07/26_13:56:30 info: Configuration validated. Starting heartbeat 2.1.3
heartbeat[8191]: 2011/07/26_13:56:30 info: heartbeat: version 2.1.3
heartbeat[8191]: 2011/07/26_13:56:30 info: Heartbeat generation: 1311635912
heartbeat[8191]: 2011/07/26_13:56:30 info: glib: ucast: write socket priority set to IPTOS_LOWDELAY on eth1
heartbeat[8191]: 2011/07/26_13:56:30 info: glib: ucast: bound send socket to device: eth1
heartbeat[8191]: 2011/07/26_13:56:30 info: glib: ucast: bound receive socket to device: eth1
heartbeat[8191]: 2011/07/26_13:56:30 info: glib: ucast: started on port 694 interface eth1 to 10.1.1.2
heartbeat[8191]: 2011/07/26_13:56:30 info: glib: ping group heartbeat started.
heartbeat[8191]: 2011/07/26_13:56:30 info: G_main_add_TriggerHandler: Added signal manual handler
heartbeat[8191]: 2011/07/26_13:56:30 info: G_main_add_TriggerHandler: Added signal manual handler
heartbeat[8191]: 2011/07/26_13:56:30 info: G_main_add_SignalHandler: Added signal handler for signal 17
heartbeat[8191]: 2011/07/26_13:56:30 info: Local status now set to: 'up'
heartbeat[8191]: 2011/07/26_13:56:32 info: Link group1:group1 up.
heartbeat[8191]: 2011/07/26_13:56:32 info: Status update for node group1: status ping
heartbeat[8191]: 2011/07/26_13:56:33 info: Link ha2.example.com:eth1 up.
heartbeat[8191]: 2011/07/26_13:56:33 info: Status update for node ha2.example.com: status up
harc[8199]:     2011/07/26_13:56:33 info: Running /etc/ha.d/rc.d/status status
heartbeat[8191]: 2011/07/26_13:56:33 info: Comm_now_up(): updating status to active
heartbeat[8191]: 2011/07/26_13:56:33 info: Local status now set to: 'active'
heartbeat[8191]: 2011/07/26_13:56:33 info: Starting child client "/usr/lib/heartbeat/ipfail" (498,496)
heartbeat[8216]: 2011/07/26_13:56:33 info: Starting "/usr/lib/heartbeat/ipfail" as uid 498  gid 496 (pid 8216)
heartbeat[8191]: 2011/07/26_13:56:34 info: Status update for node ha2.example.com: status active
harc[8219]:     2011/07/26_13:56:34 info: Running /etc/ha.d/rc.d/status status
ipfail[8216]: 2011/07/26_13:56:40 info: Status update: Node ha2.example.com now has status active
# Check the status of the other node
ipfail[8216]: 2011/07/26_13:56:43 info: Asking other side for ping node count.
ipfail[8216]: 2011/07/26_13:56:46 info: No giveup timer to abort.
heartbeat[8191]: 2011/07/26_13:56:50 info: local resource transition completed.
heartbeat[8191]: 2011/07/26_13:56:50 info: Initial resource acquisition complete (T_RESOURCES(us))
heartbeat[8191]: 2011/07/26_13:56:50 info: remote resource transition completed.
IPaddr[8271]:   2011/07/26_13:56:51 INFO:  Resource is stopped
heartbeat[8235]: 2011/07/26_13:56:51 info: Local Resource acquisition completed.
harc[8324]:     2011/07/26_13:56:51 info: Running /etc/ha.d/rc.d/ip-request-resp ip-request-resp
ip-request-resp[8324]:  2011/07/26_13:56:51 received ip-request-resp IPaddr::172.16.1.100/24/eth0:0 OK yes
ResourceManager[8345]:  2011/07/26_13:56:51 info: Acquiring resource group: ha1.example.com IPaddr::172.16.1.100/24/eth0:0 httpd
IPaddr[8372]:   2011/07/26_13:56:52 INFO:  Resource is stopped
# Acquire the resource information
ResourceManager[8345]:  2011/07/26_13:56:53 info: Running /etc/ha.d/resource.d/IPaddr 172.16.1.100/24/eth0:0 start
IPaddr[8470]:   2011/07/26_13:56:54 INFO: Using calculated netmask for 172.16.1.100: 255.255.255.0
IPaddr[8470]:   2011/07/26_13:56:54 INFO: eval ifconfig eth0:0 172.16.1.100 netmask 255.255.255.0 broadcast 172.16.1.255
IPaddr[8441]:   2011/07/26_13:56:54 INFO:  Success
# Acquire the VIP and its IP address
ResourceManager[8345]:  2011/07/26_13:56:54 info: Running /etc/init.d/httpd  start
The service is back to normal! This is the complete log.
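
In a log this long, the lines that matter most for following a failover are the ResourceManager "Running ..." entries, which show the order in which services and VIPs were stopped and started. Here is a small sketch of a filter that keeps only those, with the time of day. resource_actions is a hypothetical name, and the pattern assumes the log line format shown above.

```shell
# resource_actions: hypothetical helper. Reads a heartbeat log on stdin and
# prints "HH:MM:SS command" for each command the ResourceManager ran.
resource_actions() {
  sed -n 's/^ResourceManager\[[0-9]*\]:[[:space:]]*[^_]*_\([0-9:]*\) info: Running \(.*\)$/\1 \2/p'
}
```

It could be used as: resource_actions < /var/log/ha-log (the logfile path set in ha.cf above).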

Dual heartbeat links and my overall understanding of HA: http://myhat.blog.51cto.com/391263/623546