An HA Test
2011-07-26 14:17
![](http://blog.51cto.com/attachment/201107/141216151.jpg)
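After I finished the configuration, disconnecting the heartbeat NIC did not make the application fail over, and for a while I thought my configuration was wrong. But once I switched vnet3 to bridge directly to the physical NIC, the problem went away. Most likely, packets were not being delivered properly between the two nodes over vnet3.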
Prerequisites:
1. Environment setup
2. Hostnames, yum, ssh

1. Install heartbeat.

    # yum install -y heartbeat*    # run this twice; otherwise some packages turn out not to be installed
    # rpm -qa | grep heartbeat*
    heartbeat-gui-2.1.3-3.el5.centos
    heartbeat-2.1.3-3.el5.centos
    heartbeat-stonith-2.1.3-3.el5.centos
    heartbeat-devel-2.1.3-3.el5.centos
    heartbeat-ldirectord-2.1.3-3.el5.centos
    heartbeat-pils-2.1.3-3.el5.centos

Copy the sample configuration files:

    # cp /usr/share/doc/heartbeat-2.1.3/ha.cf /etc/ha.d/          # ha.cf: the HA configuration file
    # cp /usr/share/doc/heartbeat-2.1.3/haresources /etc/ha.d/    # haresources: the resource file
    # cp /usr/share/doc/heartbeat-2.1.3/authkeys /etc/ha.d/       # authentication file used between HA nodes
    # yum install -y httpd
    # vim /etc/ha.d/ha.cf
    debugfile /var/log/ha-debug
    logfile /var/log/ha-log
    logfacility local0
    keepalive 2
    deadtime 30
    warntime 10
    initdead 120
    udpport 694
    ucast eth1 1.1.1.2    # heartbeat link
    auto_failback on
    node ha1
    node ha2
    ping 172.16.1.1 172.16.1.11    # gateway and the other node's IP
    respawn hacluster /usr/lib/heartbeat/ipfail
    deadping 30
    apiauth ipfail uid=hacluster
    use_logd yes
    conn_logd_time 60

    # cat authkeys    # defines the authentication keys
    auth 1
    1 crc

    ================
    heartbeat[8404]: 2011/07/26_05:02:48 ERROR: Bad permissions on keyfile [/etc/ha.d/authkeys], 600 recommended.
    heartbeat[8404]: 2011/07/26_05:02:48 ERROR: Authentication configuration error.
    heartbeat[8404]: 2011/07/26_05:02:48 ERROR: Configuration error, heartbeat not started.
    # chmod 600 /etc/ha.d/authkeys
    =================

    # cat /etc/ha.d/haresources    # configure the HA resources
    ha1 IPaddr::172.16.1.100/24/eth0:0 httpd
    # /etc/init.d/heartbeat start
    logd is already running
    Starting High-Availability services:
    2011/07/26_05:05:15 INFO: Resource is stopped
    [ OK ]

The configurations on ha1 and ha2 differ only in the ucast value and the IP being pinged.

    #++++++++++++++++++++++++++++++++++++++++++++++++++++++
    #
    #++++++++++++++++++++++++++++++++++++++++++++++++++++++

The following log covers disconnecting the heartbeat link and then plugging it back in:

    # Heartbeat link disconnected on one side
    heartbeat[7043]: 2011/07/26_13:53:40 WARN: node ha2.example.com: is dead
    heartbeat[7043]: 2011/07/26_13:53:40 info: Dead node ha2.example.com gave up resources.
    heartbeat[7043]: 2011/07/26_13:53:40 info: Link ha2.example.com:eth1 dead.
    ipfail[7069]: 2011/07/26_13:53:40 info: Status update: Node ha2.example.com now has status dead
    ipfail[7069]: 2011/07/26_13:53:42 info: NS: We are still alive!
    ipfail[7069]: 2011/07/26_13:53:42 info: Link Status update: Link ha2.example.com/eth1 now has status dead
    ipfail[7069]: 2011/07/26_13:53:44 info: Asking other side for ping node count.
    ipfail[7069]: 2011/07/26_13:53:44 info: Checking remote count of ping nodes.

At this point, check the IP addresses on both nodes with ip addr: the VIP shows up on both machines. Split brain!

    # The second node comes back
    heartbeat[7043]: 2011/07/26_13:56:09 CRIT: Cluster node ha2.example.com returning after partition.
    heartbeat[7043]: 2011/07/26_13:56:09 info: For information on cluster partitions, See URL: http://linux-ha.org/SplitBrain
    heartbeat[7043]: 2011/07/26_13:56:09 WARN: Deadtime value may be too small.
    heartbeat[7043]: 2011/07/26_13:56:09 info: See FAQ for information on tuning deadtime.
    heartbeat[7043]: 2011/07/26_13:56:09 info: URL: http://linux-ha.org/FAQ#heavy_load
    heartbeat[7043]: 2011/07/26_13:56:09 info: Link ha2.example.com:eth1 up.
    heartbeat[7043]: 2011/07/26_13:56:09 WARN: Late heartbeat: Node ha2.example.com: interval 104930 ms
    ipfail[7069]: 2011/07/26_13:56:09 info: Link Status update: Link ha2.example.com/eth1 now has status up
    heartbeat[7043]: 2011/07/26_13:56:09 info: Status update for node ha2.example.com: status active
    ipfail[7069]: 2011/07/26_13:56:09 info: Status update: Node ha2.example.com now has status active
    harc[7916]: 2011/07/26_13:56:09 info: Running /etc/ha.d/rc.d/status status
    heartbeat[7043]: 2011/07/26_13:56:12 info: Heartbeat shutdown in progress. (7043)    # node 2's heartbeat NIC is detected as back up, so heartbeat restarts
    heartbeat[7932]: 2011/07/26_13:56:13 info: Giving up all HA resources.
    ResourceManager[7945]: 2011/07/26_13:56:13 info: Releasing resource group: ha1.example.com IPaddr::172.16.1.100/24/eth0:0 httpd
    ResourceManager[7945]: 2011/07/26_13:56:13 info: Running /etc/init.d/httpd stop    # the resource manager stops the previously running application
    ResourceManager[7945]: 2011/07/26_13:56:13 info: Running /etc/ha.d/resource.d/IPaddr 172.16.1.100/24/eth0:0 stop
    IPaddr[8037]: 2011/07/26_13:56:13 INFO: ifconfig eth0:0 down
    IPaddr[8008]: 2011/07/26_13:56:13 INFO: Success    # the corresponding VIP is brought down as well
    ResourceManager[8067]: 2011/07/26_13:56:13 info: Releasing resource group: ha2.example.com IPaddr::172.16.1.101/24/eth0:1 vsftpd    # release the FTP service that originally belonged to ha2.example.com
    ResourceManager[8067]: 2011/07/26_13:56:13 info: Running /etc/init.d/vsftpd stop
    ResourceManager[8067]: 2011/07/26_13:56:14 info: Running /etc/ha.d/resource.d/IPaddr 172.16.1.101/24/eth0:1 stop
    IPaddr[8161]: 2011/07/26_13:56:14 INFO: ifconfig eth0:1 down    # stop the service, bring down the interface
    IPaddr[8132]: 2011/07/26_13:56:14 INFO: Success
    heartbeat[7932]: 2011/07/26_13:56:14 info: All HA resources relinquished.
    heartbeat[7043]: 2011/07/26_13:56:16 info: killing /usr/lib/heartbeat/ipfail process group 7069 with signal 15
    heartbeat[7043]: 2011/07/26_13:56:17 info: Received shutdown notice from 'ha2.example.com'.
    heartbeat[7043]: 2011/07/26_13:56:17 info: Resource takeover cancelled - shutdown in progress.
    heartbeat[7043]: 2011/07/26_13:56:19 info: killing HBFIFO process 7045 with signal 15
    heartbeat[7043]: 2011/07/26_13:56:19 info: killing HBWRITE process 7046 with signal 15
    heartbeat[7043]: 2011/07/26_13:56:19 info: killing HBREAD process 7047 with signal 15
    heartbeat[7043]: 2011/07/26_13:56:19 info: killing HBWRITE process 7048 with signal 15
    heartbeat[7043]: 2011/07/26_13:56:19 info: killing HBREAD process 7049 with signal 15
    heartbeat[7043]: 2011/07/26_13:56:19 info: Core process 7049 exited. 5 remaining
    heartbeat[7043]: 2011/07/26_13:56:19 info: Core process 7047 exited. 4 remaining
    heartbeat[7043]: 2011/07/26_13:56:19 info: Core process 7046 exited. 3 remaining
    heartbeat[7043]: 2011/07/26_13:56:19 info: Core process 7048 exited. 2 remaining
    heartbeat[7043]: 2011/07/26_13:56:19 info: Core process 7045 exited. 1 remaining
    heartbeat[7043]: 2011/07/26_13:56:19 info: ha1.example.com Heartbeat shutdown complete.    # the heartbeat service has shut down
    heartbeat[7043]: 2011/07/26_13:56:19 info: Heartbeat restart triggered.
    heartbeat[7043]: 2011/07/26_13:56:19 info: Restarting heartbeat.
    heartbeat[7043]: 2011/07/26_13:56:19 info: Performing heartbeat restart exec.
    heartbeat[7043]: 2011/07/26_13:56:30 info: Version 2 support: false
    heartbeat[7043]: 2011/07/26_13:56:30 WARN: Logging daemon is disabled --enabling logging daemon is recommended
    heartbeat[7043]: 2011/07/26_13:56:30 info: **************************
    heartbeat[7043]: 2011/07/26_13:56:30 info: Configuration validated. Starting heartbeat 2.1.3
    heartbeat[8191]: 2011/07/26_13:56:30 info: heartbeat: version 2.1.3
    heartbeat[8191]: 2011/07/26_13:56:30 info: Heartbeat generation: 1311635912
    heartbeat[8191]: 2011/07/26_13:56:30 info: glib: ucast: write socket priority set to IPTOS_LOWDELAY on eth1
    heartbeat[8191]: 2011/07/26_13:56:30 info: glib: ucast: bound send socket to device: eth1
    heartbeat[8191]: 2011/07/26_13:56:30 info: glib: ucast: bound receive socket to device: eth1
    heartbeat[8191]: 2011/07/26_13:56:30 info: glib: ucast: started on port 694 interface eth1 to 10.1.1.2
    heartbeat[8191]: 2011/07/26_13:56:30 info: glib: ping group heartbeat started.
    heartbeat[8191]: 2011/07/26_13:56:30 info: G_main_add_TriggerHandler: Added signal manual handler
    heartbeat[8191]: 2011/07/26_13:56:30 info: G_main_add_TriggerHandler: Added signal manual handler
    heartbeat[8191]: 2011/07/26_13:56:30 info: G_main_add_SignalHandler: Added signal handler for signal 17
    heartbeat[8191]: 2011/07/26_13:56:30 info: Local status now set to: 'up'
    heartbeat[8191]: 2011/07/26_13:56:32 info: Link group1:group1 up.
    heartbeat[8191]: 2011/07/26_13:56:32 info: Status update for node group1: status ping
    heartbeat[8191]: 2011/07/26_13:56:33 info: Link ha2.example.com:eth1 up.
    heartbeat[8191]: 2011/07/26_13:56:33 info: Status update for node ha2.example.com: status up
    harc[8199]: 2011/07/26_13:56:33 info: Running /etc/ha.d/rc.d/status status
    heartbeat[8191]: 2011/07/26_13:56:33 info: Comm_now_up(): updating status to active
    heartbeat[8191]: 2011/07/26_13:56:33 info: Local status now set to: 'active'
    heartbeat[8191]: 2011/07/26_13:56:33 info: Starting child client "/usr/lib/heartbeat/ipfail" (498,496)
    heartbeat[8216]: 2011/07/26_13:56:33 info: Starting "/usr/lib/heartbeat/ipfail" as uid 498 gid 496 (pid 8216)
    heartbeat[8191]: 2011/07/26_13:56:34 info: Status update for node ha2.example.com: status active
    harc[8219]: 2011/07/26_13:56:34 info: Running /etc/ha.d/rc.d/status status
    ipfail[8216]: 2011/07/26_13:56:40 info: Status update: Node ha2.example.com now has status active    # check the other node's status
    ipfail[8216]: 2011/07/26_13:56:43 info: Asking other side for ping node count.
    ipfail[8216]: 2011/07/26_13:56:46 info: No giveup timer to abort.
    heartbeat[8191]: 2011/07/26_13:56:50 info: local resource transition completed.
    heartbeat[8191]: 2011/07/26_13:56:50 info: Initial resource acquisition complete (T_RESOURCES(us))
    heartbeat[8191]: 2011/07/26_13:56:50 info: remote resource transition completed.
    IPaddr[8271]: 2011/07/26_13:56:51 INFO: Resource is stopped
    heartbeat[8235]: 2011/07/26_13:56:51 info: Local Resource acquisition completed.
    harc[8324]: 2011/07/26_13:56:51 info: Running /etc/ha.d/rc.d/ip-request-resp ip-request-resp
    ip-request-resp[8324]: 2011/07/26_13:56:51 received ip-request-resp IPaddr::172.16.1.100/24/eth0:0 OK yes
    ResourceManager[8345]: 2011/07/26_13:56:51 info: Acquiring resource group: ha1.example.com IPaddr::172.16.1.100/24/eth0:0 httpd
    IPaddr[8372]: 2011/07/26_13:56:52 INFO: Resource is stopped    # resource information obtained
    ResourceManager[8345]: 2011/07/26_13:56:53 info: Running /etc/ha.d/resource.d/IPaddr 172.16.1.100/24/eth0:0 start
    IPaddr[8470]: 2011/07/26_13:56:54 INFO: Using calculated netmask for 172.16.1.100: 255.255.255.0
    IPaddr[8470]: 2011/07/26_13:56:54 INFO: eval ifconfig eth0:0 172.16.1.100 netmask 255.255.255.0 broadcast 172.16.1.255
    IPaddr[8441]: 2011/07/26_13:56:54 INFO: Success    # VIP and IP address acquired
    ResourceManager[8345]: 2011/07/26_13:56:54 info: Running /etc/init.d/httpd start

The service is back to normal! This is the complete log.
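To see how long the cluster stayed partitioned (and therefore how long both nodes may have held the VIP), the log can be scanned for the "is dead" and "returning after partition" messages. The following is only a rough sketch: the function names `parse_line` and `partition_windows` are my own, and the regular expression assumes the log layout shown in this post (`process[pid]: date_time LEVEL: message`), which may differ on other versions.

```python
import re
from datetime import datetime

# Layout of a heartbeat log line as it appears in this post (an assumption
# for other versions):  process[pid]: YYYY/MM/DD_HH:MM:SS LEVEL: message
LOG_LINE = re.compile(
    r"^(?P<proc>[\w-]+)\[(?P<pid>\d+)\]:\s+"
    r"(?P<ts>\d{4}/\d{2}/\d{2}_\d{2}:\d{2}:\d{2})\s+"
    r"(?P<level>[A-Za-z]+):\s+(?P<msg>.*)$"
)
DEAD = re.compile(r"node (\S+): is dead")
RETURNED = re.compile(r"Cluster node (\S+) returning after partition")


def parse_line(line):
    """Return (timestamp, message) for a log line, or None if it does not match."""
    m = LOG_LINE.match(line.strip())
    if m is None:
        return None
    when = datetime.strptime(m.group("ts"), "%Y/%m/%d_%H:%M:%S")
    return when, m.group("msg")


def partition_windows(lines):
    """Pair each 'is dead' event with the later 'returning after partition' event."""
    dead_since = {}
    windows = []
    for line in lines:
        parsed = parse_line(line)
        if parsed is None:
            continue
        when, msg = parsed
        m = DEAD.match(msg)
        if m:
            dead_since[m.group(1)] = when
            continue
        m = RETURNED.match(msg)
        if m and m.group(1) in dead_since:
            node = m.group(1)
            windows.append((node, dead_since.pop(node), when))
    return windows


# Three lines copied from the log above.
SAMPLE = [
    "heartbeat[7043]: 2011/07/26_13:53:40 WARN: node ha2.example.com: is dead",
    "ipfail[7069]: 2011/07/26_13:53:42 info: NS: We are still alive!",
    "heartbeat[7043]: 2011/07/26_13:56:09 CRIT: Cluster node ha2.example.com returning after partition.",
]

for node, start, end in partition_windows(SAMPLE):
    # prints: ha2.example.com: partitioned for 149 s
    print(f"{node}: partitioned for {(end - start).total_seconds():.0f} s")
```

For a real run, pass the lines of /var/log/ha-log instead of `SAMPLE`, for example `partition_windows(open("/var/log/ha-log"))`.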
Dual heartbeat links and my overall understanding of HA: http://myhat.blog.51cto.com/391263/623546
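Since the log above warns "Deadtime value may be too small", it may be worth sanity-checking the timing directives in ha.cf before starting heartbeat. Below is a small sketch under stated assumptions: `parse_ha_cf` and `timing_warnings` are hypothetical helper names, the values are assumed to be plain integers in seconds as in this post, and the checks are only common rules of thumb (warntime below deadtime, initdead at least twice deadtime), not an official validator.

```python
def parse_ha_cf(text):
    """Map each ha.cf directive to the list of its argument strings."""
    directives = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop trailing comments
        if not line:
            continue
        parts = line.split(None, 1)
        value = parts[1] if len(parts) > 1 else ""
        directives.setdefault(parts[0], []).append(value)
    return directives


def timing_warnings(directives):
    """Return rule-of-thumb warnings about the timing directives."""

    def seconds(name):
        # Assumption: plain integer seconds, as in the ha.cf of this post.
        values = directives.get(name)
        return int(values[-1]) if values else None

    keepalive = seconds("keepalive")
    warntime = seconds("warntime")
    deadtime = seconds("deadtime")
    initdead = seconds("initdead")

    warnings = []
    if deadtime is None:
        return warnings
    if keepalive is not None and keepalive >= deadtime:
        warnings.append("keepalive should be well below deadtime")
    if warntime is not None and warntime >= deadtime:
        warnings.append("warntime should be below deadtime")
    if initdead is not None and initdead < 2 * deadtime:
        warnings.append("initdead should be at least twice deadtime")
    return warnings


# Timing-related part of the ha.cf from this post.
HA_CF = """\
keepalive 2
deadtime 30
warntime 10
initdead 120
udpport 694
ucast eth1 1.1.1.2   # heartbeat link
"""

print(timing_warnings(parse_ha_cf(HA_CF)))  # prints: []
```

The values used in this post pass these checks, which suggests the "Deadtime value may be too small" warning here came from the deliberately pulled heartbeat link rather than from the timing settings themselves.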