Simulating a DRBD Split-Brain
2015-07-10 14:07
drbd1 is the primary and drbd2 the secondary. In my view, a DRBD split-brain is usually the result of earlier manual intervention or of a failover, for example one driven by HA. A while back a friend and I visited a customer who used HA for failover; somehow they managed to kill the DRBD service on one machine. The server was critical, but they were not very familiar with the HA/DRBD architecture, and the problem surfaced during an HA switchover test. Let me simulate that problem here.

1. Take the primary down, or unplug its network cable.

2. Check the secondary's state:

[root@drbd2 ~]# drbdadm role fs
Secondary/Unknown
[root@drbd2 ~]# cat /proc/drbd
version: 8.3.11 (api:88/proto:86-96)
GIT-hash: 0de839cee13a4160eed6037c4bddd066645e23c5 build by root@drbd2.localdomain, 2011-07-08 11:10:20
 1: cs:WFConnection ro:Secondary/Unknown ds:UpToDate/DUnknown C r-----
    ns:567256 nr:20435468 dw:21002724 dr:169 al:229 bm:1248 lo:0 pe:0 ua:0 ap:0 ep:1 wo:b oos:0

Note drbd2's cs (connection state): WFConnection, i.e. waiting for its peer.

Promote the secondary to the primary role:

[root@drbd2 ~]# drbdadm primary fs
[root@drbd2 ~]# drbdadm role fs
Primary/Unknown
[root@drbd2 ~]# cat /proc/drbd
version: 8.3.11 (api:88/proto:86-96)
GIT-hash: 0de839cee13a4160eed6037c4bddd066645e23c5 build by root@drbd2.localdomain, 2011-07-08 11:10:20
 1: cs:WFConnection ro:Primary/Unknown ds:UpToDate/DUnknown C r-----
    ns:567256 nr:20435468 dw:21002724 dr:169 al:229 bm:1248 lo:0 pe:0 ua:0 ap:0 ep:1 wo:b oos:0

Mount the resource:

[root@drbd2 ~]# mount /dev/drbd1 /mnt/
[root@drbd2 ~]# cd /mnt/
[root@drbd2 mnt]# ll
total 102524
-rw-r--r-- 1 root root 104857600 Jul 8 12:35 100M
drwx------ 2 root root     16384 Jul 8 12:33 lost+found

Now the original primary comes back up, and we have a split-brain:

[root@drbd1 ~]# tail -f /var/log/messages
Jul 8 13:14:01 localhost kernel: block drbd1: helper command: /sbin/drbdadm initial-split-brain minor-1 exit code 0 (0x0)
Jul 8 13:14:01 localhost kernel: block drbd1: Split-Brain detected but unresolved, dropping connection!
Jul 8 13:14:01 localhost kernel: block drbd1: helper command: /sbin/drbdadm split-brain minor-1
Jul 8 13:14:01 localhost kernel: block drbd1: helper command: /sbin/drbdadm split-brain minor-1 exit code 0 (0x0)
Jul 8 13:14:01 localhost kernel: block drbd1: conn( NetworkFailure -> Disconnecting )
Jul 8 13:14:01 localhost kernel: block drbd1: error receiving ReportState, l: 4!
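For scripting the manual checks above, here is a small helper sketch of my own (the function name `drbd_cs` is not from the article) that pulls the cs: field for a given minor out of /proc/drbd-style output:

```shell
#!/bin/sh
# Sketch only: extract the connection state (cs:) of a DRBD minor from
# /proc/drbd-style output read on stdin. Minor 1 is the resource used
# in this article.
drbd_cs() {
    awk -v m="$1" '$1 == (m ":") { sub(/^cs:/, "", $2); print $2 }'
}

# Usage on a DRBD node:
#   drbd_cs 1 < /proc/drbd    # prints e.g. WFConnection or StandAlone
```

A monitoring cron job could alert whenever this prints anything other than Connected.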
Jul 8 13:14:01 localhost kernel: block drbd1: Connection closed
Jul 8 13:14:01 localhost kernel: block drbd1: conn( Disconnecting -> StandAlone )
Jul 8 13:14:01 localhost kernel: block drbd1: receiver terminated
Jul 8 13:14:01 localhost kernel: block drbd1: Terminating receiver thread

Both nodes now claim the primary role:

[root@drbd1 ~]# drbdadm role fs
Primary/Unknown
[root@drbd2 mnt]# drbdadm role fs
Primary/Unknown

drbd1 is now StandAlone; in this state the two nodes will not even try to contact each other.

[root@drbd1 ~]# cat /proc/drbd
version: 8.3.11 (api:88/proto:86-96)
GIT-hash: 0de839cee13a4160eed6037c4bddd066645e23c5 build by root@drbd1.localdomain, 2011-07-08 11:10:38
 1: cs:StandAlone ro:Primary/Unknown ds:UpToDate/DUnknown r-----
    ns:20405516 nr:567256 dw:567376 dr:20405706 al:2 bm:1246 lo:0 pe:0 ua:0 ap:0 ep:1 wo:b oos:0
[root@drbd1 /]# service drbd status
drbd driver loaded OK; device status:
version: 8.3.11 (api:88/proto:86-96)
GIT-hash: 0de839cee13a4160eed6037c4bddd066645e23c5 build by root@drbd1.localdomain, 2011-07-08 11:10:38
m:res cs         ro               ds                p      mounted  fstype
1:fs  StandAlone Primary/Unknown  UpToDate/DUnknown r-----          ext3

At this point, if the user tries to restart the drbd service on drbd2, they will find it simply will not come up:

[root@drbd2 /]# service drbd start
Starting DRBD resources: [ ]..........
***************************************************************
 DRBD's startup script waits for the peer node(s) to appear.
 - In case this node was already a degraded cluster before the
   reboot the timeout is 120 seconds. [degr-wfc-timeout]
 - If the peer was available before the reboot the timeout will
   expire after 0 seconds.
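As an aside, DRBD 8.3 can also be configured to notify you of, or even automatically resolve, a split-brain. A hedged sketch of the relevant drbd.conf sections (the policy values are illustrative choices, not this cluster's actual configuration):

```
resource fs {
  handlers {
    # Notification helper shipped with DRBD, run on split-brain detection.
    split-brain "/usr/lib/drbd/notify-split-brain.sh root";
  }
  net {
    # Automatic recovery policies, keyed by how many nodes are
    # primary when the split-brain is detected:
    after-sb-0pri discard-zero-changes;
    after-sb-1pri discard-secondary;
    after-sb-2pri disconnect;
  }
}
```

With two primaries, as in this simulation, `disconnect` still leaves you with exactly the manual recovery described below.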
   [wfc-timeout]
   (These values are for resource 'fs'; 0 sec -> wait forever)
 To abort waiting enter 'yes' [ -- ]:[ 13]:[ 15]:[ 16]:[ 18]:[ 19]:[ 20]:[ 22]:

On drbd2, the fix is:

[root@drbd2 /]# drbdadm disconnect fs
[root@drbd2 /]# drbdadm secondary fs
[root@drbd2 /]# drbdadm -- --discard-my-data fs

After these three steps you will find that you still cannot start the drbd service on drbd2. My guess is that this is exactly what hit that customer: after restarting DRBD it would not come back up, which drove their DBA half crazy.

You also need to reconnect the resource on drbd1:

[root@drbd1 ~]# drbdadm connect fs

Start the drbd service on drbd2 again, and now it works:

[root@drbd2 /]# service drbd start
Starting DRBD resources: [ ].

Check the resynchronization:

[root@drbd2 /]# cat /proc/drbd
version: 8.3.11 (api:88/proto:86-96)
GIT-hash: 0de839cee13a4160eed6037c4bddd066645e23c5 build by root@drbd2.localdomain, 2011-07-08 11:10:20
 1: cs:SyncTarget ro:Secondary/Primary ds:Inconsistent/UpToDate C r-----
    ns:0 nr:185532 dw:185532 dr:0 al:0 bm:15 lo:0 pe:0 ua:0 ap:0 ep:1 wo:b oos:299000
    [======>.............] sync'ed: 39.5% (299000/484532)K
    finish: 0:00:28 speed: 10,304 (10,304) want: 10,240 K/sec

Comments and corrections welcome!

Addendum: although this was simulated by hand, the same problem can occur during a real failover.

1. A DRBD resource can be mounted on only one of the two nodes at a time, whether it is the primary or the secondary.
2. Steps for a manual primary/secondary switchover:
   a. First unmount whatever is mounted. Your application stops during this, which is why I do not recommend manual switchover.
   b. Demote the old primary: drbdadm secondary resource_name
   c. Promote the old secondary: drbdadm primary resource_name
   d. Mount the resource.

http://myhat.blog.51cto.com/391263/606318/
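The recovery sequence above can be condensed into one sketch. The function is my addition, not from the article; the resource name "fs" matches this setup. It only prints the commands for each node's role, so you can review them before piping them to a shell on the right host:

```shell
#!/bin/sh
# Sketch: print the split-brain recovery commands for each node's role.
# "victim"   = the node whose local changes are thrown away (drbd2 above);
# "survivor" = the node whose data wins (drbd1 above).
# Resource name "fs" is this article's; adjust for your setup.
split_brain_cmds() {
    case "$1" in
        victim)
            echo "drbdadm disconnect fs"
            echo "drbdadm secondary fs"
            echo "drbdadm -- --discard-my-data fs"
            ;;
        survivor)
            # Without this step on the survivor, the victim's drbd
            # service hangs at startup, as shown above.
            echo "drbdadm connect fs"
            ;;
        *)
            echo "usage: split_brain_cmds victim|survivor" >&2
            return 1
            ;;
    esac
}

# Review the output, then e.g.:
#   split_brain_cmds victim | sh      # on drbd2
#   split_brain_cmds survivor | sh    # on drbd1
```

Running the victim's commands before the survivor's reproduces the order used in this article.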