[MySQL + keepalived] Both keepalived nodes become MASTER and hold the same virtual IP address

2017-10-30 16:32

Background: MySQL dual-master replication plus keepalived is used to make MySQL highly available.
Environment:

master: 172.16.3.5 TiDB-node1
slave : 172.16.3.7 TiDB-node3
VIP   : 172.16.3.100

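Problem: when keepalived starts on the master, it first enters the BACKUP state. Once the check script succeeds, it enters the MASTER state and the master acquires the VIP. keepalived is then started on the slave, which also enters the BACKUP state first. Normally the slave should learn from the master's advertisements that a MASTER already exists in this VRRP group, and should therefore remain in the BACKUP state. Instead, once its own check script succeeded, the slave also entered the MASTER state and also acquired the VIP.
keepalived configuration on the master: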

vrrp_script vs_mysql_82 { # define the check script
script "/usr/local/python/bin/python /etc/keepalived/checkMySQL.py -h 172.16.3.5 -P 3306"
interval 60 # interval between script runs, in seconds
}
vrrp_instance VI_82 {
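state BACKUP  # both nodes start in the BACKUP state
nopreempt     # no preemption: after the old master drops to FAULT and recovers, it stays BACKUP and is not promoted back to MASTER
interface eth0 # network interface that VRRP binds to
virtual_router_id 172  # virtual router ID; instances with the same ID form one VRRP group
priority 100 # priority; the higher value wins the election
advert_int 5  # interval between VRRP advertisements, in seconds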
authentication {
auth_type PASS
auth_pass 1111
}
track_script {
vs_mysql_82 # the check script defined above
}
virtual_ipaddress {
172.16.3.100
}
}
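
The track_script entry above depends on the exit status of the check script: keepalived treats exit status 0 as healthy and any non-zero status as a failure. The author's checkMySQL.py is not shown in this post, so the following is only a hypothetical stand-in, assuming that a successful TCP connection to the MySQL port is an acceptable health signal. The function names mysql_port_open and check_exit_code are made up for illustration.

```python
import socket


def mysql_port_open(host, port=3306, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, unreachable hosts and timeouts.
        return False


def check_exit_code(host, port=3306):
    """Map the probe result to an exit status: 0 = healthy, 1 = failed."""
    return 0 if mysql_port_open(host, port) else 1
```

A script built on this sketch would end with something like sys.exit(check_exit_code("172.16.3.5", 3306)). A real check would probably also log in and run a query, since an open port does not prove that MySQL is serving requests.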

keepalived configuration on the slave:

vrrp_script vs_mysql_82 {
script "/usr/local/python/bin/python /etc/keepalived/checkMySQL.py -h 172.16.3.7 -P 3306"
interval 60
}
vrrp_instance VI_82 {
state BACKUP
nopreempt
interface eth0
virtual_router_id 172
priority 90
advert_int 5
authentication {
auth_type PASS
auth_pass 1111
}
track_script {
vs_mysql_82
}
virtual_ipaddress {
172.16.3.100
}
}
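
Under VRRP, a BACKUP node stays in BACKUP only while it keeps receiving advertisements from a MASTER. If none arrives within the master-down interval, it promotes itself. The sketch below works out that interval for the two configurations above, assuming the VRRPv2 timer definitions from RFC 3768 (Skew_Time = (256 - Priority) / 256 and Master_Down_Interval = 3 * Advertisement_Interval + Skew_Time). keepalived's internal timing may differ in detail, and the function names are illustrative only.

```python
def skew_time(priority):
    """VRRPv2 skew time in seconds; higher priority gives a shorter skew."""
    return (256 - priority) / 256


def master_down_interval(advert_int, priority):
    """Seconds a BACKUP waits without advertisements before becoming MASTER."""
    return 3 * advert_int + skew_time(priority)


# advert_int and priority values taken from the two configurations above
for name, priority in (("master", 100), ("slave", 90)):
    print(name, master_down_interval(5, priority))
```

With advert_int 5 both nodes wait a little over 15 seconds. A node that never receives the other node's advertisements will therefore promote itself shortly after its check script succeeds, whatever nopreempt says.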

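My first thought was split-brain, but the two nodes could ping each other without any problem. Running ip addr show locally on the master and on the slave gave the following output: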
master:

[root@TiDB-node1 ~]# ip addr show
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
inet6 ::1/128 scope host
valid_lft forever preferred_lft forever
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP qlen 1000
link/ether 00:0c:29:20:ce:b4 brd ff:ff:ff:ff:ff:ff
inet 172.16.3.5/22 brd 172.16.3.255 scope global eth0
inet 172.16.3.100/32 scope global eth0
inet6 fe80::20c:29ff:fe20:ceb4/64 scope link
valid_lft forever preferred_lft forever

slave:

[root@TiDB-node1 keepalived]# ip addr show
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
inet6 ::1/128 scope host
valid_lft forever preferred_lft forever
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP qlen 1000
link/ether 00:0c:29:20:ce:b4 brd ff:ff:ff:ff:ff:ff
inet 172.16.3.5/22 brd 172.16.3.255 scope global eth0
inet 172.16.3.100/32 scope global eth0
inet6 fe80::20c:29ff:fe20:ceb4/64 scope link
valid_lft forever preferred_lft forever

Then connect to MySQL through the VIP and run select @@hostname:

[root@private-STG4 ~]# mysql -urpl -h172.16.3.100 -p -P3306
Enter password:
Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 10
Server version: 5.7.17-log MySQL Community Server (GPL)

Copyright (c) 2000, 2016, Oracle and/or its affiliates. All rights reserved.

Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

rpl@mysqldb 16:12:  [(none)]> select @@hostname;
+------------+
| @@hostname |
+------------+
| TiDB-node3 |
+------------+
1 row in set (0.00 sec)

Further verification:
1. On both the master and the slave, start:

tcpdump  -i eth0 host 172.16.3.100 -vvvv

2. Stop keepalived on the slave:

/etc/init.d/keepalived stop

3. On any machine other than the master and the slave, run:

ping 172.16.3.100

4. At this point the master shows:

[root@TiDB-node1 keepalived]# tcpdump  -i eth0 host 172.16.3.100 -vvvv
tcpdump: listening on eth0, link-type EN10MB (Ethernet), capture size 65535 bytes
15:28:11.968811 IP (tos 0x0, ttl 64, id 57379, offset 0, flags [DF], proto ICMP (1), length 84)
172.16.3.15 > 172.16.3.100: ICMP echo request, id 2057, seq 54, length 64
15:28:12.968815 IP (tos 0x0, ttl 64, id 57380, offset 0, flags [DF], proto ICMP (1), length 84)
172.16.3.15 > 172.16.3.100: ICMP echo request, id 2057, seq 55, length 64
15:28:13.968840 IP (tos 0x0, ttl 64, id 57381, offset 0, flags [DF], proto ICMP (1), length 84)
172.16.3.15 > 172.16.3.100: ICMP echo request, id 2057, seq 56, length 64
15:28:14.968870 IP (tos 0x0, ttl 64, id 57382, offset 0, flags [DF], proto ICMP (1), length 84)
172.16.3.15 > 172.16.3.100: ICMP echo request, id 2057, seq 57, length 64
15:28:15.968872 IP (tos 0x0, ttl 64, id 57383, offset 0, flags [DF], proto ICMP (1), length 84)

5. Start keepalived on the slave again:

/etc/init.d/keepalived start

6. The master shows:

15:28:42.097462 ARP, Ethernet (len 6), IPv4 (len 4), Request who-has 172.16.3.100 (Broadcast) tell 172.16.3.100, length 46
15:28:42.097693 ARP, Ethernet (len 6), IPv4 (len 4), Request who-has 172.16.3.100 (Broadcast) tell 172.16.3.100, length 46
15:28:42.097706 ARP, Ethernet (len 6), IPv4 (len 4), Request who-has 172.16.3.100 (Broadcast) tell 172.16.3.100, length 46
15:28:42.097711 ARP, Ethernet (len 6), IPv4 (len 4), Request who-has 172.16.3.100 (Broadcast) tell 172.16.3.100, length 46
15:28:42.097715 ARP, Ethernet (len 6), IPv4 (len 4), Request who-has 172.16.3.100 (Broadcast) tell 172.16.3.100, length 46
15:28:47.098555 ARP, Ethernet (len 6), IPv4 (len 4), Request who-has 172.16.3.100 (Broadcast) tell 172.16.3.100, length 46
15:28:47.098773 ARP, Ethernet (len 6), IPv4 (len 4), Request who-has 172.16.3.100 (Broadcast) tell 172.16.3.100, length 46
15:28:47.098783 ARP, Ethernet (len 6), IPv4 (len 4), Request who-has 172.16.3.100 (Broadcast) tell 172.16.3.100, length 46
15:28:47.098786 ARP, Ethernet (len 6), IPv4 (len 4), Request who-has 172.16.3.100 (Broadcast) tell 172.16.3.100, length 46
15:28:47.098789 ARP, Ethernet (len 6), IPv4 (len 4), Request who-has 172.16.3.100 (Broadcast) tell 172.16.3.100, length 46

7. The slave shows:

[root@TiDB-node3 keepalived]# tcpdump  -i eth0 host 172.16.3.100 -vvvvv
tcpdump: listening on eth0, link-type EN10MB (Ethernet), capture size 65535 bytes
15:28:42.102540 ARP, Ethernet (len 6), IPv4 (len 4), Request who-has 172.16.3.100 (Broadcast) tell 172.16.3.100, length 28
15:28:42.102614 ARP, Ethernet (len 6), IPv4 (len 4), Request who-has 172.16.3.100 (Broadcast) tell 172.16.3.100, length 28
15:28:42.102620 ARP, Ethernet (len 6), IPv4 (len 4), Request who-has 172.16.3.100 (Broadcast) tell 172.16.3.100, length 28
15:28:42.102625 ARP, Ethernet (len 6), IPv4 (len 4), Request who-has 172.16.3.100 (Broadcast) tell 172.16.3.100, length 28
15:28:42.102636 ARP, Ethernet (len 6), IPv4 (len 4), Request who-has 172.16.3.100 (Broadcast) tell 172.16.3.100, length 28
15:28:42.974138 IP (tos 0x0, ttl 64, id 57410, offset 0, flags [DF], proto ICMP (1), length 84)
172.16.3.15 > 172.16.3.100: ICMP echo request, id 2057, seq 85, length 64
15:28:42.974268 IP (tos 0x0, ttl 64, id 41230, offset 0, flags [none], proto ICMP (1), length 84)
172.16.3.100 > 172.16.3.15: ICMP echo reply, id 2057, seq 85, length 64
15:28:43.974149 IP (tos 0x0, ttl 64, id 57411, offset 0, flags [DF], proto ICMP (1), length 84)
172.16.3.15 > 172.16.3.100: ICMP echo request, id 2057, seq 86, length 64

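The output above shows clearly that the slave has taken over the VIP. Although ip addr show on the master still lists the VIP, that address on the master can no longer serve outside traffic.
The conclusion is that keepalived on the master and keepalived on the slave cannot communicate with each other. The slave cannot hear the master, so it takes over the VIP. The remaining question is why the two cannot communicate: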

- check your firewall to ensure packets aren't being caught
- check your networking to ensure em1 is the same network on both machines

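After some searching on Google, I found two likely reasons for the two nodes failing to communicate: a firewall blocking the traffic between them, or the wrong network interface name in the configuration. The second could be ruled out, which left only the first. On checking, the firewall was indeed enabled, with only ports 22, 80 and 3306 open. After the firewall was turned off on both nodes, keepalived worked normally.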

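keepalived nodes communicate using VRRP, which is carried directly over IP (protocol number 112) rather than over TCP or UDP, so opening ports 22, 80 and 3306 does not let it through.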
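
Instead of turning the firewall off completely, it should be enough to allow VRRP itself. The sketch below builds the matching rules, assuming the firewall is iptables (the post does not say which firewall was in use) and that keepalived sends its advertisements to the standard VRRP multicast group 224.0.0.18, which applies here because neither configuration sets unicast peers. The helper name vrrp_allow_rules is made up for illustration. The sketch only prints the commands and does not change the firewall.

```python
VRRP_PROTOCOL = 112            # IP protocol number assigned to VRRP
VRRP_MULTICAST = "224.0.0.18"  # IPv4 multicast group for VRRP advertisements


def vrrp_allow_rules(interface):
    """Build iptables commands that accept VRRP in and out on one interface."""
    return [
        f"iptables -I INPUT -i {interface} -p {VRRP_PROTOCOL} "
        f"-d {VRRP_MULTICAST} -j ACCEPT",
        f"iptables -I OUTPUT -o {interface} -p {VRRP_PROTOCOL} "
        f"-d {VRRP_MULTICAST} -j ACCEPT",
    ]


# Printed only; review the rules, then run them as root on both nodes.
for rule in vrrp_allow_rules("eth0"):
    print(rule)
```

Rules added this way are lost when the firewall service restarts unless they are saved, so they would also need to go into the persistent firewall configuration on both nodes.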
