MySQL MHA Master-Slave Switchover: Notes on Problems Found During Testing

2016-05-28 14:39
Problem 1:

Fri May 27 10:01:05 2016 - [error][/apps/lib/mha/mha_manager/MHA/MasterRotate.pm, ln161] We should not start online master switch when one of connections are running long updates on the current master(10.16.24.108(10.16.24.108:3307)). Currently 1 update thread(s) are running.

Details:

{'Time' => '88270','Command' => 'Daemon','db' => undef,'Id' => '2','Info' => undef,'User' => 'event_scheduler','Progress' => '0.000','State' => 'Waiting on empty queue','Host' => 'localhost'}

Fri May 27 10:01:05 2016 - [error][/apps/lib/mha/mha_manager/MHA/ManagerUtil.pm, ln177] Got ERROR: at /apps/sh/mha/mha_manager/bin/masterha_master_switch line 53.

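Solution:

The thread reported under Details is the event_scheduler daemon, so turning off event_scheduler on the current master clears the error: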

(product)mha@10.16.24.108 [(none)]> SET GLOBAL event_scheduler =off;

Query OK, 0 rows affected (0.00 sec)

(product)mha@10.16.24.108 [(none)]> Select @@event_scheduler;

+-------------------+
| @@event_scheduler |
+-------------------+
| OFF               |
+-------------------+

1 row in set (0.00 sec)

(product)mha@10.16.24.108 [(none)]> show processlist\G

*************************** 1. row ***************************

Id: 140

User: repl

Host: 10.16.24.107:44449

db: NULL

Command: Binlog Dump

Time: 13262

State: Master has sent all binlog to slave; waiting for binlog to be updated

Info: NULL

Progress: 0.000

*************************** 2. row ***************************

Id: 141

User: repl

Host: 10.16.24.109:23490

db: NULL

Command: Binlog Dump

Time: 13254

State: Master has sent all binlog to slave; waiting for binlog to be updated

Info: NULL

Progress: 0.000

*************************** 3. row ***************************

Id: 147

User: mha

Host: 10.16.24.108:59213

db: NULL

Command: Query

Time: 0

State: init

Info: show processlist

Progress: 0.000

3 rows in set (0.00 sec)
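To see in advance which threads a check like this might trip over, the processlist can be filtered by hand. The following is a rough sketch and not MHA's actual logic: the helper name `long_threads`, the choice to skip only `Sleep` and `Binlog Dump` threads, and the 1-second default limit are all assumptions made for illustration. It reads tab-separated rows such as those printed by `mysql -N -B -e 'SHOW PROCESSLIST'`.

```shell
#!/bin/sh
# Hypothetical helper: list threads that have been running for at least
# <limit> seconds, ignoring idle clients and replication dump threads.
# Input columns (tab-separated): Id, User, Host, db, Command, Time, State, Info
long_threads() {
    limit="${1:-1}"
    awk -F '\t' -v limit="$limit" '
        $5 == "Sleep" || $5 == "Binlog Dump" { next }          # not counted
        $6 + 0 >= limit { print $1 "\t" $2 "\t" $5 "\t" $6 }   # Id, User, Command, Time
    '
}

# Usage (assumes a mysql client that is already configured to log in):
#   mysql -N -B -e 'SHOW PROCESSLIST' | long_threads 1
```

Fed the processlist from the error above, this filter would list the event_scheduler daemon row (Id 2, Time 88270). Against the processlist shown after the scheduler was turned off, it prints nothing.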

Problem 2:

It is better to execute FLUSH NO_WRITE_TO_BINLOG TABLES on the master before switching. Is it ok to execute on 10.16.24.108(10.16.24.108:3307)? (YES/no): yes

Fri May 27 15:19:14 2016 - [info] Executing FLUSH NO_WRITE_TO_BINLOG TABLES. This may take long time..

Fri May 27 15:19:14 2016 - [info] ok.

Fri May 27 15:19:14 2016 - [info] Checking MHA is not monitoring or doing failover..

Fri May 27 15:19:14 2016 - [info] Checking replication health on 10.16.24.107..

Fri May 27 15:19:14 2016 - [info] ok.

Fri May 27 15:19:14 2016 - [info] Checking replication health on 10.16.24.109..

Fri May 27 15:19:14 2016 - [info] ok.

Fri May 27 15:19:14 2016 - [error][/apps/lib/mha/mha_manager/MHA/ServerManager.pm, ln1218] 10.16.24.109 is not alive!

Fri May 27 15:19:14 2016 - [error][/apps/lib/mha/mha_manager/MHA/MasterRotate.pm, ln232] Failed to get new master!

Fri May 27 15:19:14 2016 - [error][/apps/lib/mha/mha_manager/MHA/ManagerUtil.pm, ln177] Got ERROR: at /apps/sh/mha/mha_manager/bin/masterha_master_switch line 53.

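Solution:

The setting no_master=1 in /apps/conf/mha/app1.cnf on 10.16.24.109 prevented that host from becoming the new master. After no_master=1 was commented out, the online switch succeeded on the next attempt.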
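The config change can be scripted. This is a minimal sketch: `disable_no_master` is a hypothetical helper name, and it assumes the setting sits on a line of its own in the form `no_master=1`. A `.bak` copy of the original file is kept next to it.

```shell
#!/bin/sh
# Hypothetical helper: comment out an active "no_master=1" line in an
# MHA config file, leaving every other line untouched.
disable_no_master() {
    conf="$1"
    cp "$conf" "$conf.bak" &&
    sed 's/^\([[:space:]]*\)no_master[[:space:]]*=[[:space:]]*1[[:space:]]*$/\1#no_master=1/' \
        "$conf.bak" > "$conf"
}

# Usage:
#   disable_no_master /apps/conf/mha/app1.cnf
```

If the host should stay excluded from promotion in normal operation, the original file can be restored from the `.bak` copy once the test switch is done.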

Problem 3:

Sat May 28 09:35:06 2016 - [info] Master configurations are as below:

Master 10.16.24.109(10.16.24.109:3307), replicating from 10.16.24.108(10.16.24.108:3307)

Master 10.16.24.108(10.16.24.108:3307), replicating from 10.16.24.109(10.16.24.109:3307), read-only

Sat May 28 09:35:06 2016 - [warning] SQL Thread is stopped(no error) on 10.16.24.108(10.16.24.108:3307)

Sat May 28 09:35:06 2016 - [error][/apps/lib/mha/mha_manager/MHA/ServerManager.pm, ln726] Slave 10.16.24.107(10.16.24.107:3307) replicates from 10.16.24.108:3307, but real master is 10.16.24.109(10.16.24.109:3307)!

Sat May 28 09:35:06 2016 - [error][/apps/lib/mha/mha_manager/MHA/ManagerUtil.pm, ln177] Got ERROR: at /apps/lib/mha/mha_manager/MHA/MasterRotate.pm line 85.

Solution:

On 10.16.24.108, run: set global read_only=off;

On 10.16.24.109, run: set global read_only=on;

On 10.16.24.107, run: set global read_only=on;
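The three statements follow one rule: only the host meant to be the master is writable, and every other host is read-only. A small sketch that prints the statement for each host; `read_only_plan` is a hypothetical helper, and it only prints the plan without connecting to MySQL.

```shell
#!/bin/sh
# Hypothetical helper: print the read_only statement each host should get.
# First argument is the intended master; the rest are all hosts in the group.
read_only_plan() {
    master="$1"; shift
    for host in "$@"; do
        if [ "$host" = "$master" ]; then
            printf '%s: SET GLOBAL read_only = OFF;\n' "$host"
        else
            printf '%s: SET GLOBAL read_only = ON;\n' "$host"
        fi
    done
}

# Usage:
#   read_only_plan 10.16.24.108 10.16.24.108 10.16.24.109 10.16.24.107
```

Comparing this plan with `SELECT @@read_only` on each host before starting a switch would presumably catch the mismatch that MHA reported here.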

Problem 4:

Sat May 28 10:00:32 2016 831853 Set read_only=0 on the new master.

Sat May 28 10:00:32 2016 832417Add vip 10.16.24.58 on eth1..

RTNETLINK answers: Operation not permitted

Solution:

As root, run the following on every node:

chmod u+s /sbin/ip
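Whether the bit is actually set can be confirmed on each node before retrying the switch. A minimal sketch; `check_setuid` is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: report whether a file has the setuid bit set.
check_setuid() {
    if [ -u "$1" ]; then
        echo "setuid set: $1"
    else
        echo "setuid NOT set: $1"
        return 1
    fi
}

# Usage:
#   check_setuid /sbin/ip
```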

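Problem 5:

After a manual online switch with MHA, the VIP moved to the new master as expected. However, connecting through the VIP from the other hosts still reached the slave on the local host.

What is the cause?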

Solution:

Run drop_vip.sh on all slaves.
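The fix suggests that the VIP was still bound on the slaves, so connections to it from those hosts never left the machine. This can be checked with the sketch below, which assumes the usual one-line output format of `ip -o -4 addr show` (interface name in field 2, address/prefix in field 4). `vip_ifaces` is a hypothetical helper, and 10.16.24.58 is the VIP that appears in the log above.

```shell
#!/bin/sh
# Hypothetical helper: read `ip -o -4 addr show` output on stdin and print
# the name of every interface that currently carries the given address.
vip_ifaces() {
    vip="$1"
    awk -v vip="$vip" '
        $3 == "inet" {
            split($4, a, "/")          # strip the /prefix part
            if (a[1] == vip) print $2
        }
    '
}

# Usage (run on each slave; any output means the VIP is still bound there):
#   ip -o -4 addr show | vip_ifaces 10.16.24.58
```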