Automatic MySQL failover with MHA
2015-08-07 16:53
I: Introduction to MHA
What MHA is and what it offers
1. Automatic master monitoring and failover
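MHA monitors the master of a replication topology and fails over automatically as soon as it detects that the master is down. Even if some slaves have not received the latest relay log events, MHA identifies the differential relay log events on the most up-to-date slave and applies them to the other slaves, so all slaves end up consistent. Failover usually completes within seconds: detecting the master failure takes 9 to 12 seconds, shutting down the failed master to avoid split brain takes 7 to 10 seconds, and applying the differential relay logs to the new master takes a few more seconds, so the whole process finishes in 10 to 30 seconds. You can also set a priority to make a particular slave the preferred candidate for the new master. Because MHA repairs consistency among the slaves, any slave can be promoted to master without the consistency problems that would otherwise break replication.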
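MHA Manager periodically probes the master of the cluster. When the master fails, it automatically promotes the slave with the most recent data to be the new master and repoints all the other slaves to it. The whole failover is transparent to the application.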
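During automatic failover MHA tries to save the binary logs from the crashed master so that as little data as possible is lost, but this is not always possible. For example, if the master has a hardware failure or cannot be reached over SSH, MHA cannot save the binary logs; it fails over anyway and the most recent data is lost. Semi-synchronous replication, available since MySQL 5.5, greatly reduces this risk, and MHA can be combined with it: as long as at least one slave has received the latest binary log events, MHA can apply them to all the other slaves and so keep every node consistent.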
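As a rough illustration of the semi-synchronous setup mentioned above, the sketch below only prints the SQL for each role; nothing is executed. The function name `semisync_sql` and the 1000 ms timeout are assumptions made for this example, not values from the original setup. Since any slave may be promoted, you would probably want both plugins installed on every node.

```shell
# Print the SQL that enables semi-synchronous replication for one role.
# Hypothetical helper: it only prints statements, it does not run them.
semisync_sql() {
  case "$1" in
    master)
      cat <<'SQL'
INSTALL PLUGIN rpl_semi_sync_master SONAME 'semisync_master.so';
SET GLOBAL rpl_semi_sync_master_enabled = 1;
SET GLOBAL rpl_semi_sync_master_timeout = 1000; -- milliseconds, assumed value
SQL
      ;;
    slave)
      cat <<'SQL'
INSTALL PLUGIN rpl_semi_sync_slave SONAME 'semisync_slave.so';
SET GLOBAL rpl_semi_sync_slave_enabled = 1;
STOP SLAVE IO_THREAD;
START SLAVE IO_THREAD;
SQL
      ;;
    *)
      echo "usage: semisync_sql master|slave" >&2
      return 1
      ;;
  esac
}

# Example (not run here):
#   semisync_sql master | mysql -uroot -p
```

Restarting the I/O thread on the slave is what makes it register as a semi-sync slave with the master.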
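MHA currently targets a one-master, multiple-slave topology. Building it requires at least three database servers in the replication cluster, one master and two slaves: one server acts as the master, one as the standby master, and one as a plain slave. Because of the hardware cost of needing three servers, Taobao modified MHA, and Taobao's TMHA now supports one master with a single slave. (Source: 《深入浅出MySQL(第二版)》)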
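2. Interactive master failover
MHA can be used for failover only, without monitoring the master: when the master fails, you invoke MHA by hand to carry out the failover.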
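3. Non-interactive master failover
MHA does not monitor the master but still performs the failover automatically. This mode suits setups where other software already watches the master, for example heartbeat detecting the master failure and taking over the virtual IP address, with MHA handling the failover and the promotion of a slave to master.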
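4. Online master switch
It is often necessary to move the current master to another server, for example when the master has a hardware fault, the RAID controller needs to be rebuilt, or the master is being moved to a more powerful machine. Maintenance on the master degrades performance and causes downtime during which, at the very least, no data can be written. In addition, blocking or killing the currently running sessions can leave the old and new masters inconsistent. MHA provides a fast switch with graceful blocking of writes: the switch takes only 0.5 to 2 seconds, during which writes are blocked. In many cases a 0.5 to 2 second write block is acceptable, so switching the master does not need a scheduled maintenance window (no more staying up all night to move the master).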
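Items 2 to 4 all go through the masterha_master_switch command. The wrappers below are a sketch of how the three modes could be invoked; the function names are made up for this example, and the configuration path is the one used later in this post. Check the options against your MHA version before relying on them.

```shell
CONF=/etc/mha/app1.cnf   # configuration file used later in this post

# Item 2: interactive failover of a master that is already down.
# $1 = host of the dead master
manual_failover() {
  masterha_master_switch --master_state=dead --conf="$CONF" \
    --dead_master_host="$1"
}

# Item 3: the same failover without prompts, e.g. triggered by
# external monitoring. $1 = dead master, $2 = slave to promote
auto_failover() {
  masterha_master_switch --master_state=dead --conf="$CONF" \
    --dead_master_host="$1" --new_master_host="$2" --interactive=0
}

# Item 4: online switch while the current master is still alive.
# $1 = slave to promote; the old master becomes a slave of it.
online_switch() {
  masterha_master_switch --master_state=alive --conf="$CONF" \
    --new_master_host="$1" --orig_master_is_new_slave
}

# Example (not run here):
#   online_switch 192.168.6.52
```

masterha_manager should not be monitoring the same cluster while an online switch runs, so stop it first with masterha_stop.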
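5. MHA consists of two parts: MHA Manager (the management node) and MHA Node (the data node)
The management node can run on the same machine as a data node or on a separate one.
6. MHA is flexible: you can write your own scripts to drive failover, master switching and so on.
7. After a failover, MHA modifies its configuration file, which I find rather odd: after each failover you have to edit the configuration file again and restart the masterha_manager service.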
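Drawbacks:
1. Although MHA tries to save the binary logs from the crashed master, this can fail. For example, if the master has a hardware failure or cannot be reached over SSH, MHA cannot save the binary logs; it fails over anyway and the most recent data is lost.
2. After the master DB fails and MHA switches to another server, the original master cannot simply rejoin the MHA setup even once it has been recovered; it has to be redeployed. In addition, the monitoring process on the management node exits after every switch, so you need a script to start it again automatically.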
II: Test environment
Replication is already set up with one master and two slaves. For how to set up replication, see: http://blog.csdn.net/yabingshi_tech/article/details/45192599
III: Steps
1: Edit /etc/hosts
Add the hostname of every server on all three machines, for example:
192.168.6.51 master # master
192.168.6.52 slave1 # slave
192.168.6.70 slave2 # slave (standby master)
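Since the same entries have to be added on three machines, a small idempotent helper can save some typing. This is only a sketch; `add_host_entry` is a hypothetical function, not part of MHA.

```shell
# Append "IP hostname" to a hosts file unless the hostname is already mapped.
# $1 = hosts file, $2 = IP address, $3 = hostname
add_host_entry() {
  if grep -Eq "[[:space:]]$3([[:space:]]|\$)" "$1" 2>/dev/null; then
    return 0          # hostname already present, leave the file alone
  fi
  printf '%s %s\n' "$2" "$3" >> "$1"
}

# Example (not run here, needs root):
#   add_host_entry /etc/hosts 192.168.6.51 master
#   add_host_entry /etc/hosts 192.168.6.52 slave1
#   add_host_entry /etc/hosts 192.168.6.70 slave2
```

Running it twice leaves a single line per hostname, so it is safe to repeat on every node.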
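2: Set up SSH trust between the hosts
# On 192.168.6.51, generate a key pair, then copy the public key to the host itself, 192.168.6.52 and 192.168.6.70.
# ssh-keygen
# ssh-copy-id root@192.168.6.51
# ssh-copy-id root@192.168.6.52
# ssh-copy-id root@192.168.6.70
Then generate a key pair on 192.168.6.52 and on 192.168.6.70 in turn, and copy each public key to the host itself and to the other machines.
When done, test with ssh <ip> to confirm that you can log in without a password.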
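The manual ssh test can be scripted. The sketch below uses ssh's BatchMode option, which makes ssh fail instead of prompting when key authentication does not work; the function name `check_ssh_trust` is an assumption for this example.

```shell
# Return 0 only if every given host accepts key-based root login
# without prompting for a password.
check_ssh_trust() {
  rc=0
  for h in "$@"; do
    if ssh -o BatchMode=yes -o ConnectTimeout=5 "root@$h" true; then
      echo "$h: ok"
    else
      echo "$h: FAILED"
      rc=1
    fi
  done
  return $rc
}

# Example (not run here); repeat on each of the three machines:
#   check_ssh_trust 192.168.6.51 192.168.6.52 192.168.6.70
```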
3: Install MHA
Download the packages here: http://pan.baidu.com/s/1pJ0VkSz
or:
http://download.csdn.net/download/yabignshi/8974251
http://download.csdn.net/detail/yabignshi/8974265
Install on all data nodes:
yum install perl-DBD-MySQL -y
rpm -ivh mha4mysql-node-0.56-0.el6.noarch.rpm
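After installation the following scripts appear under /usr/bin (these tools are normally invoked by the MHA Manager scripts and need no manual operation):
save_binary_logs //save and copy the master's binary logs
apply_diff_relay_logs //identify differential relay log events and apply them to the other slaves
filter_mysqlbinlog //strip unnecessary ROLLBACK events (MHA no longer uses this tool)
purge_relay_logs //purge relay logs (without blocking the SQL thread)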
Install on the management node:
yum install perl-DBD-MySQL -y # skipped here, because the management node and a data node both run on 6.51
yum install perl-Config-Tiny -y
yum install epel-release -y
yum install perl-Log-Dispatch -y
yum install perl-Parallel-ForkManager -y
rpm -ivh mha4mysql-node-0.56-0.el6.noarch.rpm
rpm -ivh mha4mysql-manager-0.56-0.el6.noarch.rpm
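If the rpm step complains about missing dependencies, it can help to confirm that the Perl modules behind the yum packages actually load. The sketch below assumes the usual package-to-module mapping (perl-DBD-MySQL provides DBD::mysql, and so on); `check_perl_modules` is a hypothetical helper.

```shell
# Report which of the given Perl modules can be loaded.
# Returns non-zero if at least one is missing.
check_perl_modules() {
  missing=0
  for m in "$@"; do
    if perl -M"$m" -e 1 2>/dev/null; then
      echo "$m: found"
    else
      echo "$m: MISSING"
      missing=1
    fi
  done
  return $missing
}

# Example (not run here), on the management node:
#   check_perl_modules DBD::mysql Config::Tiny Log::Dispatch Parallel::ForkManager
```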
After installation the following scripts appear under /usr/bin:
-rwxr-xr-x. 1 root root 1995 Apr 1 2014 masterha_check_repl
-rwxr-xr-x. 1 root root 1779 Apr 1 2014 masterha_check_ssh
-rwxr-xr-x. 1 root root 1865 Apr 1 2014 masterha_check_status
-rwxr-xr-x. 1 root root 3201 Apr 1 2014 masterha_conf_host
-rwxr-xr-x. 1 root root 2517 Apr 1 2014 masterha_manager
-rwxr-xr-x. 1 root root 2165 Apr 1 2014 masterha_master_monitor
-rwxr-xr-x. 1 root root 2373 Apr 1 2014 masterha_master_switch
-rwxr-xr-x. 1 root root 5171 Apr 1 2014 masterha_secondary_check
-rwxr-xr-x. 1 root root 1739 Apr 1 2014 masterha_stop
-rwxr-xr-x. 1 root root 4807 Apr 1 2014 filter_mysqlbinlog
-rwxr-xr-x. 1 root root 7525 Apr 1 2014 save_binary_logs
-rwxr-xr-x. 1 root root 8261 Apr 1 2014 purge_relay_logs
-rwxr-xr-x. 1 root root 16367 Apr 1 2014 apply_diff_relay_logs
4: Slave configuration
Set relay_log_purge=0 on the slaves; otherwise MHA reports the warning "relay_log_purge=0 is not set on slave".
4.1 Set it at runtime
set global relay_log_purge = 0;
4.2 Edit the configuration file
vi /etc/my.cnf
Add:
relay_log_purge = 0
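With relay_log_purge=0 the relay logs are no longer removed automatically, so they need to be cleaned up some other way, typically by running purge_relay_logs from cron on each slave. The sketch below prints a crontab line; the mha/test account is the one created in step 5.1, while the schedule, work directory and log path are assumptions. It also assumes the account is allowed to connect from the slave itself.

```shell
# Print a crontab line that purges relay logs once a day.
# $1 = minute, $2 = hour (stagger these across slaves)
purge_cron_line() {
  printf '%s %s * * * /usr/bin/purge_relay_logs --user=mha --password=test --disable_relay_log_purge --workdir=/var/tmp >> /var/log/purge_relay_logs.log 2>&1\n' "$1" "$2"
}

# Example (not run here):
#   purge_cron_line 0 5     # on slave1
#   purge_cron_line 20 5    # on slave2, 20 minutes later
```

Using a different time on each slave avoids purging everywhere at once, so at least one slave still holds the relay logs if a failover happens during the purge.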
5: Configure MHA Manager
5.1 Create the management account
# Run the following on the data nodes
grant all privileges on *.* TO mha@'192.168.%' IDENTIFIED BY 'test';
5.2: Configure /etc/mha/app1.cnf
# Only on the management side, i.e. the manager machine
mkdir /etc/mha
mkdir -p /var/log/mha/app1
vi /etc/mha/app1.cnf
Add:
[server default]
manager_log=/var/log/mha/app1/manager.log
manager_workdir=/var/log/mha/app1.log
master_binlog_dir=/data/mysql/data
user=mha
password=test
ping_interval=2
repl_password=beijing
repl_user=rep_user
ssh_user=root
[server1]
hostname=192.168.6.51
port=3306
[server2]
candidate_master=1
check_repl_delay=0
hostname=192.168.6.52
port=3306
[server3]
hostname=192.168.6.70
port=3306
Settings under [server default] apply to all three machines; any of them can also be placed in an individual [serverN] section to customize that server.
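As an illustration of a per-server setting, the sketch below prints a variant of the [server3] section that uses MHA's no_master parameter so that this slave is never promoted. This override is an assumption for the example and was not part of the setup tested here; `server_override_example` is a hypothetical helper name.

```shell
# Print an example [server3] section with a per-server override.
server_override_example() {
  cat <<'CNF'
[server3]
hostname=192.168.6.70
port=3306
# Assumed override, not in the original setup: never promote this slave
no_master=1
CNF
}

# Example (not run here): review the output, then paste it into
# /etc/mha/app1.cnf in place of the existing [server3] section.
#   server_override_example
```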
6: Check that MHA Manager is configured correctly
6.1 Check SSH login
[root@ser6-51 .ssh]# masterha_check_ssh --conf=/etc/mha/app1.cnf
Fri Aug 7 15:11:07 2015 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
Fri Aug 7 15:11:07 2015 - [info] Reading application default configuration from /etc/mha/app1.cnf..
Fri Aug 7 15:11:07 2015 - [info] Reading server configuration from /etc/mha/app1.cnf..
Fri Aug 7 15:11:07 2015 - [info] Starting SSH connection tests..
Fri Aug 7 15:11:08 2015 - [debug]
Fri Aug 7 15:11:07 2015 - [debug] Connecting via SSH from root@192.168.6.51(192.168.6.51:22) to root@192.168.6.52(192.168.6.52:22)..
Fri Aug 7 15:11:07 2015 - [debug] ok.
Fri Aug 7 15:11:07 2015 - [debug] Connecting via SSH from root@192.168.6.51(192.168.6.51:22) to root@192.168.6.70(192.168.6.70:22)..
Fri Aug 7 15:11:08 2015 - [debug] ok.
Fri Aug 7 15:11:08 2015 - [debug]
Fri Aug 7 15:11:07 2015 - [debug] Connecting via SSH from root@192.168.6.52(192.168.6.52:22) to root@192.168.6.51(192.168.6.51:22)..
Fri Aug 7 15:11:08 2015 - [debug] ok.
Fri Aug 7 15:11:08 2015 - [debug] Connecting via SSH from root@192.168.6.52(192.168.6.52:22) to root@192.168.6.70(192.168.6.70:22)..
Fri Aug 7 15:11:08 2015 - [debug] ok.
Fri Aug 7 15:11:09 2015 - [debug]
Fri Aug 7 15:11:08 2015 - [debug] Connecting via SSH from root@192.168.6.70(192.168.6.70:22) to root@192.168.6.51(192.168.6.51:22)..
Fri Aug 7 15:11:08 2015 - [debug] ok.
Fri Aug 7 15:11:09 2015 - [debug] Connecting via SSH from root@192.168.6.70(192.168.6.70:22) to root@192.168.6.52(192.168.6.52:22)..
Fri Aug 7 15:11:09 2015 - [debug] ok.
Fri Aug 7 15:11:09 2015 - [info] All SSH connection tests passed successfully.
/*
If you see "All SSH connection tests passed successfully", SSH is configured correctly.
If you get the error:
[error][/usr/share/perl5/vendor_perl/MHA/SSHCheck.pm, ln63]
clear everything under .ssh and set up the key authentication again.
*/
6.2 Check that MySQL replication is configured correctly
[root@ser6-51 .ssh]# masterha_check_repl --conf=/etc/mha/app1.cnf
Fri Aug 7 15:33:11 2015 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
Fri Aug 7 15:33:11 2015 - [info] Reading application default configuration from /etc/mha/app1.cnf..
Fri Aug 7 15:33:11 2015 - [info] Reading server configuration from /etc/mha/app1.cnf..
Fri Aug 7 15:33:11 2015 - [info] MHA::MasterMonitor version 0.56.
Fri Aug 7 15:33:11 2015 - [info] GTID failover mode = 0
Fri Aug 7 15:33:11 2015 - [info] Dead Servers:
Fri Aug 7 15:33:11 2015 - [info] Alive Servers:
Fri Aug 7 15:33:11 2015 - [info] 192.168.6.51(192.168.6.51:3306)
Fri Aug 7 15:33:11 2015 - [info] 192.168.6.52(192.168.6.52:3306)
Fri Aug 7 15:33:11 2015 - [info] 192.168.6.70(192.168.6.70:3306)
Fri Aug 7 15:33:11 2015 - [info] Alive Slaves:
Fri Aug 7 15:33:11 2015 - [info] 192.168.6.52(192.168.6.52:3306) Version=5.6.20-r5436-log (oldest major version between slaves) log-bin:enabled
Fri Aug 7 15:33:11 2015 - [info] Replicating from 192.168.6.51(192.168.6.51:3306)
Fri Aug 7 15:33:11 2015 - [info] Primary candidate for the new Master (candidate_master is set)
Fri Aug 7 15:33:11 2015 - [info] 192.168.6.70(192.168.6.70:3306) Version=5.6.20-r5436-log (oldest major version between slaves) log-bin:enabled
Fri Aug 7 15:33:11 2015 - [info] Replicating from 192.168.6.51(192.168.6.51:3306)
Fri Aug 7 15:33:11 2015 - [info] Current Alive Master: 192.168.6.51(192.168.6.51:3306)
Fri Aug 7 15:33:11 2015 - [info] Checking slave configurations..
Fri Aug 7 15:33:11 2015 - [info] Checking replication filtering settings..
Fri Aug 7 15:33:11 2015 - [info] binlog_do_db= , binlog_ignore_db=
Fri Aug 7 15:33:11 2015 - [info] Replication filtering check ok.
Fri Aug 7 15:33:11 2015 - [info] GTID (with auto-pos) is not supported
Fri Aug 7 15:33:11 2015 - [info] Starting SSH connection tests..
Fri Aug 7 15:33:13 2015 - [info] All SSH connection tests passed successfully.
Fri Aug 7 15:33:13 2015 - [info] Checking MHA Node version..
Fri Aug 7 15:33:14 2015 - [info] Version check ok.
Fri Aug 7 15:33:14 2015 - [info] Checking SSH publickey authentication settings on the current master..
Fri Aug 7 15:33:14 2015 - [info] HealthCheck: SSH to 192.168.6.51 is reachable.
Fri Aug 7 15:33:14 2015 - [info] Master MHA Node version is 0.56.
Fri Aug 7 15:33:14 2015 - [info] Checking recovery script configurations on 192.168.6.51(192.168.6.51:3306)..
Fri Aug 7 15:33:14 2015 - [info] Executing command: save_binary_logs --command=test --start_pos=4 --binlog_dir=/data/mysql/data --output_file=/var/tmp/save_binary_logs_test --manager_version=0.56 --start_file=mysql-bin.000032
Fri Aug 7 15:33:14 2015 - [info] Connecting to root@192.168.6.51(192.168.6.51:22)..
Creating /var/tmp if not exists.. ok.
Checking output directory is accessible or not.. ok.
Binlog found at /data/mysql/data, up to mysql-bin.000032
Fri Aug 7 15:33:15 2015 - [info] Binlog setting check done.
Fri Aug 7 15:33:15 2015 - [info] Checking SSH publickey authentication and checking recovery script configurations on all alive slave servers..
Fri Aug 7 15:33:15 2015 - [info] Executing command : apply_diff_relay_logs --command=test --slave_user='mha' --slave_host=192.168.6.52 --slave_ip=192.168.6.52 --slave_port=3306 --workdir=/var/tmp --target_version=5.6.20-r5436-log --manager_version=0.56 --relay_log_info=/data/mysql/data/relay-log.info --relay_dir=/data/mysql/data/ --slave_pass=xxx
Fri Aug 7 15:33:15 2015 - [info] Connecting to root@192.168.6.52(192.168.6.52:22)..
Checking slave recovery environment settings..
Opening /data/mysql/data/relay-log.info ... ok.
Relay log found at /data/mysql/data, up to mysql-relay-bin.000002
Temporary relay log file is /data/mysql/data/mysql-relay-bin.000002
Testing mysql connection and privileges..Warning: Using a password on the command line interface can be insecure.
done.
Testing mysqlbinlog output.. done.
Cleaning up test file(s).. done.
Fri Aug 7 15:33:16 2015 - [info] Executing command : apply_diff_relay_logs --command=test --slave_user='mha' --slave_host=192.168.6.70 --slave_ip=192.168.6.70 --slave_port=3306 --workdir=/var/tmp --target_version=5.6.20-r5436-log --manager_version=0.56 --relay_log_info=/data/mysql/data/relay-log.info --relay_dir=/data/mysql/data/ --slave_pass=xxx
Fri Aug 7 15:33:16 2015 - [info] Connecting to root@192.168.6.70(192.168.6.70:22)..
Checking slave recovery environment settings..
Opening /data/mysql/data/relay-log.info ... ok.
Relay log found at /data/mysql/data, up to mysql-relay-bin.000003
Temporary relay log file is /data/mysql/data/mysql-relay-bin.000003
Testing mysql connection and privileges..Warning: Using a password on the command line interface can be insecure.
done.
Testing mysqlbinlog output.. done.
Cleaning up test file(s).. done.
Fri Aug 7 15:33:17 2015 - [info] Slaves settings check done.
Fri Aug 7 15:33:17 2015 - [info]
192.168.6.51(192.168.6.51:3306) (current master)
 +--192.168.6.52(192.168.6.52:3306)
 +--192.168.6.70(192.168.6.70:3306)
Fri Aug 7 15:33:17 2015 - [info] Checking replication health on 192.168.6.52..
Fri Aug 7 15:33:17 2015 - [info] ok.
Fri Aug 7 15:33:17 2015 - [info] Checking replication health on 192.168.6.70..
Fri Aug 7 15:33:17 2015 - [info] ok.
Fri Aug 7 15:33:17 2015 - [warning] master_ip_failover_script is not defined.
Fri Aug 7 15:33:17 2015 - [warning] shutdown_script is not defined.
Fri Aug 7 15:33:17 2015 - [info] Got exit code 0 (Not master dead).
MySQL Replication Health is OK.
/*
If the command fails with:
……
Fri Aug 7 15:17:21 2015 - [info] Connecting to root@192.168.6.52(192.168.6.52:22)..
Can't exec "mysqlbinlog": No such file or directory at /usr/share/perl5/vendor_perl/MHA/BinlogManager.pm line 106.
mysqlbinlog version command failed with rc 1:0, please verify PATH, LD_LIBRARY_PATH, and client options
at /usr/bin/apply_diff_relay_logs line 493
Fri Aug 7 15:17:21 2015 - [error][/usr/share/perl5/vendor_perl/MHA/MasterMonitor.pm, ln205] Slaves settings check failed!
Fri Aug 7 15:17:21 2015 - [error][/usr/share/perl5/vendor_perl/MHA/MasterMonitor.pm, ln413] Slave configuration failed.
Fri Aug 7 15:17:21 2015 - [error][/usr/share/perl5/vendor_perl/MHA/MasterMonitor.pm, ln424] Error happened on checking configurations. at /usr/bin/masterha_check_repl line 48
Fri Aug 7 15:17:21 2015 - [error][/usr/share/perl5/vendor_perl/MHA/MasterMonitor.pm, ln523] Error happened on monitoring servers.
Fri Aug 7 15:17:21 2015 - [info] Got exit code 1 (Not master dead).
MySQL Replication Health is NOT OK!
create a symlink on every data node:
ln -s /usr/local/mysql/bin/mysqlbinlog /usr/bin/mysqlbinlog
Run the check again:
masterha_check_repl --conf=/etc/mha/app1.cnf
This time a new error appears:
Testing mysql connection and privileges..sh: mysql: command not found
mysql command failed with rc 127:0!
at /usr/bin/apply_diff_relay_logs line 375
main::check() called at /usr/bin/apply_diff_relay_logs line 497
eval {...} called at /usr/bin/apply_diff_relay_logs line 475
main::main() called at /usr/bin/apply_diff_relay_logs line 120
Fri Aug 7 15:28:08 2015 - [error][/usr/share/perl5/vendor_perl/MHA/MasterMonitor.pm, ln205] Slaves settings check failed!
Fri Aug 7 15:28:08 2015 - [error][/usr/share/perl5/vendor_perl/MHA/MasterMonitor.pm, ln413] Slave configuration failed.
Fri Aug 7 15:28:08 2015 - [error][/usr/share/perl5/vendor_perl/MHA/MasterMonitor.pm, ln424] Error happened on checking configurations. at /usr/bin/masterha_check_repl line 48
Fri Aug 7 15:28:08 2015 - [error][/usr/share/perl5/vendor_perl/MHA/MasterMonitor.pm, ln523] Error happened on monitoring servers.
Fri Aug 7 15:28:08 2015 - [info] Got exit code 1 (Not master dead).
Create this symlink on every data node as well:
ln -s /usr/local/mysql/bin/mysql /usr/bin/mysql
*/
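Both errors above come from the MySQL client tools not being on the PATH that MHA sees when it runs commands over non-interactive SSH. A quick pre-check along these lines can catch that before running masterha_check_repl; `check_tools_on_path` is a hypothetical helper written for this example.

```shell
# Report whether each named tool can be found on the current PATH.
# Returns non-zero if any of them is missing.
check_tools_on_path() {
  rc=0
  for t in "$@"; do
    if command -v "$t" >/dev/null 2>&1; then
      echo "$t: $(command -v "$t")"
    else
      echo "$t: not on PATH"
      rc=1
    fi
  done
  return $rc
}

# Example (not run here). Running the test through ssh matters, because a
# non-interactive SSH session may have a shorter PATH than a login shell:
#   ssh root@192.168.6.52 'command -v mysql mysqlbinlog'
```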
7: Start monitoring on the management node
[root@ser6-51 .ssh]# nohup masterha_manager --conf=/etc/mha/app1.cnf --remove_dead_master_conf --ignore_last_failover < /dev/null > /var/log/mha/app1/manager.log 2>&1 &
[1] 14694
[root@ser6-51 .ssh]# masterha_check_status --conf=/etc/mha/app1.cnf //check status
app1 (pid:14694) is running(0:PING_OK), master:192.168.6.51
# masterha_stop --conf=/etc/mha/app1.cnf //stop monitoring
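Because the manager process exits after each failover, a small wrapper can start it again when it is not running. The sketch below assumes that masterha_check_status exits with status 0 only while the manager is running (the PING_OK state shown above); verify that on your version. `ensure_manager` is a hypothetical name, and unlike the command above it appends to the log instead of truncating it.

```shell
# Start masterha_manager unless it is already monitoring this config.
# $1 = MHA configuration file, $2 = log file to append to
ensure_manager() {
  if masterha_check_status --conf="$1" >/dev/null 2>&1; then
    echo "manager already running"
    return 0
  fi
  echo "starting manager"
  nohup masterha_manager --conf="$1" --remove_dead_master_conf \
    --ignore_last_failover < /dev/null >> "$2" 2>&1 &
}

# Example (not run here), e.g. from cron on the management node:
#   ensure_manager /etc/mha/app1.cnf /var/log/mha/app1/manager.log
```

Restarting blindly right after a failover is only sensible once the configuration file reflects the new topology, so treat this as a starting point rather than a finished watchdog.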
8: Test
Stop the MySQL instance on 6.51:
[root@ser6-51 ~]# service mysql stop
Shutting down MySQL... [ OK ]
Check the log on the management node:
[root@ser6-51 ~]# tail -f /var/log/mha/app1/manager.log
----- Failover Report -----
app1: MySQL Master failover 192.168.6.51(192.168.6.51:3306) to 192.168.6.52(192.168.6.52:3306) succeeded
Master 192.168.6.51(192.168.6.51:3306) is down!
Check MHA Manager logs at ser6-51:/var/log/mha/app1/manager.log for details.
Started automated(non-interactive) failover.
The latest slave 192.168.6.52(192.168.6.52:3306) has all relay logs for recovery.
Selected 192.168.6.52(192.168.6.52:3306) as a new master.
192.168.6.52(192.168.6.52:3306): OK: Applying all logs succeeded.
192.168.6.70(192.168.6.70:3306): This host has the latest relay log events.
Generating relay diff files from the latest slave succeeded.
192.168.6.70(192.168.6.70:3306): OK: Applying all logs succeeded. Slave started, replicating from 192.168.6.52(192.168.6.52:3306)
192.168.6.52(192.168.6.52:3306): Resetting slave info succeeded.
Master failover to 192.168.6.52(192.168.6.52:3306) completed successfully.
As you can see, the master was switched to 6.52 automatically.
Check on 6.70:
mysql> show slave status \G
*************************** 1. row ***************************
Slave_IO_State: Waiting for master to send event
Master_Host: 192.168.6.52
Master_User: rep_user
Master_Port: 3306
As you can see, Master_Host is now 192.168.6.52.
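The same check can be scripted by pulling Master_Host out of the SHOW SLAVE STATUS output. This is a minimal sketch; `slave_master_host` is a hypothetical helper that reads the \G-formatted output on standard input.

```shell
# Read "SHOW SLAVE STATUS\G" output on stdin and print the Master_Host value.
slave_master_host() {
  awk -F': *' '$1 ~ /^[[:space:]]*Master_Host$/ { print $2; exit }'
}

# Example (not run here), using the mha account from step 5.1:
#   mysql -h 192.168.6.70 -umha -p -e 'SHOW SLAVE STATUS\G' | slave_master_host
```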
Checking the read_only variable on the new master shows that it was turned off automatically, so the former slave is now writable:
mysql> show variables like 'read_only';
+---------------+-------+
| Variable_name | Value |
+---------------+-------+
| read_only | OFF |
+---------------+-------+
1 row in set (0.01 sec)
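Success!
I noticed that after starting MySQL on 6.51 again, it did not rejoin the cluster automatically. You have to point it at the current master yourself.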
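Pointing the old master at the new one comes down to a CHANGE MASTER TO statement. The sketch below only prints the SQL; the binary log file and position are deliberately left as arguments, because they have to be taken from the manager log of the failover (or from the new master) and cannot be guessed. `rejoin_sql` is a hypothetical helper, and it assumes the old master has no transactions that the new master never received.

```shell
# Print SQL that turns a recovered old master into a slave of the new master.
# $1 = new master host, $2 = replication user, $3 = replication password,
# $4 = binary log file on the new master, $5 = position in that file
rejoin_sql() {
  cat <<SQL
CHANGE MASTER TO
  MASTER_HOST='$1',
  MASTER_PORT=3306,
  MASTER_USER='$2',
  MASTER_PASSWORD='$3',
  MASTER_LOG_FILE='$4',
  MASTER_LOG_POS=$5;
START SLAVE;
SET GLOBAL read_only = 1;
SQL
}

# Example (not run here); FILE and POS are placeholders to fill in:
#   rejoin_sql 192.168.6.52 rep_user beijing FILE POS | mysql -uroot -p
```

After the old master is replicating again, its [server1] section also has to be added back to /etc/mha/app1.cnf before restarting masterha_manager, since --remove_dead_master_conf removed it.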
This post draws on: http://blog.51yip.com/mysql/1722.html,
http://www.cnblogs.com/wingsless/p/4033093.html,
and the article 初探keepalive+mysql-ha架构 (A first look at the keepalive + mysql-ha architecture)