Linux for Newbies, Note 089: corosync+pacemaker

2017-01-25 19:05

Overview
corosync
pacemaker
crmsh

What is High Availability?
Simple equation: A = MTBF / (MTBF + MTTR)
MTBF = mean time between failures (how long the system runs without failing)
MTTR = mean time to repair (how long a failure takes to fix)
A = probability that the system will provide service at a random time
(ranging from 0 to 1)
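
As a worked example of the equation above, here is a minimal sketch that plugs numbers into A = MTBF / (MTBF + MTTR). The function name `availability` and the sample figures are made up for illustration; they do not come from any real system.

```shell
# availability MTBF MTTR
# Prints A = MTBF / (MTBF + MTTR) to six decimal places.
# Both arguments must use the same time unit (hours here, by assumption).
availability() {
    awk -v mtbf="$1" -v mttr="$2" 'BEGIN {
        if (mtbf + mttr <= 0) { print "invalid input" > "/dev/stderr"; exit 1 }
        printf "%.6f\n", mtbf / (mtbf + mttr)
    }'
}

# Hypothetical figures: one failure every 999 hours, one hour to repair.
availability 999 1
```

With these numbers the function prints 0.999000, i.e. 99.9% availability. Shortening MTTR raises A just as effectively as lengthening MTBF, which is why HA clusters focus on fast failover.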

RHEL 5.x RHCS: openais, cman, rgmanager
RHEL 6.x RHCS: corosync
corosync: Messaging Layer
openais plugin
openais: AIS (Application Interface Specification)

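corosync is software that passes messages between cluster nodes; it is the Messaging Layer that collects and relays cluster information
pacemaker is a CRM (cluster resource manager); it works on top of corosync or heartbeat v3 to manage cluster resources
SUSE Linux Enterprise Server: Hawk, web GUI
LCMC (Linux Cluster Management Console): a GUI, left for self-study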

RHCS (Red Hat Cluster Suite)
conga (luci (management console) / ricci (agent on each cluster node)); luci provides the web GUI

keepalived: VRRP, most commonly used with 2 nodes

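Configuring a corosync cluster
Prerequisite: time synchronization
Prerequisite: SSH mutual trust (key-based login between the nodes)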
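
The two prerequisites above could be handled roughly as follows. This is only a sketch under assumptions: it presumes `ntpdate` and `ssh-copy-id` are installed, uses `pool.ntp.org` as a placeholder NTP server, and the names `run`, `setup_prereqs` and `DRY_RUN` are invented for this example. By default it only prints the commands it would run.

```shell
# Dry-run helper: print the command unless DRY_RUN=0 is set.
run() {
    if [ "${DRY_RUN:-1}" = "0" ]; then
        "$@"
    else
        echo "+ $*"
    fi
}

# setup_prereqs PEER [NTP_SERVER]
# PEER is the other cluster node; NTP_SERVER defaults to a placeholder.
setup_prereqs() {
    peer="$1"
    ntp_server="${2:-pool.ntp.org}"

    # 1. Time synchronization: set the clock once from an NTP server.
    run ntpdate "$ntp_server"

    # 2. SSH mutual trust: create a key pair if none exists, then
    #    install the public key on the peer.
    if [ ! -f "$HOME/.ssh/id_rsa" ]; then
        run ssh-keygen -t rsa -N "" -f "$HOME/.ssh/id_rsa"
    fi
    run ssh-copy-id -i "$HOME/.ssh/id_rsa.pub" "root@$peer"
}
```

Calling `setup_prereqs node2.mysky.com` lists the commands; `DRY_RUN=0 setup_prereqs node2.mysky.com` actually runs them. Trust has to be mutual, so the same steps would be repeated on the other node, pointing back at this one.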

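1. Install pacemaker and corosync
yum install pacemaker corosync
yum install crmsh (not currently in the official repositories; look in the openSUSE repositories for an unofficial build, or use the source package)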

2. Configure corosync
/etc/corosync/corosync.conf

Add the following to corosync.conf:
service {
ver: 0
name: pacemaker
# use_mgmtd: yes
}

aisexec {
user: root
group: root
}
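
If you would rather script this step than edit the file by hand, the following sketch appends the two stanzas shown above. The function name `add_pacemaker_stanzas` is hypothetical, and the "already configured" test (a `name: pacemaker` line) is a simplifying assumption, not something corosync itself provides.

```shell
# add_pacemaker_stanzas CONF
# Appends the service{} and aisexec{} stanzas from these notes to CONF,
# unless a "name: pacemaker" line is already present (so reruns are safe).
add_pacemaker_stanzas() {
    conf="$1"
    if grep -q '^[[:space:]]*name:[[:space:]]*pacemaker' "$conf" 2>/dev/null; then
        echo "pacemaker service already configured in $conf"
        return 0
    fi
    cat >> "$conf" <<'EOF'

service {
    ver: 0
    name: pacemaker
}

aisexec {
    user: root
    group: root
}
EOF
    echo "appended pacemaker stanzas to $conf"
}
```

Typical use would be `add_pacemaker_stanzas /etc/corosync/corosync.conf`, ideally after making a backup copy of the file.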

# corosync-keygen

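This generates /etc/corosync/authkey. Copy it, together with corosync.conf, to the other node (run from /etc/corosync):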
# scp authkey corosync.conf root@node2.mysky.com:/etc/corosync/

# service NetworkManager stop
# chkconfig NetworkManager off

corosync can now be started:
# service corosync start

Verify that corosync started correctly

Check whether the corosync engine started normally:
# grep -e "Corosync Cluster Engine" /var/log/cluster/corosync.log
# grep -e "configuration file" /var/log/cluster/corosync.log
Jun 14 19:02:08 node1 corosync[5103]: [MAIN ] Corosync Cluster Engine ('1.2.7'): started and ready to provide service.
Jun 14 19:02:08 node1 corosync[5103]: [MAIN ] Successfully read main configuration file '/etc/corosync/corosync.conf'.
Jun 14 19:02:08 node1 corosync[5103]: [MAIN ] Corosync Cluster Engine exiting with status 8 at main.c:1397.
Jun 14 19:03:49 node1 corosync[5120]: [MAIN ] Corosync Cluster Engine ('1.2.7'): started and ready to provide service.
Jun 14 19:03:49 node1 corosync[5120]: [MAIN ] Successfully read main configuration file '/etc/corosync/corosync.conf'.

Check whether the initial membership notifications were sent:
# grep TOTEM /var/log/cluster/corosync.log
Jun 14 19:03:49 node1 corosync[5120]: [TOTEM ] Initializing transport (UDP/IP).
Jun 14 19:03:49 node1 corosync[5120]: [TOTEM ] Initializing transmit/receive security: libtomcrypt SOBER128/SHA1HMAC (mode 0).
Jun 14 19:03:50 node1 corosync[5120]: [TOTEM ] The network interface [172.16.100.11] is now up.
Jun 14 19:03:50 node1 corosync[5120]: [TOTEM ] A processor joined or left the membership and a new membership was formed.

Check whether any errors occurred during startup:
# grep ERROR: /var/log/cluster/corosync.log | grep -v unpack_resources

Check whether pacemaker started normally:
# grep pcmk_startup /var/log/cluster/corosync.log
Jun 14 19:03:50 node1 corosync[5120]: [pcmk ] info: pcmk_startup: CRM: Initialized
Jun 14 19:03:50 node1 corosync[5120]: [pcmk ] Logging: Initialized pcmk_startup
Jun 14 19:03:50 node1 corosync[5120]: [pcmk ] info: pcmk_startup: Maximum core file size is: 4294967295
Jun 14 19:03:50 node1 corosync[5120]: [pcmk ] info: pcmk_startup: Service: 9
Jun 14 19:03:50 node1 corosync[5120]: [pcmk ] info: pcmk_startup: Local hostname: node1.magedu.com
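
The log checks above can be bundled into one pass. The sketch below uses the same grep patterns against the same log file; the function names `_expect` and `check_corosync_log` and the OK/FAIL output format are made up for this example.

```shell
# _expect LOG PATTERN LABEL: succeed if PATTERN appears in LOG.
_expect() {
    if grep -q -e "$2" "$1"; then
        echo "OK   $3"
    else
        echo "FAIL $3"
        return 1
    fi
}

# check_corosync_log [LOGFILE]
# Prints one OK/FAIL line per check; returns non-zero if any check fails.
check_corosync_log() {
    log="${1:-/var/log/cluster/corosync.log}"
    rc=0

    _expect "$log" "Corosync Cluster Engine.*started and ready" "engine started"      || rc=1
    _expect "$log" "Successfully read main configuration file"  "configuration read"  || rc=1
    _expect "$log" "TOTEM.*network interface.*is now up"         "totem interface up"  || rc=1
    _expect "$log" "pcmk_startup: CRM: Initialized"              "pacemaker started"   || rc=1

    # Any ERROR: line counts as a failure, except unpack_resources ones,
    # which the manual check also filters out.
    if grep "ERROR:" "$log" | grep -v unpack_resources | grep -q .; then
        echo "FAIL errors found in log"
        rc=1
    else
        echo "OK   no unexpected errors"
    fi
    return $rc
}
```

Run `check_corosync_log` with no argument to inspect the default log, or pass another path. Note that it scans the whole file, so lines from an earlier start attempt can still satisfy a check.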

If all of the checks above pass, start corosync on node2 with the following command:
# ssh node2 -- /etc/init.d/corosync start

Note: start node2 from node1 using the command above; do not start it directly on node2.

Check the startup status of the cluster nodes with:
# crm status
============
Last updated: Tue Jun 14 19:07:06 2011
Stack: openais
Current DC: node1.magedu.com - partition with quorum
Version: 1.0.11-1554a83db0d3c3e546cfd3aaff6af1184f79ee87
2 Nodes configured, 2 expected votes
0 Resources configured.
============

Online: [ node1.magedu.com node2.magedu.com ]

The output above shows that both nodes have started and that the cluster is working normally.
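
To check this from a script instead of by eye, the "Online:" line can be parsed out of the `crm status` output. This is a rough sketch that assumes the output format shown above; `online_nodes` and `all_nodes_online` are names invented for the example.

```shell
# online_nodes: read `crm status` output on stdin and print the node
# names from the "Online: [ ... ]" line, one per line.
online_nodes() {
    awk '/^Online:/ {
        for (i = 1; i <= NF; i++)
            if ($i != "Online:" && $i != "[" && $i != "]") print $i
    }'
}

# all_nodes_online EXPECTED: read `crm status` output on stdin and
# succeed only if exactly EXPECTED nodes are listed as online.
all_nodes_online() {
    count=$(online_nodes | awk 'END { print NR }')
    [ "$count" -eq "$1" ]
}
```

For the two-node cluster in these notes, usage would look like `crm status | all_nodes_online 2 && echo "cluster is up"`.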

Running ps auxf shows the processes started by corosync:
root 4665 0.4 0.8 86736 4244 ? Ssl 17:00 0:04 corosync
root 4673 0.0 0.4 11720 2260 ? S 17:00 0:00 \_ /usr/lib/heartbeat/stonithd
101 4674 0.0 0.7 12628 4100 ? S 17:00 0:00 \_ /usr/lib/heartbeat/cib
root 4675 0.0 0.3 6392 1852 ? S 17:00 0:00 \_ /usr/lib/heartbeat/lrmd
101 4676 0.0 0.4 12056 2528 ? S 17:00 0:00 \_ /usr/lib/heartbeat/attrd
101 4677 0.0 0.5 8692 2784 ? S 17:00 0:00 \_ /usr/lib/heartbeat/pengine
101 4678 0.0 0.5 12136 3012 ? S 17:00 0:00 \_ /usr/lib/heartbeat/crmd

The crm interactive shell for resource management
Sub-modes
resource: resource management

status: view cluster status

configure
group

Getting help
help
meta

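When a resource's stickiness is greater than the score of a location constraint, the location constraint effectively loses: the resource stays on the node where it is already running.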
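
The comparison can be illustrated with a deliberately simplified two-node model. This is an assumption made for the sake of the example, not pacemaker's full placement algorithm: the current node scores the stickiness value, the preferred node scores the location constraint's value, and the higher score wins. The function name `placement` and the node names are hypothetical.

```shell
# placement CURRENT STICKINESS PREFERRED LOCATION_SCORE
# Simplified model (an assumption, not pacemaker's real algorithm):
#   score(CURRENT)   = STICKINESS       (bonus for staying put)
#   score(PREFERRED) = LOCATION_SCORE   (from the location constraint)
# Prints the node the resource ends up on. On a tie this sketch keeps
# the resource where it is.
placement() {
    if [ "$4" -gt "$2" ]; then
        echo "$3"
    else
        echo "$1"
    fi
}

# Stickiness 100 beats a location preference of 50: stays on node1.
placement node1 100 node2 50
# A location preference of 200 beats stickiness 100: moves to node2.
placement node1 100 node2 200
```

In crmsh, the two values would typically be set with something like `crm configure rsc_defaults resource-stickiness=100` and a `location` constraint carrying the score; the resource and constraint names are left out here because the notes do not define any.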

Next comes the choice of tools
cli
crm
pcs(web-gui)
gui
lcmc

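crmsh is a command-line interface for managing the cluster
Adding resources
Adding nodes, and so on. I won't go into detail here; that is for next time (honestly, I'm not very good with it yet, haha)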

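New Year is almost here. I spent the last few days just hunting for rpm packages, so no notes got written and the gap between updates is noticeably long.
More once I'm back home. Keep going!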
