Quick Installation of Ceph on Ubuntu (QUICK)
2018-02-28 10:50
http://bbs.ceph.org.cn/article/83
1. Components and Preflight
————————————————————————————————————————————————
Components
Node name  │ User │ OS version   │ Machine type
Admin node │ bees │ Ubuntu 14.04 │ Physical
monitor1   │ bees │ Ubuntu 14.04 │ KVM
osd1       │ bees │ Ubuntu 14.04 │ KVM
osd2       │ bees │ Ubuntu 14.04 │ KVM
Preflight
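1. Install the Ceph deployment tool (admin node)
$ wget -q -O- 'https://download.ceph.com/keys/release.asc' | sudo apt-key add -
$ echo deb http://download.ceph.com/debian-{ceph-stable-release}/ $(lsb_release -sc) main | sudo tee /etc/apt/sources.list.d/ceph.list
$ sudo apt-get update
$ sudo apt-get install ceph-deploy
Replace {ceph-stable-release} with the name of the stable Ceph release to install (for example jewel, which matches the 10.2.1 packages in the logs below).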
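Problem 1:
bees@monitor1:~$ wget -q -O- 'https://download.ceph.com/keys/release.asc' | sudo apt-key add -
gpg: no valid OpenPGP data found.
Cause:
No proxy was configured for wget, so the key download failed and apt-key received no data.
Fix:
Configure a proxy for wget.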
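One way to configure a wget proxy is the per-user ~/.wgetrc file. The following is a minimal sketch, not necessarily the setup used in this article: the proxy address http://proxy.example.com:3128/ and the helper name wgetrc_proxy_lines are placeholders chosen for illustration. Because ~/.wgetrc is read per user, the lines have to be added for the account that actually runs wget.

```shell
#!/bin/sh
# Hypothetical proxy address: replace it with the real proxy of your site.
PROXY_URL="${PROXY_URL:-http://proxy.example.com:3128/}"

# Print the wgetrc settings that route HTTP and HTTPS through one proxy.
wgetrc_proxy_lines() {   # wgetrc_proxy_lines PROXY_URL
    printf 'use_proxy = on\n'
    printf 'http_proxy = %s\n' "$1"
    printf 'https_proxy = %s\n' "$1"
}

# Show the lines. To apply them for the current user, append them instead:
#   wgetrc_proxy_lines "$PROXY_URL" >> "$HOME/.wgetrc"
wgetrc_proxy_lines "$PROXY_URL"
```

The script only prints the settings by default, so the output can be reviewed before it is appended to any file.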
Problem 2:
wget works for the root user but fails for a non-root user (bees in this example).
bees@monitor1:/root$ sudo wget -O release.asc https://download.ceph.com/keys ... 05-09 16:38:03-- https://download.ceph.com/keys/release.asc
Resolving download.ceph.com (download.ceph.com)... failed: No address associated with hostname.
wget: unable to resolve host address download.ceph.com
Cause:
The wget proxy was configured only for the root user.
Fix:
Configure the wget proxy for the non-root user as well (bees in this example).
2. Install and configure the ntp service (all nodes)
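Configure ntp on every Ceph node and synchronize the clocks. The following is only an example: the configuration (normally /etc/ntp.conf on Ubuntu) comments out the default pool servers and uses the local clock, 127.127.1.0.
$ sudo apt-get install ntp
/etc/ntp.conf
--------------------------------------
#server 0.ubuntu.pool.ntp.org
#server 1.ubuntu.pool.ntp.org
#server 2.ubuntu.pool.ntp.org
#server 3.ubuntu.pool.ntp.org
server 127.127.1.0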
3. Install the ssh service (all nodes)
$ sudo apt-get install openssh-server
4. Passwordless access (admin node)
• Generate a key pair. Press Enter at the passphrase prompts to leave the passphrase empty.
$ ssh-keygen
Generating public/private key pair.
Enter file in which to save the key (/ceph-admin/.ssh/id_rsa):
Enter passphrase (empty for no passphrase):
Enter same passphrase again:
Your identification has been saved in /ceph-admin/.ssh/id_rsa.
Your public key has been saved in /ceph-admin/.ssh/id_rsa.pub.
• Copy the public key to each Ceph node
$ ssh-copy-id bees@monitor1
$ ssh-copy-id bees@osd1
$ ssh-copy-id bees@osd2
• Edit ~/.ssh/config on the admin node and add the following
Host monitor1
    Hostname monitor1
    User bees
Host osd1
    Hostname osd1
    User bees
Host osd2
    Hostname osd2
    User bees
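With more than a few nodes, the ~/.ssh/config entries can be generated instead of typed. The sketch below assumes, as in this article, that the Host alias and the Hostname are identical and that every node uses the same login user; the function name ssh_config_stanzas is made up for illustration.

```shell
#!/bin/sh
# Print one ssh_config stanza per node, all with the same login user.
ssh_config_stanzas() {   # ssh_config_stanzas USER HOST...
    user="$1"
    shift
    for host in "$@"; do
        printf 'Host %s\n    Hostname %s\n    User %s\n' "$host" "$host" "$user"
    done
}

# Show the stanzas for the nodes of this article. To apply them, append:
#   ssh_config_stanzas bees monitor1 osd1 osd2 >> "$HOME/.ssh/config"
ssh_config_stanzas bees monitor1 osd1 osd2
```

Afterwards, "ssh osd1" on the admin node should log in as bees without asking for a password, provided the public key was copied as described above.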
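5. Adjust the firewall rules (all nodes)
• Disable ufw and remove iptables. Ubuntu does not install firewalld by default.
$ sudo ufw disable
$ sudo apt-get remove iptables
If you have security requirements, define firewall rules instead of removing the firewall.
6. Configure the apt-get sources (all nodes)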
/etc/apt/sources.list
----------------------------------
deb http://archive.ubuntu.com/ubuntu/ trusty main restricted universe multiverse
deb http://archive.ubuntu.com/ubuntu/ trusty-security main restricted universe multiverse
deb http://archive.ubuntu.com/ubuntu/ trusty-updates main restricted universe multiverse
deb http://archive.ubuntu.com/ubuntu/ trusty-proposed main restricted universe multiverse
deb http://archive.ubuntu.com/ubuntu/ trusty-backports main restricted universe multiverse
deb-src http://archive.ubuntu.com/ubuntu/ trusty main restricted universe multiverse
deb-src http://archive.ubuntu.com/ubuntu/ trusty-security main restricted universe multiverse
7. Configure host names (all nodes)
/etc/hosts
-----------------------------------
193.168.123.90 bees1
193.168.123.67 bees2
193.168.123.89 osd1
193.168.123.58 monitor1
193.168.123.145 osd2
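Since ceph-deploy addresses nodes by name, it can help to confirm that every node name has an entry in /etc/hosts before continuing. The following is a small sketch of such a check; the function name missing_hosts is hypothetical, and the check only looks at the hosts file (it does not query DNS).

```shell
#!/bin/sh
# Print every given name that has no entry in the hosts file.
# Lines starting with '#' are treated as comments and skipped.
missing_hosts() {   # missing_hosts HOSTS_FILE NAME...
    file="$1"
    shift
    for name in "$@"; do
        awk -v n="$name" '
            /^[ \t]*#/ { next }
            { for (i = 2; i <= NF; i++) if ($i == n) found = 1 }
            END { exit !found }
        ' "$file" || echo "$name"
    done
}

# Check the node names used in this article; no output means all are present.
missing_hosts /etc/hosts monitor1 osd1 osd2
```

Run it on each node; any name that is printed still needs a line in that node's /etc/hosts.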
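2. Quick Installation (admin node)
————————————————————————————————————————————————
1. Create a cluster directory to hold the configuration files and key pairs generated by ceph-deploy
Creating it as a non-root user (bees in this example) is recommended.
$ mkdir my-cluster
$ cd my-cluster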
2. Create the cluster
$ ceph-deploy new monitor1
3. Allow the cluster to reach the active+clean state with only two OSDs. Add the following to the [global] section of the ceph.conf file in the current directory
osd pool default size = 2
4. If the nodes have more than one network interface, also add the public network to the [global] section of ceph.conf
public_network = 193.168.123.0/24
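The two settings from steps 3 and 4 can also be appended from the shell. This is a sketch under one loudly stated assumption: [global] is the last (or only) section of the ceph.conf that "ceph-deploy new" wrote, so lines appended at the end of the file belong to it. If your file has further sections, edit it by hand instead. The function name append_global_settings is made up for illustration.

```shell
#!/bin/sh
# Append the [global] settings from steps 3 and 4 to a ceph.conf.
# Assumption: [global] is the last (or only) section in the file,
# so the appended lines end up inside it.
append_global_settings() {   # append_global_settings CONF_FILE PUBLIC_NETWORK
    cat >> "$1" <<EOF
osd pool default size = 2
public_network = $2
EOF
}

# Example, run inside my-cluster after "ceph-deploy new monitor1":
#   append_global_settings ceph.conf 193.168.123.0/24
```

The call is left commented out so that nothing is modified until the target file and the network range have been checked.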
5. Install Ceph on each node
$ ceph-deploy install monitor1 osd1 osd2
Problem:
Preparing to unpack .../ceph-base_10.2.1-1trusty_amd64.deb ...
Unpacking ceph-base (10.2.1-1trusty) ...
dpkg: error processing archive /var/cache/apt/archives/ceph-base_10.2.1-1trusty_amd64.deb (--unpack):
trying to overwrite '/usr/share/man/man8/ceph-deploy.8.gz', which is also in package ceph-deploy 1.4.0-0ubuntu1
Selecting previously unselected package ceph-fs-common.
Preparing to unpack .../ceph-fs-common_10.2.1-1trusty_amd64.deb ...
Unpacking ceph-fs-common (10.2.1-1trusty) ...
Selecting previously unselected package ceph-fuse.
Preparing to unpack .../ceph-fuse_10.2.1-1trusty_amd64.deb ...
Unpacking ceph-fuse (10.2.1-1trusty) ...
Selecting previously unselected package ceph-mds.
Preparing to unpack .../ceph-mds_10.2.1-1trusty_amd64.deb ...
Unpacking ceph-mds (10.2.1-1trusty) ...
Processing triggers for ureadahead (0.100.0-16) ...
Processing triggers for man-db (2.6.7.1-1) ...
Errors were encountered while processing:
/var/cache/apt/archives/ceph-base_10.2.1-1trusty_amd64.deb
E: Sub-process /usr/bin/dpkg returned an error code (1)
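Cause:
The installation of the ceph-base package failed. As the log shows, ceph-base tries to overwrite /usr/share/man/man8/ceph-deploy.8.gz, which belongs to the already installed ceph-deploy 1.4.0-0ubuntu1 package, so dpkg aborts.
Fix:
$ sudo dpkg -i --force-overwrite /var/cache/apt/archives/ceph-base_10.2.1-1trusty_amd64.deb
Because the ceph-base installation failed, the only option is to install the downloaded ceph-base package by hand and force the overwrite.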
6. Initialize the monitor node
$ ceph-deploy mon create-initial
3. Configure the OSD Nodes (admin node)
————————————————————————————————————————————————
1. Prepare the disks for the OSD daemons. sdb serves as the OSD data disk and sda as the journal disk.
$ ssh osd1
$ sudo mkfs.xfs /dev/sda -f
$ sudo mkfs.xfs /dev/sdb -f
$ exit
$ ssh osd2
$ sudo mkfs.xfs /dev/sda -f
$ sudo mkfs.xfs /dev/sdb -f
$ exit
2. Wipe the disks, including their partition tables.
$ ceph-deploy disk zap osd1:sda
$ ceph-deploy disk zap osd1:sdb
$ ceph-deploy disk zap osd2:sda
$ ceph-deploy disk zap osd2:sdb
3. Prepare the OSD nodes
$ ceph-deploy osd prepare osd1:sdb:/dev/sda
$ ceph-deploy osd prepare osd2:sdb:/dev/sda
4. Activate the OSD nodes
$ ceph-deploy osd activate osd1:/dev/sdb1:/dev/sda1
$ ceph-deploy osd activate osd2:/dev/sdb1:/dev/sda1
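The zap, prepare and activate commands follow the same pattern for every OSD host, so they can be generated in a loop. The sketch below only prints the commands and executes nothing; it assumes the disk layout of this article (sdb for data, sda for the journal) on every host. The function name osd_commands is hypothetical, and the commands are grouped per host rather than per step.

```shell
#!/bin/sh
# Print the zap / prepare / activate commands for each OSD host.
# Nothing is executed: review the output first, then run it by hand.
DATA_DISK="sdb"      # OSD data disk, as in this article
JOURNAL_DISK="sda"   # journal disk, as in this article

osd_commands() {   # osd_commands HOST...
    for host in "$@"; do
        echo "ceph-deploy disk zap ${host}:${JOURNAL_DISK}"
        echo "ceph-deploy disk zap ${host}:${DATA_DISK}"
        echo "ceph-deploy osd prepare ${host}:${DATA_DISK}:/dev/${JOURNAL_DISK}"
        echo "ceph-deploy osd activate ${host}:/dev/${DATA_DISK}1:/dev/${JOURNAL_DISK}1"
    done
}

osd_commands osd1 osd2
```

Printing instead of executing is a deliberate choice here, because "disk zap" destroys the partition table of whatever disk is named.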
5. Copy the configuration file and the admin keyring to all Ceph nodes
$ ceph-deploy admin bees2 monitor1 osd1 osd2
Problem:
[ceph_deploy.admin][ERROR ] RuntimeError: config file /etc/ceph/ceph.conf exists with different content; use --overwrite-conf to overwrite
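Cause:
After Ceph was uninstalled, the old Ceph configuration file on the admin node was not deleted, so the newly generated configuration file differs from it.
Fix:
$ ceph-deploy --overwrite-conf admin bees2 monitor1 osd1 osd2
This forces the old configuration file to be overwritten.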
6. Make sure ceph.client.admin.keyring is readable
$ sudo chmod +r /etc/ceph/ceph.client.admin.keyring
7. Check the cluster health; the cluster should be in the active+clean state
$ ceph health
HEALTH_OK
$ ceph -s
    cluster 54356b3d-be17-4d5c-a8b0-804420caa59d
     health HEALTH_OK
     monmap e1: 1 mons at {monitor1=193.168.123.58:6789/0}
            election epoch 3, quorum 0 monitor1
     osdmap e10: 2 osds: 2 up, 2 in
            flags sortbitwise
      pgmap v23: 64 pgs, 1 pools, 0 bytes data, 0 objects
            68380 kB used, 20391 MB / 20457 MB avail
                  64 active+clean
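In a script, the result of "ceph health" can be checked by looking at the prefix of its output. The following is a minimal sketch; the function name is_healthy is made up, and it relies only on the HEALTH_OK prefix shown in the output above.

```shell
#!/bin/sh
# Succeed only if the given "ceph health" output starts with HEALTH_OK.
is_healthy() {   # is_healthy "STATUS LINE"
    case "$1" in
        HEALTH_OK*) return 0 ;;
        *) return 1 ;;
    esac
}

# Demonstration with a literal status line:
if is_healthy "HEALTH_OK"; then
    echo "cluster is healthy"
fi

# On a node that has the admin keyring, use the live status instead:
#   is_healthy "$(ceph health)" || echo "cluster needs attention"
```

Anything other than HEALTH_OK, including empty output when the ceph command fails, is treated as unhealthy.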
4. Problem List
————————————————————————————————————————————————
The following problems occurred when:
1) the Ceph cluster was configured as the root user, and
2) the OSD daemons used ext4-formatted disks.
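Problem 1
After the virtual machines were installed and set to bridged networking, the VMs on host A could not ping host B and the VMs on host B could not ping host A, although host A and host B could ping each other.
Host A —————————————— Host B (works)
VM on host A ————————— Host B (fails)
Host A —————————————— VM on host B (fails)
Cause:
Company network restrictions.
Fix:
Use a MAC address that is on the company whitelist.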
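Problem 2
Updating the package sources with apt-get failed as follows.
root@monitor1:/etc/apt# apt-get update
E: Method http has died unexpectedly!
E: Sub-process http received signal 6.
root@monitor1:/etc/apt#
Cause:
Company network restrictions.
Fix:
Use a MAC address that is allowed to access the external network.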
Problem 3
A directory was used as the OSD data store. Activating the OSD failed with the following error.
[osd1][WARNIN] 2016-05-22 16:02:20.403039 7f859771e800 -1 asok(0x7f85a1ffc280) AdminSocketConfigObs::init: failed: AdminSocket::bind_and_listen: failed to bind the UNIX domain socket to '/var/run/ceph/ceph-osd.0.asok': (13) Permission denied
[osd1][WARNIN] 2016-05-22 16:02:20.403601 7f859771e800 -1 filestore(/var/local/osd1) mkfs: write_version_stamp() failed: (13) Permission denied
[osd1][WARNIN] 2016-05-22 16:02:20.403630 7f859771e800 -1 OSD::mkfs: ObjectStore::mkfs failed with error -13
[osd1][WARNIN] 2016-05-22 16:02:20.403682 7f859771e800 -1 ** ERROR: error creating empty object store in /var/local/osd1: (13) Permission denied
[osd1][WARNIN] Traceback (most recent call last):
[osd1][WARNIN]   File "/usr/sbin/ceph-disk", line 9, in <module>
[osd1][WARNIN]     load_entry_point('ceph-disk==1.0.0', 'console_scripts', 'ceph-disk')()
[osd1][WARNIN]   File "/usr/lib/python2.7/dist-packages/ceph_disk/main.py", line 4964, in run
[osd1][WARNIN]     main(sys.argv[1:])
[osd1][WARNIN]   File "/usr/lib/python2.7/dist-packages/ceph_disk/main.py", line 4915, in main
[osd1][WARNIN]     args.func(args)
[osd1][WARNIN]   File "/usr/lib/python2.7/dist-packages/ceph_disk/main.py", line 3277, in main_activate
[osd1][WARNIN]     init=args.mark_init,
[osd1][WARNIN]   File "/usr/lib/python2.7/dist-packages/ceph_disk/main.py", line 3097, in activate_dir
[osd1][WARNIN]     (osd_id, cluster) = activate(path, activate_key_template, init)
[osd1][WARNIN]   File "/usr/lib/python2.7/dist-packages/ceph_disk/main.py", line 3202, in activate
[osd1][WARNIN]     keyring=keyring,
[osd1][WARNIN]   File "/usr/lib/python2.7/dist-packages/ceph_disk/main.py", line 2695, in mkfs
[osd1][WARNIN]     '--setgroup', get_ceph_group(),
[osd1][WARNIN]   File "/usr/lib/python2.7/dist-packages/ceph_disk/main.py", line 439, in command_check_call
[osd1][WARNIN]     return subprocess.check_call(arguments)
[osd1][WARNIN]   File "/usr/lib/python2.7/subprocess.py", line 540, in check_call
[osd1][WARNIN]     raise CalledProcessError(retcode, cmd)
[osd1][WARNIN] subprocess.CalledProcessError: Command '['/usr/bin/ceph-osd', '--cluster', 'ceph', '--mkfs', '--mkkey', '-i', '0', '--monmap', '/var/local/osd1/activate.monmap', '--osd-data', '/var/local/osd1', '--osd-journal', '/var/local/osd1/journal', '--osd-uuid', 'cb9d8962-75f7-4cb1-8a99-ca8044ee283f', '--keyring', '/var/local/osd1/keyring', '--setuser', 'ceph', '--setgroup', 'ceph']' returned non-zero exit status 1
[osd1][ERROR ] RuntimeError: command returned non-zero exit status: 1
[ceph_deploy][ERROR ] RuntimeError: Failed to execute command: /usr/sbin/ceph-disk -v activate --mark-init upstart --mount /var/local/osd1
Cause:
The user running the OSD daemon has no permissions on /var/local/osd1 (the log shows ceph-osd being started with --setuser ceph --setgroup ceph).
Fix:
Grant all permissions on /var/local/osd1.
root@osd1:/home/bees# chmod 777 /var/local/osd1
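Problem 4
ceph_disk.main.Error: Error: another ceph osd.0 already mounted in position (old/different cluster instance?); unmounting ours.
Cause:
On the Ceph node, an OSD under /var/lib/ceph/osd/ is already using this disk.
Fix:
1. Use a different disk or directory. If the problem persists, use method 2.
2. Delete the OSD under /var/lib/ceph/osd/ that uses this disk.
If the host runs several OSD daemons, take care not to delete the wrong one.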
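Problem 5
Checking the cluster status showed the following.
root@bees2:/home/my-cluster# ceph health
HEALTH_ERR 64 pgs are stuck inactive for more than 300 seconds; 64 pgs stuck inactive
Cause:
The disks used by the OSD daemons in this setup were formatted as ext4.
Fix:
1. Add a new disk, preferably formatted as xfs.
2. Add filestore xattr use omap = true to the [osd] section of ceph.conf. Method 2 has not been tried yet.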
Problem 6
root@bees2:/home/my-cluster# ceph -s
    cluster 15e780dc-f32c-47f8-8105-54a45aaa167d
     health HEALTH_ERR
            2 pgs are stuck inactive for more than 300 seconds
            62 pgs degraded
            64 pgs stale
            2 pgs stuck stale
            62 pgs stuck unclean
            62 pgs undersized
     monmap e1: 1 mons at {monitor1=193.168.123.58:6789/0}
            election epoch 9, quorum 0 monitor1
     osdmap e491: 2 osds: 2 up, 2 in; 62 remapped pgs
            flags sortbitwise
      pgmap v2421: 64 pgs, 1 pools, 0 bytes data, 0 objects
            79208 kB used, 30620 MB / 30697 MB avail
                  62 stale+active+undersized+degraded
                   2 stale+active+clean
Cause:
Not yet known.
Fix:
Uninstall Ceph, purge its configuration, and reinstall Ceph. Two recommendations:
1. Run ceph-deploy as a regular user.
2. Avoid ext4 disks; xfs is recommended.