
Installation and Deployment (7): HBase Cluster Installation, Deployment, and Testing

2016-08-12 12:09

Hadoop 2.7.2 

Spark 2.0.0

Kafka 0.10.0.0

HBase 1.2.2

Zookeeper 3.4.8

References:

http://www.tuicool.com/articles/VV7bam
http://blog.csdn.net/yinedent/article/details/48275407
1 Download:

http://mirrors.hust.edu.cn/apache/hbase/stable/
http://mirrors.hust.edu.cn/apache/hbase/stable/hbase-1.2.2-bin.tar.gz
2 Extract:

root@py-server:/server# tar xvzf hbase-1.2.2-bin.tar.gz

root@py-server:/server# mv hbase-1.2.2/ hbase

3 Environment variables:

vi ~/.bashrc

export HBASE_HOME=/server/hbase

export PATH=$PATH:$HBASE_HOME/bin

source ~/.bashrc
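As a quick sanity sketch, the two exports above look like this; note that the new entry must be appended to the existing `$PATH`, otherwise every previously available command disappears from the shell:

```shell
# Lines added to ~/.bashrc for HBase.
export HBASE_HOME=/server/hbase
# Append to the existing PATH rather than replacing it.
export PATH=$PATH:$HBASE_HOME/bin
```

After `source ~/.bashrc`, `which hbase` should resolve to /server/hbase/bin/hbase.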

4 Configuration:

HBase depends on a ZooKeeper environment; for the ZooKeeper cluster setup, see the corresponding section of the Spark installation guide. The setup uses an active Master plus a standby Master.

All 5 machines are configured as follows:

4.1 Configure hbase-site.xml

vi $HBASE_HOME/conf/hbase-site.xml

Contents:

<property>
  <name>hbase.rootdir</name>
  <value>hdfs://py-server:9000/hbase</value>
</property>
<property>
  <name>hbase.cluster.distributed</name>
  <value>true</value>
</property>
<property>
  <name>hbase.zookeeper.quorum</name>
  <value>py-server:2181,py-11:2181,py-12:2181,py-13:2181,py-14:2181</value>
</property>

Note: the HDFS port number comes from core-site.xml (check /server/hadoop/etc/hadoop/core-site.xml); it was configured earlier as 9000.
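For illustration, the three properties above must sit inside a single `<configuration>` element; this sketch writes a demo copy to /tmp (the real file is $HBASE_OME's conf/hbase-site.xml, and the /tmp target is only an assumption for demonstration):

```shell
# Generate the hbase-site.xml shown above. Demo target: /tmp.
# Real target: $HBASE_HOME/conf/hbase-site.xml
cat > /tmp/hbase-site.xml <<'EOF'
<configuration>
  <property>
    <name>hbase.rootdir</name>
    <value>hdfs://py-server:9000/hbase</value>
  </property>
  <property>
    <name>hbase.cluster.distributed</name>
    <value>true</value>
  </property>
  <property>
    <name>hbase.zookeeper.quorum</name>
    <value>py-server:2181,py-11:2181,py-12:2181,py-13:2181,py-14:2181</value>
  </property>
</configuration>
EOF
```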

4.2 Configure the active/standby Masters

Write the standby Master's hostname into backup-masters. The file does not exist by default, so create it:

vi $HBASE_HOME/conf/backup-masters

py-12

4.3 Configure regionservers

List every region server hostname in $HBASE_HOME/conf/regionservers:

py-server

py-11

py-12

py-13

py-14
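Since backup-masters and regionservers are both plain hostname lists, they can be generated in one short sketch; the /tmp target here is only for demonstration, the real files live in $HBASE_HOME/conf:

```shell
# Write both topology files. Demo target: /tmp.
CONF=/tmp   # in a real install: CONF=$HBASE_HOME/conf
# Standby master (one hostname per line).
echo py-12 > "$CONF/backup-masters"
# All five region servers, one per line.
printf '%s\n' py-server py-11 py-12 py-13 py-14 > "$CONF/regionservers"
```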

4.4 Configure hbase-env.sh

Raise the heap size, and do not let HBase manage the ZooKeeper cluster itself.

vi $HBASE_HOME/conf/hbase-env.sh

export HBASE_HEAPSIZE=4G

export HBASE_MANAGES_ZK=false
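As a sketch, the two settings can simply be appended to hbase-env.sh; the /tmp path below stands in for $HBASE_HOME/conf/hbase-env.sh:

```shell
# Append the two settings from this section. Demo target: /tmp.
ENV_FILE=/tmp/hbase-env.sh   # real file: $HBASE_HOME/conf/hbase-env.sh
cat >> "$ENV_FILE" <<'EOF'
export HBASE_HEAPSIZE=4G
export HBASE_MANAGES_ZK=false
EOF
```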

5 Distribute

root@py-server:/server# scp -r hbase/ root@10.1.1.11:/server/

root@py-server:/server# scp -r hbase/ root@10.1.1.12:/server/

root@py-server:/server# scp -r hbase/ root@10.1.1.13:/server/

root@py-server:/server# scp -r hbase/ root@10.1.1.14:/server/

Then update the environment variables (~/.bashrc) on each of nodes 11-14.
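The four scp commands above differ only in the IP, so a loop works too; the commands are echoed rather than executed here, since the target hosts are specific to this cluster:

```shell
# Loop form of the distribution step (echoed, not run).
for ip in 10.1.1.11 10.1.1.12 10.1.1.13 10.1.1.14; do
  echo scp -r hbase/ "root@$ip:/server/"
done
```

Drop the `echo` to actually copy.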

6 启动集群

6.1

On the master node py-server, run:

root@py-server:/server# $HBASE_HOME/bin/start-hbase.sh

Output:

starting master, logging to /server/hbase/logs/hbase-root-master-py-server.out

Java HotSpot(TM) 64-Bit Server VM warning: ignoring option PermSize=128m; support was removed in 8.0

Java HotSpot(TM) 64-Bit Server VM warning: ignoring option MaxPermSize=128m; support was removed in 8.0

py-11: starting regionserver, logging to /server/hbase/bin/../logs/hbase-root-regionserver-py-11.out

py-14: starting regionserver, logging to /server/hbase/bin/../logs/hbase-root-regionserver-py-14.out

py-13: starting regionserver, logging to /server/hbase/bin/../logs/hbase-root-regionserver-py-13.out

py-12: starting regionserver, logging to /server/hbase/bin/../logs/hbase-root-regionserver-py-12.out

py-11: Java HotSpot(TM) 64-Bit Server VM warning: ignoring option PermSize=128m; support was removed in 8.0

py-11: Java HotSpot(TM) 64-Bit Server VM warning: ignoring option MaxPermSize=128m; support was removed in 8.0

py-13: Java HotSpot(TM) 64-Bit Server VM warning: ignoring option PermSize=128m; support was removed in 8.0

py-13: Java HotSpot(TM) 64-Bit Server VM warning: ignoring option MaxPermSize=128m; support was removed in 8.0

py-12: Java HotSpot(TM) 64-Bit Server VM warning: ignoring option PermSize=128m; support was removed in 8.0

py-12: Java HotSpot(TM) 64-Bit Server VM warning: ignoring option MaxPermSize=128m; support was removed in 8.0

py-12: SLF4J: Class path contains multiple SLF4J bindings.

py-12: SLF4J: Found binding in [jar:file:/server/hbase/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]

py-12: SLF4J: Found binding in [jar:file:/server/hadoop/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]

py-12: SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.

py-12: SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]

py-server: starting regionserver, logging to /server/hbase/logs/hbase-root-regionserver-py-server.out

py-server: Java HotSpot(TM) 64-Bit Server VM warning: ignoring option PermSize=128m; support was removed in 8.0

py-server: Java HotSpot(TM) 64-Bit Server VM warning: ignoring option MaxPermSize=128m; support was removed in 8.0

py-12: starting master, logging to /server/hbase/bin/../logs/hbase-root-master-py-12.out

py-12: Java HotSpot(TM) 64-Bit Server VM warning: ignoring option PermSize=128m; support was removed in 8.0

py-12: Java HotSpot(TM) 64-Bit Server VM warning: ignoring option MaxPermSize=128m; support was removed in 8.0

6.2 Verification

6.2.1 Verification 1:

Check the web UI at the default address http://py-server:16010/master-status (note the host is the active Master).

Region Servers (Base Stats):

ServerName                     Start time                    Version  Requests Per Second  Num. Regions
py-11,16020,1470969413562      Fri Aug 12 10:36:53 CST 2016  Unknown  0                    0
py-12,16020,1470969406816      Fri Aug 12 10:36:46 CST 2016  Unknown  0                    0
py-13,16020,1470969425459      Fri Aug 12 10:37:05 CST 2016  Unknown  0                    0
py-14,16020,1470969407402      Fri Aug 12 10:36:47 CST 2016  Unknown  0                    0
py-server,16020,1470969419382  Fri Aug 12 10:36:59 CST 2016  Unknown  0                    0
Total: 5                       5 nodes with inconsistent version      0                    0

6.2.2 Verification 2:

This is the Master:

root@py-server:~# jps

18592 NodeManager

17894 DataNode

29959 Jps

18867 Worker

9780 Kafka

25303 Main

29623 HMaster

18073 SecondaryNameNode

18650 Master

20218 jar

17499 QuorumPeerMain

18269 ResourceManager

17725 NameNode

root@py-11:~# jps

23158 Worker

22664 QuorumPeerMain

24105 Jps

23898 HRegionServer

22971 NodeManager

22828 DataNode

20189 Kafka

root@py-12:~# jps

3846 QuorumPeerMain

13256 Jps

5161 Kafka

4282 Worker

13035 HMaster

4011 DataNode

4156 NodeManager

root@py-13:~# jps

20960 NodeManager

20817 DataNode

21126 Worker

17287 Kafka

20188 Jps

20653 QuorumPeerMain

19983 HRegionServer

root@py-14:~# jps

15958 Jps

551 DataNode

15753 HRegionServer

701 NodeManager

911 Worker

12095 ZooKeeperMain

383 QuorumPeerMain

6.2.3 Verification 3:

$ZOOKEEPER_HOME/bin/zkCli.sh

[zk: localhost:2181(CONNECTED) 0] ls /

[controller_epoch, controller, brokers, zookeeper, admin, isr_change_notification, consumers, config, hbase]

The listing contains hbase.

6.2.4 Verification 4: hbase shell

root@py-server:~# hbase shell

SLF4J: Class path contains multiple SLF4J bindings.

SLF4J: Found binding in [jar:file:/server/hbase/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]

SLF4J: Found binding in [jar:file:/server/hadoop/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]

SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.

SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]

HBase Shell; enter 'help<RETURN>' for list of supported commands.

Type "exit<RETURN>" to leave the HBase Shell

Version 1.2.2, r3f671c1ead70d249ea4598f1bbcc5151322b3a13, Fri Jul  1 08:28:55 CDT 2016

hbase(main):002:0> create 'test','cf'

0 row(s) in 2.4050 seconds

=> Hbase::Table - test

hbase(main):003:0> list

TABLE                                                                                                                                                                               

test                                                                                                                                                                                

1 row(s) in 0.0070 seconds

=> ["test"]

hbase(main):004:0> version

1.2.2, r3f671c1ead70d249ea4598f1bbcc5151322b3a13, Fri Jul  1 08:28:55 CDT 2016

hbase(main):005:0> status

1 active master, 1 backup masters, 5 servers, 0 dead, 0.6000 average load

hbase(main):006:0> put 'test','rowkey1','cf:id','1'

0 row(s) in 0.1390 seconds

hbase(main):007:0> put 'test','rowkey1','cf:name','zhang3'

0 row(s) in 0.0150 seconds

hbase(main):008:0> scan 'test'

ROW                                            COLUMN+CELL                                                                                                                          

 rowkey1                                       column=cf:id, timestamp=1470974366010, value=1                                                                                       

 rowkey1                                       column=cf:name, timestamp=1470974389934, value=zhang3                                                                                

1 row(s) in 0.0470 seconds

hbase(main):009:0> 
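The interactive session above can also be replayed non-interactively: save the commands to a file and run `hbase shell /path/to/file` against a running cluster. Since this needs a live cluster, the sketch below only writes the script out:

```shell
# Write the smoke-test commands from the session above to a script file.
# Run later (on a live cluster) with: hbase shell /tmp/smoke-test.hbase
cat > /tmp/smoke-test.hbase <<'EOF'
create 'test','cf'
put 'test','rowkey1','cf:id','1'
put 'test','rowkey1','cf:name','zhang3'
scan 'test'
EOF
```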

6.2.5 Verify automatic HMaster failover (active/standby)

See:
http://blog.csdn.net/yinedent/article/details/48275407
or the appendix of this article.

6.3 Shutdown

6.3.1 On the master node, run:

root@py-12:~# stop-hbase.sh

6.3.2 Kill the processes (not recommended, unless HBase refuses to shut down)

Method 1:

root@py-server:~# jps

18592 NodeManager

17894 DataNode

31852 Jps

31089 HMaster

18867 Worker

9780 Kafka

25303 Main

18073 SecondaryNameNode

18650 Master

20218 jar

17499 QuorumPeerMain

30045 Main

18269 ResourceManager

17725 NameNode

31263 HRegionServer

kill -9 31089

Method 2:

root@py-server:~# netstat -nlp | grep java

Find the process ID listening on HBase port 16000:

tcp6       0      0 10.1.1.6:16000          :::*                    LISTEN      31089/java      

kill -9 31089
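The PID in a netstat line is the last field, before the slash, so it can be extracted without reading by eye; the sample line below is hardcoded from the output above (normally it would come from `netstat -nlp | grep 16000`):

```shell
# Sample netstat line (hardcoded for illustration).
line='tcp6       0      0 10.1.1.6:16000          :::*                    LISTEN      31089/java'
# Take the last whitespace-separated field, e.g. "31089/java".
pid=${line##* }
# Strip the "/java" suffix, leaving just the PID.
pid=${pid%%/*}
echo "$pid"   # prints 31089
```

Then `kill -9 "$pid"` on the real machine.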

##########################################

Common issues:

Problem 1: ERROR: Can't get master address from ZooKeeper; znode data == null

This is most likely caused by ZooKeeper instability: the setup here ran on virtual machines that were frequently suspended, and running list in the hbase shell produced the error below. Several fixes are collected here; the first one was enough in this case.

ERROR: Can't get master address from ZooKeeper; znode data == null

Here is some help for this command:

List all tables in hbase. Optional regular expression parameter could

be used to filter the output. Examples:

  hbase> list

  hbase> list 'abc.*'

Fixes:

1. Restart HBase:

stop-hbase.sh

then

start-hbase.sh

That solved the problem here. Other reported fixes are collected below.

2. Fix 2: reformat the NameNode

The DataNode log on node 2 contained:

Incompatible namespaceIDs in /home/hadoop/tmp/dfs/data: namenode namespaceID = 1780037790

The NameNode log on node 1 contained: java.io.IOException: File /home/hadoop/tmp/mapred/system/jobtracker.info could only be replicated to 0 nodes, instead of 1

Delete the NameNode's data, reformat it, and restart; HBase then came up normally.

Source:
http://www.aboutyun.com/thread-8691-1-1.html
3. Fix 3: inconsistent HDFS port configuration

Here HBase failed to start because hbase-site.xml pointed at the wrong HDFS port: the help documentation uses 8020, while this Hadoop cluster uses the default port 9000. Fix the configuration:

gedit hbase-site.xml

<property>
  <name>hbase.rootdir</name>
  <value>hdfs://hadoop0:9000/hbase</value>
</property>

After configuring, distribute the file so that every node has an identical copy, then restart every machine in the cluster and start HBase with bin/start-hbase.sh.

Startup order: hadoop --> zookeeper --> hbase

Start the Hadoop cluster on hadoop0:

/home/hadoop/hadoop-2.6.0/sbin/start-all.sh

Start zookeeper on every machine:

/home/hadoop/zookeeper-3.4.6/bin/zkServer.sh start

Start the HBase cluster on hadoop0:

/home/hadoop/hbase-1.0.1.1/bin/start-hbase.sh

Open http://hadoop0:16010/master-status; this time the page loads.

Finally, list worked normally; problem solved.

Source:
http://f.dataguru.cn/thread-519459-1-1.html
Problem 2: HBase will not stop; stop-hbase.sh fails to stop it
http://www.cnblogs.com/jdksummer/articles/2506811.html
Mind the shutdown order: stop HBase first, then Hadoop; otherwise HBase may refuse to stop.

 $stop-hbase.sh

 $stop-all.sh

Assorted issues encountered:

The official documentation explains which Hadoop versions HBase works with; the key point is whether the 0.20-append branch was merged into the Hadoop trunk. Hadoop 0.20.205.0 includes the merge, so be sure to use Hadoop 0.20.205.0 or later.

The lib/hadoop-core-….jar replacement described in the official documentation is mandatory; otherwise startup fails with an EOFException. Since the actual version numbers differ, simply move the old jar out and drop the new one in.

With 0.20.205.0, also copy hadoop/lib/commons-configuration-1.6.jar into hbase/lib; otherwise starting the master fails with a master.HMaster NoClassDefFoundError.

bin/start-hbase automatically starts a ZooKeeper instance; you can of course configure your own ZooKeeper instead.

bin/stop-hbase appears to stop only ZooKeeper and the master; a regionserver is left running on B (the master). Kill the process, or run bin/hbase-daemon.sh stop regionserver.

Likewise, use netstat -nlp | grep java to check port numbers; in that older release, HBase-related service ports all begin with 600.

If HBase will not stop because the shutdown order was wrong, forcibly kill its daemon processes:

$ netstat -nlp | grep java    # find the PID of the daemon (the HMaster port is 60000 in older releases; in this article it is 16000); the PID found here was 15562

$ sudo kill -9 15562          # kill the process

Or simply use jps: the number on each line is the PID, which you can kill with the same command.

Problem 3: a RegionServer fails to start when HBase starts; the log shows:

  org.apache.hadoop.hbase.ClockOutOfSyncException: Server summer1,60020,1357384944077 has been rejected; Reported time is too far out of sync with master.  Time difference of 3549899ms > max allowed of 30000ms

The cause is a clock mismatch between the RegionServer and the Master. As the message shows, the maximum allowed difference between the two machines is 30000 ms; beyond that the RegionServer cannot start.

Fix: on all nodes, run sudo ntpdate time.nist.gov (time.nist.gov is a time server).
http://www.cnblogs.com/jdksummer/articles/2506811.html
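The rejection in problem 3 is a plain threshold check: the reported offset is compared against the maximum allowed skew (30000 ms per the error message; the hbase.master.maxclockskew property). A sketch of the arithmetic, with the numbers taken from the log line above:

```shell
# Values from the ClockOutOfSyncException message above.
max_skew_ms=30000     # maximum allowed clock skew
diff_ms=3549899       # the reported "Time difference"
if [ "$diff_ms" -gt "$max_skew_ms" ]; then
  echo "regionserver rejected: clock out of sync"
fi
```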
Problem 4: hbase shell fails with ERROR: org.apache.hadoop.hbase.PleaseHoldException: Master is initializing

See:
http://www.cnblogs.com/suddoo/p/4986094.html
Synchronize the clocks on all machines:

root@py-server:~# ntpdate time.nist.gov

12 Aug 11:50:40 ntpdate[30326]: no server suitable for synchronization found

root@py-server:~# ntpdate 0.cn.pool.ntp.org

12 Aug 11:51:02 ntpdate[30361]: step time server 120.25.108.11 offset -10.398102 sec

After that, stop HBase, start it again, enter hbase shell, run list, and everything works normally.

Following the second blog linked above: install ntpdate (sudo apt-get install ntpdate), then run ntpdate 0.cn.pool.ntp.org (the argument can be any time server address), and restart HBase with bin/stop-hbase.sh and bin/start-hbase.sh. You may still see a can't get master address from ZooKeeper error, probably due to ZooKeeper instability; restarting once more cleared it.

#####################################

Verify automatic HMaster failover

Check the log on hadoop60:

cloud@hadoop60:~> tail /home/cloud/hbase0962/logs/hbase-cloud-master-hadoop60.log

2015-06-02 14:30:39,705 INFO [master:hadoop60:60000] http.HttpServer: Added global filter 'safety' (class=org.apache.hadoop.http.HttpServer$QuotingInputFilter)

2015-06-02 14:30:39,710 INFO [master:hadoop60:60000] http.HttpServer: Added filter static_user_filter (class=org.apache.hadoop.http.lib.StaticUserWebFilter$StaticUserFilter) to context master

2015-06-02 14:30:39,711 INFO [master:hadoop60:60000] http.HttpServer: Added filter static_user_filter (class=org.apache.hadoop.http.lib.StaticUserWebFilter$StaticUserFilter) to context static

2015-06-02 14:30:39,731 INFO [master:hadoop60:60000] http.HttpServer: Jetty bound to port 60010

2015-06-02 14:30:39,731 INFO [master:hadoop60:60000] mortbay.log: jetty-6.1.26

2015-06-02 14:30:40,291 INFO [master:hadoop60:60000] mortbay.log: Started SelectChannelConnector@0.0.0.0:60010

2015-06-02 14:30:40,292 DEBUG [master:hadoop60:60000] master.HMaster: HMaster started in backup mode. Stalling until master znode is written.

2015-06-02 14:30:40,407 INFO [master:hadoop60:60000] zookeeper.RecoverableZooKeeper: Node /hbase/master already exists and this is not a retry

2015-06-02 14:30:40,408 INFO [master:hadoop60:60000] master.ActiveMasterManager: Adding ZNode for /hbase/backup-masters/hadoop60,60000,1433226638688 in backup master directory

2015-06-02 14:30:40,421 INFO [master:hadoop60:60000] master.ActiveMasterManager: Another master is the active master, hadoop59,60000,1433226634553; waiting to become the next active master

 

This shows that ZooKeeper has taken over and registered hadoop60 as a backup HMaster; note the message "waiting to become the next active master". Now kill the hmaster process on hadoop59, either directly or with ./hbase-daemon.sh stop master.

Kill the hmaster process on hadoop59 and watch how the log on hadoop60 changes:

cloud@hadoop59:~> hbase0962/bin/hbase-daemon.sh stop master

stopping master.

cloud@hadoop59:~> jps

1320 Jps

45952 DFSZKFailoverController

49796 ResourceManager

43879 NameNode

cloud@hadoop59:~>

# Below is the changed log output on hadoop60

cloud@hadoop60:~> tail -n 50 /home/cloud/hbase0962/logs/hbase-cloud-master-hadoop60.log

(omitted ......)

2015-06-02 14:47:48,103 INFO [master:hadoop60:60000] master.RegionStates: Onlined c6541fc62282f10ad4206d626cc10f8b on hadoop36,60020,1433226640106

2015-06-02 14:47:48,105 DEBUG [master:hadoop60:60000] master.AssignmentManager: Found {ENCODED => c6541fc62282f10ad4206d626cc10f8b, NAME => 'test,,1433214676168.c6541fc62282f10ad4206d626cc10f8b.', STARTKEY => '', ENDKEY => ''} out on cluster

2015-06-02 14:47:48,105 INFO [master:hadoop60:60000] master.AssignmentManager: Found regions out on cluster or in RIT; presuming failover

2015-06-02 14:47:48,237 DEBUG [master:hadoop60:60000] hbase.ZKNamespaceManager: Updating namespace cache from node default with data: \x0A\x07default

2015-06-02 14:47:48,241 DEBUG [master:hadoop60:60000] hbase.ZKNamespaceManager: Updating namespace cache from node hbase with data: \x0A\x05hbase

2015-06-02 14:47:48,289 INFO [master:hadoop60:60000] zookeeper.RecoverableZooKeeper: Node /hbase/namespace/default already exists and this is not a retry

2015-06-02 14:47:48,308 INFO [master:hadoop60:60000] zookeeper.RecoverableZooKeeper: Node /hbase/namespace/hbase already exists and this is not a retry

2015-06-02 14:47:48,318 INFO [master:hadoop60:60000] master.HMaster: Master has completed initialization

The key lines (highlighted in red in the original post) show that killing the hmaster on hadoop59 wakes the waiting hmaster: ZooKeeper takes over and switches the hmaster on hadoop60 from waiting to active. (You can also run several backup hmasters; just add more hostnames to the backup-masters file.)

Next, verify that the standby HBase is usable:

cloud@hadoop60:~> hbase shell

2015-06-02 14:51:36,014 INFO [main] Configuration.deprecation: hadoop.native.lib is deprecated. Instead, use io.native.lib.available

HBase Shell; enter 'help<RETURN>' for list of supported commands.

Type "exit<RETURN>" to leave the HBase Shell

Version 0.96.2-hadoop2, r1581096, Mon Mar 24 16:03:18 PDT 2014

 

hbase(main):001:0> status

SLF4J: Class path contains multiple SLF4J bindings.

SLF4J: Found binding in [jar:file:/home/cloud/hbase0962/lib/slf4j-log4j12-1.6.4.jar!/org/slf4j/impl/StaticLoggerBinder.class]

SLF4J: Found binding in [jar:file:/home/cloud/hadoop220/share/hadoop/common/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]

SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.

2015-06-02 14:51:40,658 WARN [main] util.NativeCodeLoader: Unable to load native-hadoop library foryour platform... using builtin-java classes where applicable

8 servers, 0 dead, 0.3750 average load

hbase(main):002:0> scan 'test'

ROW                                              COLUMN+CELL                                                                                                                                  

 rowkey1                                        column=cf:id, timestamp=1433215397406, value=1                                                                                               

 rowkey1                                        column=cf:name, timestamp=1433215436532, value=zhangsan                                                                                       

1 row(s) in 0.2120 seconds

 

hbase(main):003:0>