
Upgrading CDH 5.0.2 to CDH 5.2.0

2016-07-13 17:28
Why upgrade

1. To get Spark support for the Kerberos security mechanism.

2. To get the Impala trunc() function.

3. To fix Impala hanging when queries run concurrently with an import.

Upgrade steps

Reference: http://www.cloudera.com/content/cloudera/en/documentation/core/latest/topics/installation_upgrade.html

Upgrade Cloudera Manager first, then CDH.

1. Preparation:

Unify the root password across the cluster (ask the ops team for help).

Disable automatic agent restart.

Download the parcels ahead of time.
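The parcel pre-staging can be sketched as below. The parcel filename is only an example for el6; substitute the actual CDH 5.2.0 parcel for your OS. Cloudera Manager validates each parcel in the local repo against the `.sha` file placed next to it, so checking that hash up front avoids a failed activation later:

```shell
# Verify a pre-downloaded parcel against its .sha file before CM picks it up.
# REPO is CM's standard local parcel repo; PARCEL is an example name.
REPO=/opt/cloudera/parcel-repo
PARCEL=CDH-5.2.0-1.cdh5.2.0.p0.36-el6.parcel

verify_parcel() {
  parcel="$1"; repo="$2"
  if [ ! -f "$repo/$parcel" ] || [ ! -f "$repo/$parcel.sha" ]; then
    echo "missing"; return 1
  fi
  # The .sha file holds the bare SHA-1 of the parcel.
  want=$(cat "$repo/$parcel.sha")
  have=$(sha1sum "$repo/$parcel" | awk '{print $1}')
  if [ "$want" = "$have" ]; then
    echo "ok"
  else
    echo "checksum mismatch"; return 1
  fi
}

# verify_parcel "$PARCEL" "$REPO"
```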

2. CM upgrade

Log in to the host running the CM server and inspect its database settings:

cat /etc/cloudera-scm-server/db.properties

Back up the CM database:

  pg_dump -U scm -p 7432 > scm_server_db_backup.bak

  Check that the dump file was created under /tmp, and make sure nothing deletes files from /tmp while the upgrade runs.
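Before stopping anything, a quick sanity check on that dump is cheap insurance: a plain-format pg_dump writes a "PostgreSQL database dump complete" comment at the end on success, so a file missing that marker is probably truncated. A sketch:

```shell
# Check that a plain-format pg_dump finished: the file must be non-empty
# and carry pg_dump's completion marker near the end.
check_dump() {
  f="$1"
  [ -s "$f" ] || { echo "empty or missing"; return 1; }
  if tail -n 5 "$f" | grep -q "PostgreSQL database dump complete"; then
    echo "ok"
  else
    echo "possibly truncated"; return 1
  fi
}

# check_dump scm_server_db_backup.bak
```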

Stop the CM server:

  sudo service cloudera-scm-server stop

Stop the database the CM server depends on:

  sudo service cloudera-scm-server-db stop

If an agent is also running on this host, stop it too:

  sudo service cloudera-scm-agent stop

Edit the cloudera-manager.repo yum file:

  sudo vim /etc/yum.repos.d/cloudera-manager.repo

   [cloudera-manager]

       # Packages for Cloudera Manager, Version 5, on RedHat or CentOS 6 x86_64

       name=Cloudera Manager

       baseurl=http://archive.cloudera.com/cm5/redhat/6/x86_64/cm/5/

       gpgkey = http://archive.cloudera.com/cm5/redhat/6/x86_64/cm/RPM-GPG-KEY-cloudera
       gpgcheck = 1

Install:

  sudo yum clean all

  sudo yum upgrade 'cloudera-*'

Verify:

  rpm -qa 'cloudera-manager-*'

Start the CM server database:

  sudo service cloudera-scm-server-db start

Start the CM server:

  sudo service cloudera-scm-server start

Log in to http://172.20.0.83:7180/

  Upgrade the agents (steps omitted).

If the upgrade also upgrades the JDK, the JAVA_HOME path changes and Java-based services stop working; JAVA_HOME must be reconfigured.
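One way to recover is to re-derive JAVA_HOME from whatever `java` binary the system now resolves to, instead of keeping the old hard-coded path. A sketch (pass the binary explicitly, or let it default to `command -v java`):

```shell
# Derive JAVA_HOME from a java binary by resolving symlinks and then
# stripping the trailing /bin/java.
detect_java_home() {
  java_bin="${1:-$(command -v java)}"
  java_bin=$(readlink -f "$java_bin") || return 1
  dirname "$(dirname "$java_bin")"
}

# export JAVA_HOME="$(detect_java_home)"
```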

After upgrading CM, restart CDH.

3. CDH upgrade

Stop all cluster services.

Back up the NameNode metadata:

  cd into the NameNode data directory and run:

   tar -cvf /root/nn_backup_data.tar ./*
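It is worth confirming the tarball actually captured everything before moving on. A rough check is to compare file counts between the directory and the archive (a sketch, assuming GNU tar; the NameNode directory path in the usage line is illustrative):

```shell
# Compare the number of regular files under a directory with the number of
# non-directory entries in its tar backup.
verify_backup() {
  dir="$1"; tarball="$2"
  src_count=$(find "$dir" -type f | wc -l)
  tar_count=$(tar -tf "$tarball" | grep -vc '/$')  # directory entries end in /
  if [ "$src_count" -eq "$tar_count" ]; then
    echo "ok: $tar_count files"
  else
    echo "mismatch: dir=$src_count tar=$tar_count"; return 1
  fi
}

# verify_backup /dfs/nn /root/nn_backup_data.tar
```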

Download the parcels.

Distribute the parcels -> activate them -> choose Close (not Restart).

Start the ZooKeeper service.

Go to the HDFS service -> Upgrade HDFS Metadata. The wizard proceeds through:

  the NameNode starting and upgrading its metadata,

  the remaining HDFS roles starting,

  the NameNode responding to RPCs,

  HDFS leaving safe mode.

Back up the Hive metastore database:

  mysqldump -h172.20.0.67 -ucdhhive -p111111 cdhhive > /tmp/database-backup.sql
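A metastore dump that contains no CREATE TABLE statements almost certainly failed (bad credentials, wrong database, interrupted dump), so counting them is a cheap sanity check before touching the schema. A sketch:

```shell
# Count schema objects in a mysqldump file; zero means the dump is unusable
# as a metastore backup.
metastore_tables() { grep -c '^CREATE TABLE' "$1"; }

# n=$(metastore_tables /tmp/database-backup.sql)
# [ "$n" -gt 0 ] || echo "WARNING: metastore dump looks empty" >&2
```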

Go to the Hive service -> Update Hive Metastore Database Schema.

Update the Oozie ShareLib: Oozie -> Install Oozie ShareLib

  Create the Oozie user sharelib.

  Create the Oozie user dir.

Update Sqoop: go to the Sqoop service -> Update Sqoop

  Upgrade the Sqoop 2 server.

Update Spark (omitted here; you can uninstall the old version and install the new one directly after the upgrade).

Start all cluster services, in order: zk -> hdfs -> spark -> flume -> hbase -> hive -> impala -> oozie -> sqoop2 -> hue

Deploy the client configuration files:

  deploy hdfs client configuration

  deploy spark client configuration

  deploy hbase client configuration

  deploy yarn client configuration

  deploy hive client configuration
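After deploying, it helps to confirm each gateway host's /etc/hadoop/conf alternative now points at the freshly deployed directory. The parser below is a sketch built around the RHEL `alternatives --display` output format ("link currently points to ..."):

```shell
# Extract the active target path from `alternatives --display <name>` output.
active_conf() {
  grep 'link currently points to' | awk '{print $NF}'
}

# On a gateway host:
#   alternatives --display hadoop-conf | active_conf
```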

Remove the old-version packages:

  sudo yum remove bigtop-utils bigtop-jsvc bigtop-tomcat hue-common sqoop2-client

Restart the agents:

  sudo service cloudera-scm-agent restart
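With the unified root password/ssh access from the prep step, the agent restart can be driven across all hosts from one place. A sketch; the hosts file is an assumption (one hostname per line), and the second argument lets you substitute the remote runner, e.g. `echo` for a dry run:

```shell
# Restart the CM agent on every host listed in a file (one hostname per
# line). Pass `echo` as the second argument to print the commands instead
# of running them over ssh.
restart_agents() {
  hosts_file="$1"; runner="${2:-ssh}"
  while read -r h; do
    [ -n "$h" ] || continue
    "$runner" "root@$h" 'service cloudera-scm-agent restart'
  done < "$hosts_file"
}

# restart_agents cluster_hosts.txt        # real run over ssh
# restart_agents cluster_hosts.txt echo   # dry run
```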

Finalize the HDFS metadata upgrade:

  HDFS service -> Instances -> NameNode -> Actions -> Finalize Metadata Upgrade

Main problem hit during the upgrade:

com.cloudera.server.cmf.FeatureUnavailableException: The feature Navigator Audit Server is not available.

        at com.cloudera.server.cmf.components.LicensedFeatureManager.check(LicensedFeatureManager.java:49)

        at com.cloudera.server.cmf.components.OperationsManagerImpl.setConfig(OperationsManagerImpl.java:1312)

        at com.cloudera.server.cmf.components.OperationsManagerImpl.setConfigUnsafe(OperationsManagerImpl.java:1352)

        at com.cloudera.api.dao.impl.ManagerDaoBase.updateConfigs(ManagerDaoBase.java:264)

        at com.cloudera.api.dao.impl.RoleConfigGroupManagerDaoImpl.updateConfigsHelper(RoleConfigGroupManagerDaoImpl.java:214)

        at com.cloudera.api.dao.impl.RoleConfigGroupManagerDaoImpl.updateRoleConfigGroup(RoleConfigGroupManagerDaoImpl.java:97)

        at com.cloudera.api.dao.impl.RoleConfigGroupManagerDaoImpl.updateRoleConfigGroup(RoleConfigGroupManagerDaoImpl.java:79)

        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)

        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)

        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)

        at java.lang.reflect.Method.invoke(Method.java:606)

        at com.cloudera.api.dao.impl.ManagerDaoBase.invoke(ManagerDaoBase.java:208)

        at com.sun.proxy.$Proxy82.updateRoleConfigGroup(Unknown Source)

        at com.cloudera.api.v3.impl.RoleConfigGroupsResourceImpl.updateRoleConfigGroup(RoleConfigGroupsResourceImpl.java:69)

        at com.cloudera.api.v3.impl.MgmtServiceResourceV3Impl$RoleConfigGroupsResourceWrapper.updateRoleConfigGroup(MgmtServiceResourceV3Impl.java:54)

        at com.cloudera.cmf.service.upgrade.RemoveBetaFromRCG.upgrade(RemoveBetaFromRCG.java:80)

        at com.cloudera.cmf.service.upgrade.AbstractApiAutoUpgradeHandler.upgrade(AbstractApiAutoUpgradeHandler.java:36)

        at com.cloudera.cmf.service.upgrade.AutoUpgradeHandlerRegistry.performAutoUpgradesForOneVersion(AutoUpgradeHandlerRegistry.java:233)

        at com.cloudera.cmf.service.upgrade.AutoUpgradeHandlerRegistry.performAutoUpgrades(AutoUpgradeHandlerRegistry.java:167)

        at com.cloudera.cmf.service.upgrade.AutoUpgradeHandlerRegistry.performAutoUpgrades(AutoUpgradeHandlerRegistry.java:138)

        at com.cloudera.server.cmf.Main.run(Main.java:587)

        at com.cloudera.server.cmf.Main.main(Main.java:198)

2014-11-26 03:17:42,891 INFO ParcelUpdateService:com.cloudera.parcel.components.ParcelDownloade

The cluster had been running the 60-day Enterprise trial, which had expired. During the upgrade the Navigator service could not start, which made the whole Cloudera Manager server startup fail.

Problems after the upgrade

a. The third-party jars previously provided to Flume were lost during the upgrade; they had to be put back under /opt....

b. Sqoop could no longer find the MySQL driver jar; it also had to be put back under /opt....

c. The HBase service failed:

Unhandled exception. Starting shutdown.

org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.authorize.AuthorizationException): User hbase/ip-10-1-33-20.ec2.internal@YEAHMOBI.COM (auth:KERBEROS) is not authorized for protocol interface org.apache.hadoop.hdfs.protocol.ClientProtocol, expected client Kerberos principal is null

at org.apache.hadoop.ipc.Client.call(Client.java:1409)

at org.apache.hadoop.ipc.Client.call(Client.java:1362)

at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:206)

at com.sun.proxy.$Proxy15.setSafeMode(Unknown Source)

at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)

at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)

at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)

at java.lang.reflect.Method.invoke(Method.java:606)

at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:186)

at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)

at com.sun.proxy.$Proxy15.setSafeMode(Unknown Source)

at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.setSafeMode(ClientNamenodeProtocolTranslatorPB.java:594)

at org.apache.hadoop.hdfs.DFSClient.setSafeMode(DFSClient.java:2224)

at org.apache.hadoop.hdfs.DistributedFileSystem.setSafeMode(DistributedFileSystem.java:993)

at org.apache.hadoop.hdfs.DistributedFileSystem.setSafeMode(DistributedFileSystem.java:977)

at org.apache.hadoop.hbase.util.FSUtils.isInSafeMode(FSUtils.java:432)

at org.apache.hadoop.hbase.util.FSUtils.waitOnSafeMode(FSUtils.java:851)

at org.apache.hadoop.hbase.master.MasterFileSystem.checkRootDir(MasterFileSystem.java:435)

at org.apache.hadoop.hbase.master.MasterFileSystem.createInitialFileSystemLayout(MasterFileSystem.java:146)

at org.apache.hadoop.hbase.master.MasterFileSystem.<init>(MasterFileSystem.java:127)

at org.apache.hadoop.hbase.master.HMaster.finishInitialization(HMaster.java:789)

at org.apache.hadoop.hbase.master.HMaster.run(HMaster.java:606)

at java.lang.Thread.run(Thread.java:744)

Removing hbase.rpc.engine org.apache.hadoop.hbase.ipc.SecureRpcEngine from the safety-valve configuration in CM let HBase restart successfully.

It later turned out to be a CM server problem: a hostname had been changed earlier without restarting the Cloudera Manager server. After restarting the CM server, HBase restarted fine even with that configuration added back.

d. The Service Monitor and ZooKeeper showed warnings, and most other services showed some red (failing) health checks:

Exception in scheduled runnable.

java.lang.IllegalStateException

at com.google.common.base.Preconditions.checkState(Preconditions.java:133)

at com.cloudera.cmon.firehose.polling.CdhTask.checkClientConfigs(CdhTask.java:712)

at com.cloudera.cmon.firehose.polling.CdhTask.updateCacheIfNeeded(CdhTask.java:675)

at com.cloudera.cmon.firehose.polling.FirehoseServicesPoller.getDescriptorAndHandleChanges(FirehoseServicesPoller.java:615)

at com.cloudera.cmon.firehose.polling.FirehoseServicesPoller.run(FirehoseServicesPoller.java:179)

at com.cloudera.enterprise.PeriodicEnterpriseService$UnexceptionablePeriodicRunnable.run(PeriodicEnterpriseService.java:67)

at java.lang.Thread.run(Thread.java:745)

This too traced back to the CM server and the same un-restarted hostname change; after restarting the CM server the warnings cleared.

e. MapReduce jobs failed to access HBase under the security mechanism.

Fixed by removing hbase.rpc.protection privacy from the client hbase-site safety-valve configuration. The old version required this setting, and the new version's documentation also says to add it, but in our testing adding it produced the exceptions below:

14/11/27 12:38:26 INFO zookeeper.ClientCnxn: Socket connection established to ip-10-1-33-24.ec2.internal/10.1.33.24:2181, initiating session

14/11/27 12:38:26 INFO zookeeper.ClientCnxn: Session establishment complete on server ip-10-1-33-24.ec2.internal/10.1.33.24:2181, sessionid = 0x549ef6088f20309, negotiated timeout = 60000

14/11/27 12:38:41 WARN ipc.RpcClient: Couldn't setup connection for hbase/ip-10-1-10-15.ec2.internal@YEAHMOBI.COM to hbase/ip-10-1-34-31.ec2.internal@YEAHMOBI.COM

14/11/27 12:38:55 WARN ipc.RpcClient: Couldn't setup connection for hbase/ip-10-1-10-15.ec2.internal@YEAHMOBI.COM to hbase/ip-10-1-34-31.ec2.internal@YEAHMOBI.COM

14/11/27 12:39:15 WARN ipc.RpcClient: Couldn't setup connection for hbase/ip-10-1-10-15.ec2.internal@YEAHMOBI.COM to hbase/ip-10-1-34-31.ec2.internal@YEAHMOBI.COM

14/11/27 12:39:34 WARN ipc.RpcClient: Couldn't setup connection for hbase/ip-10-1-10-15.ec2.internal@YEAHMOBI.COM to hbase/ip-10-1-34-31.ec2.internal@YEAHMOBI.COM

14/11/27 12:39:55 WARN ipc.RpcClient: Couldn't setup connection for hbase/ip-10-1-10-15.ec2.internal@YEAHMOBI.COM to hbase/ip-10-1-34-31.ec2.internal@YEAHMOBI.COM

14/11/27 12:40:19 WARN ipc.RpcClient: Couldn't setup connection for hbase/ip-10-1-10-15.ec2.internal@YEAHMOBI.COM to hbase/ip-10-1-34-31.ec2.internal@YEAHMOBI.COM

14/11/27 12:40:36 WARN ipc.RpcClient: Couldn't setup connection for hbase/ip-10-1-10-15.ec2.internal@YEAHMOBI.COM to hbase/ip-10-1-34-31.ec2.internal@YEAHMOBI.COM

Caused by: java.io.IOException: Couldn't setup connection for hbase/ip-10-1-33-20.ec2.internal@YEAHMOBI.COM to hbase/ip-10-1-34-32.ec2.internal@YEAHMOBI.COM

at org.apache.hadoop.hbase.ipc.RpcClient$Connection$1.run(RpcClient.java:821)

at java.security.AccessController.doPrivileged(Native Method)

at javax.security.auth.Subject.doAs(Subject.java:415)

at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1614)

at org.apache.hadoop.hbase.ipc.RpcClient$Connection.handleSaslConnectionFailure(RpcClient.java:796)

at org.apache.hadoop.hbase.ipc.RpcClient$Connection.setupIOstreams(RpcClient.java:898)

at org.apache.hadoop.hbase.ipc.RpcClient.getConnection(RpcClient.java:1543)

at org.apache.hadoop.hbase.ipc.RpcClient.call(RpcClient.java:1442)

at org.apache.hadoop.hbase.ipc.RpcClient.callBlockingMethod(RpcClient.java:1661)

at org.apache.hadoop.hbase.ipc.RpcClient$BlockingRpcChannelImplementation.callBlockingMethod(RpcClient.java:1719)

at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$BlockingStub.execService(ClientProtos.java:30014)

at org.apache.hadoop.hbase.protobuf.ProtobufUtil.execService(ProtobufUtil.java:1623)

at org.apache.hadoop.hbase.ipc.RegionCoprocessorRpcChannel$1.call(RegionCoprocessorRpcChannel.java:93)

at org.apache.hadoop.hbase.ipc.RegionCoprocessorRpcChannel$1.call(RegionCoprocessorRpcChannel.java:90)

at org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithRetries(RpcRetryingCaller.java:114)

... 31 more

Caused by: javax.security.sasl.SaslException: No common protection layer between client and server

at com.sun.security.sasl.gsskerb.GssKrb5Client.doFinalHandshake(GssKrb5Client.java:252)

at com.sun.security.sasl.gsskerb.GssKrb5Client.evaluateChallenge(GssKrb5Client.java:187)

at org.apache.hadoop.hbase.security.HBaseSaslRpcClient.saslConnect(HBaseSaslRpcClient.java:210)

at org.apache.hadoop.hbase.ipc.RpcClient$Connection.setupSaslConnection(RpcClient.java:770)

at org.apache.hadoop.hbase.ipc.RpcClient$Connection.access$600(RpcClient.java:357)

at org.apache.hadoop.hbase.ipc.RpcClient$Connection$2.run(RpcClient.java:891)

at org.apache.hadoop.hbase.ipc.RpcClient$Connection$2.run(RpcClient.java:888)

at java.security.AccessController.doPrivileged(Native Method)

at javax.security.auth.Subject.doAs(Subject.java:415)

at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1614)

at org.apache.hadoop.hbase.ipc.RpcClient$Connection.setupIOstreams(RpcClient.java:888)

... 40 more

<property>

     <name>hbase.rpc.engine</name>

     <value>org.apache.hadoop.hbase.ipc.SecureRpcEngine</value>

</property>

In the MR job, load the HBase dependency jars with TableMapReduceUtil.addDependencyJars(job); as described at http://www.cloudera.com/content/cloudera/en/documentation/cdh5/v5-0-0/CDH5-Installation-Guide/cdh5ig_mapreduce_hbase.html

and pass the Kerberos settings in through the user API, for example:

hbase.master.kerberos.principal=hbase/ip-10-1-10-15.ec2.internal@YEAHMOBI.COM

hbase.keytab.path=/home/dev/1015q.keytab

f. After the upgrade, Impala JDBC stopped working under the security mechanism:

java.sql.SQLException: Could not open connection to jdbc:hive2://ip-10-1-33-22.ec2.internal:21050/ym_system;principal=impala/ip-10-1-33-22.ec2.internal@YEAHMOBI.COM: GSS initiate failed

at org.apache.hive.jdbc.HiveConnection.openTransport(HiveConnection.java:187)

at org.apache.hive.jdbc.HiveConnection.<init>(HiveConnection.java:164)

at org.apache.hive.jdbc.HiveDriver.connect(HiveDriver.java:105)

at java.sql.DriverManager.getConnection(DriverManager.java:571)

at java.sql.DriverManager.getConnection(DriverManager.java:233)

at com.cloudera.example.ClouderaImpalaJdbcExample.main(ClouderaImpalaJdbcExample.java:37)

Caused by: org.apache.thrift.transport.TTransportException: GSS initiate failed

at org.apache.thrift.transport.TSaslTransport.sendAndThrowMessage(TSaslTransport.java:221)

at org.apache.thrift.transport.TSaslTransport.open(TSaslTransport.java:297)

at org.apache.thrift.transport.TSaslClientTransport.open(TSaslClientTransport.java:37)

at org.apache.hadoop.hive.thrift.client.TUGIAssumingTransport$1.run(TUGIAssumingTransport.java:52)

at org.apache.hadoop.hive.thrift.client.TUGIAssumingTransport$1.run(TUGIAssumingTransport.java:49)

at java.security.AccessController.doPrivileged(Native Method)

at javax.security.auth.Subject.doAs(Subject.java:415)

at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)

at org.apache.hadoop.hive.thrift.client.TUGIAssumingTransport.open(TUGIAssumingTransport.java:49)

at org.apache.hive.jdbc.HiveConnection.openTransport(HiveConnection.java:185)

... 5 more

Solution:

hadoop-auth-2.5.0-cdh5.2.0.jar

hive-shims-common-secure-0.13.1-cdh5.2.0.jar

Rolling these two jars back to their previous versions fixed it.
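The rollback can be scripted so the 5.2.0 jars are kept aside rather than deleted. A sketch; the directory and jar paths in the usage lines are illustrative (locate the actual copies with `find / -name 'hadoop-auth-*.jar'` first):

```shell
# Move the new jar aside (kept, but disabled) and drop in the old build.
rollback_jar() {
  dir="$1"; new_jar="$2"; old_jar="$3"
  mv "$dir/$new_jar" "$dir/$new_jar.disabled" && cp "$old_jar" "$dir/"
}

# Example (paths illustrative):
# rollback_jar /opt/cloudera/parcels/CDH/lib/hadoop \
#     hadoop-auth-2.5.0-cdh5.2.0.jar /backup/hadoop-auth-2.3.0-cdh5.0.2.jar
```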