Vanilla Hadoop installation and deployment---3.hdfs
2017-02-15 15:46
Copyright notice: this is the blogger's original article and may not be reproduced without the blogger's permission. https://blog.csdn.net/jameswangcnbj/article/details/55210675
1.Download and install
tar xzvf hadoop-2.2.0.tar.gz -C ../
cd ../
mv hadoop-2.2.0/ hadoop/
2.bash_profile
su - hadoop
export HADOOP_PREFIX="/home/hadoop/hadoop"
export HADOOP_MAPRED_HOME=$HADOOP_PREFIX
export HADOOP_COMMON_HOME=$HADOOP_PREFIX
export HADOOP_HOME=$HADOOP_PREFIX
export HADOOP_HDFS_HOME=$HADOOP_PREFIX
export HADOOP_CONF_DIR=$HADOOP_PREFIX/etc/hadoop
export YARN_CONF_DIR=$HADOOP_PREFIX/etc/hadoop
export YARN_HOME=$HADOOP_PREFIX
export PATH=$PATH:$HADOOP_PREFIX/bin:$HADOOP_PREFIX/sbin
source ~/.bash_profile
3.hadoop-env.sh
vi /home/hadoop/hadoop/etc/hadoop/hadoop-env.sh
# The java implementation to use.
export JAVA_HOME=/usr/local/jdk1.6.0_45
export HADOOP_HOME="/home/hadoop/hadoop"
export JAVA_LIBRARY_PATH="/usr/lib:/usr/local/lib:$LD_LIBRARY_PATH:$HADOOP_HOME/lib/native"
# The test machines have 8 GB of RAM. -Xms and -Xmx set the JVM's initial (minimum) and maximum heap size.
export HADOOP_NAMENODE_OPTS="-server -Xmx2G -Xms2G -Xmn1G -XX:MaxPermSize=512m -XX:PermSize=512m -XX:+UseParNewGC -XX:+UseConcMarkSweepGC"
export HADOOP_DATANODE_OPTS="-server -Xmx1G -Xms1G -Xmn720M -XX:MaxPermSize=512m -XX:PermSize=512m -XX:+UseParNewGC -XX:+UseConcMarkSweepGC"
export HADOOP_SECONDARYNAMENODE_OPTS="-server -Xmx2G -Xms2G -Xmn1G -XX:MaxPermSize=512m -XX:PermSize=512m -XX:+UseParNewGC -XX:+UseConcMarkSweepGC "
export HADOOP_PID_DIR=${HADOOP_PREFIX}/pids
4.yarn-env.sh
YARN_HEAPSIZE=512 # default is 1000
# ResourceManager startup options
# (test machine)
export YARN_RESOURCEMANAGER_OPTS="-server -Xmx2G -Xms2G -Xmn1G -XX:MaxPermSize=512m -XX:PermSize=512m -XX:+UseParNewGC -XX:+UseConcMarkSweepGC "
export YARN_NODEMANAGER_OPTS="-server -Xmx1G -Xms1G -Xmn640M -XX:MaxPermSize=360m -XX:PermSize=360m -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:-UseGCOverheadLimit"
export YARN_PID_DIR=${HADOOP_PREFIX}/pids
5.core-site.xml
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://bvdata</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/home/hadoop/hadoop/tmpdir</value>
  </property>
  <property>
    <name>fs.trash.interval</name>
    <value>1440</value>
    <description>Number of minutes between trash checkpoints. If zero, the trash feature is disabled.</description>
  </property>
  <property>
    <name>io.compression.codecs</name>
    <value>org.apache.hadoop.io.compress.DefaultCodec,org.apache.hadoop.io.compress.GzipCodec,org.apache.hadoop.io.compress.SnappyCodec</value>
  </property>
</configuration>
6.hdfs-site.xml
<configuration>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>file:/home/hadoop/hadoop/data/hadoop/dfs/name</value>
    <description>namenode data dir</description>
    <final>true</final>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>file:/home/hadoop/hadoop/data/hadoop/dfs/data</value>
    <description>datanode data dir</description>
    <final>true</final>
  </property>
  <property>
    <name>dfs.namenode.checkpoint.dir</name>
    <value>file:/home/hadoop/hadoop/data/hadoop/namesecondary</value>
    <description>secondary namenode data dir</description>
    <final>true</final>
  </property>
  <!-- Set the HDFS nameservice to bvdata; it must match the value in core-site.xml -->
  <property>
    <name>dfs.nameservices</name>
    <value>bvdata</value>
    <description>Logical name of the nameservice; corresponds to fs.defaultFS in core-site.xml</description>
  </property>
  <property>
    <name>dfs.ha.namenodes.bvdata</name>
    <value>c9test91,c9test92</value>
  </property>
  <property>
    <name>dfs.namenode.rpc-address.bvdata.c9test91</name>
    <value>c9test91:9000</value>
  </property>
  <property>
    <name>dfs.namenode.http-address.bvdata.c9test91</name>
    <value>c9test91:50070</value>
  </property>
  <property>
    <name>dfs.namenode.rpc-address.bvdata.c9test92</name>
    <value>c9test92:9000</value>
  </property>
  <property>
    <name>dfs.namenode.http-address.bvdata.c9test92</name>
    <value>c9test92:50070</value>
  </property>
  <property>
    <name>dfs.journalnode.edits.dir</name>
    <value>/home/hadoop/hadoop/data/hadoop/journal</value>
  </property>
  <property>
    <name>dfs.namenode.shared.edits.dir</name>
    <value>qjournal://c9test91:8485;c9test92:8485;c9test93:8485/bvdata</value>
    <description>
      How to start the JournalNodes: deploy a copy of Hadoop on every JournalNode host, set the data
      directory with dfs.journalnode.edits.dir in hdfs-site.xml (note: only one directory can be configured),
      then run "sbin/hadoop-daemon.sh start journalnode" on each host to start the JournalNode service.
    </description>
  </property>
  <property>
    <name>ha.zookeeper.quorum</name>
    <value>c9test91:2181,c9test92:2181,c9test93:2181</value>
    <description>List of ZooKeeper hosts used for HA</description>
  </property>
  <property>
    <name>dfs.ha.automatic-failover.enabled</name>
    <value>true</value>
  </property>
  <property>
    <name>dfs.journalnode.rpc-address</name>
    <value>0.0.0.0:8485</value>
    <description>RPC address of the JournalNode</description>
  </property>
  <property>
    <name>dfs.client.failover.proxy.provider.bvdata</name>
    <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
    <description>Proxy class that lets HDFS clients resolve the bvdata nameservice and find the active namenode during a failover</description>
  </property>
  <property>
    <name>dfs.ha.fencing.methods</name>
    <value>sshfence</value>
  </property>
  <property>
    <name>dfs.ha.fencing.ssh.private-key-files</name>
    <value>/home/hadoop/.ssh/id_rsa</value>
  </property>
  <property>
    <name>ha.health-monitor.rpc-timeout.ms</name>
    <value>90000</value>
  </property>
  <property>
    <name>ha.failover-controller.cli-check.rpc-timeout.ms</name>
    <value>60000</value>
  </property>
  <property>
    <name>ipc.client.connect.timeout</name>
    <value>60000</value>
  </property>
  <property>
    <name>dfs.client.read.shortcircuit.buffer.size</name>
    <value>4096</value>
  </property>
  <property>
    <name>dfs.image.transfer.bandwidthPerSec</name>
    <value>4194304</value>
  </property>
  <property>
    <name>dfs.hosts.exclude</name>
    <value>/home/hadoop/hadoop/etc/hadoop/excludes</value>
  </property>
  <property>
    <name>dfs.replication</name>
    <value>3</value>
  </property>
  <property>
    <name>dfs.permissions</name>
    <value>false</value>
  </property>
  <property>
    <name>dfs.support.append</name>
    <value>true</value>
  </property>
  <property>
    <name>dfs.webhdfs.enabled</name>
    <value>true</value>
  </property>
  <property>
    <name>dfs.datanode.max.xcievers</name>
    <value>4096</value>
  </property>
  <property>
    <name>dfs.qjournal.write-txns.timeout.ms</name>
    <value>600000000</value>
  </property>
</configuration>
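The qjournal URI in dfs.namenode.shared.edits.dir lists three JournalNodes. The Quorum Journal Manager needs a majority of them to acknowledge each edit, so the number of hosts decides how many can fail. The sketch below only does that arithmetic on the URI string; count_journalnodes is a hypothetical helper written for this illustration, not part of Hadoop.

```shell
#!/usr/bin/env bash
# count_journalnodes (hypothetical helper): counts the host:port entries
# in a qjournal:// URI such as the one used in hdfs-site.xml.
count_journalnodes() {
  local hosts="${1#qjournal://}"   # drop the scheme
  hosts="${hosts%%/*}"             # drop the trailing /<journal id>
  local IFS=';'                    # entries are separated by semicolons
  set -- $hosts                    # split into positional parameters
  echo "$#"
}

uri="qjournal://c9test91:8485;c9test92:8485;c9test93:8485/bvdata"
n=$(count_journalnodes "$uri")
# A write succeeds once a majority of JournalNodes has acknowledged it.
echo "journalnodes=$n majority=$(( n / 2 + 1 )) tolerated_failures=$(( (n - 1) / 2 ))"
```

With the three hosts configured here, one JournalNode can be down without blocking the NameNode. An even number of JournalNodes tolerates no more failures than the next lower odd number.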
7.mapred-site.xml
<property>
  <name>mapreduce.framework.name</name>
  <value>yarn</value>
</property>
<property>
  <name>mapreduce.map.output.compress</name>
  <value>true</value>
</property>
<property>
  <name>mapreduce.map.output.compress.codec</name>
  <value>org.apache.hadoop.io.compress.SnappyCodec</value>
</property>
<!-- MR system directory on the datanodes: /home/hadoop/hadoop/data/hadoop/mapred/system -->
<property>
  <name>mapred.system.dir</name>
  <value>file:/home/hadoop/hadoop/data/hadoop/mapred/system</value>
  <final>true</final>
</property>
<!-- MR local directory on the datanodes: /home/hadoop/hadoop/data/hadoop/mapred/local -->
<property>
  <name>mapred.local.dir</name>
  <value>file:/home/hadoop/hadoop/data/hadoop/mapred/local</value>
  <final>true</final>
</property>
<property>
  <name>mapreduce.map.memory.mb</name>
  <value>1024</value>
  <description>Amount of memory (MB) each map task of a MapReduce job may request</description>
</property>
<property>
  <name>mapreduce.map.cpu.vcores</name>
  <value>1</value>
  <description>Number of virtual CPU cores each map task of a MapReduce job may request</description>
</property>
<property>
  <name>mapreduce.reduce.memory.mb</name>
  <value>1024</value>
  <description>Amount of memory (MB) each reduce task of a MapReduce job may request</description>
</property>
8.yarn-site.xml
<property>
  <name>yarn.nodemanager.aux-services</name>
  <value>mapreduce_shuffle</value>
  <description>shuffle service that needs to be set for Map Reduce to run</description>
</property>
<property>
  <name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
  <value>org.apache.hadoop.mapred.ShuffleHandler</value>
  <description>Class that implements the mapreduce_shuffle auxiliary service on the NodeManager</description>
</property>
<property>
  <name>yarn.resourcemanager.webapp.address</name>
  <value>c9test93:8088</value>
  <description>Web UI of the new framework, showing resource scheduling and the running status of tasks</description>
</property>
<property>
  <name>yarn.resourcemanager.hostname</name>
  <value>c9test93</value>
</property>
<property>
  <name>yarn.resourcemanager.resource-tracker.address</name>
  <value>c9test93:8031</value>
</property>
<property>
  <name>yarn.nodemanager.resource.memory-mb</name>
  <value>4096</value>
  <description>
    Maximum amount of memory (MB) that YARN may use on this node; it limits how many MapReduce tasks can run per node. The default is 8 GB.
  </description>
</property>
<property>
  <name>yarn.nodemanager.resource.cpu-vcores</name>
  <value>4</value>
  <description>
    Hadoop 2.x accounts for two resource types, memory and CPU; this parameter sets the number of CPU vcores on the node. The default is 8.
  </description>
</property>
<property>
  <name>yarn.log-aggregation-enable</name>
  <value>true</value>
</property>
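The memory and vcore values in yarn-site.xml and mapred-site.xml together bound how many map tasks can run on one node at the same time. The following sketch works through that arithmetic with the values used in this article. max_containers is a hypothetical helper, and the result is only a rough upper bound: the MapReduce ApplicationMaster also occupies a container, and the scheduler may round requests up.

```shell
#!/usr/bin/env bash
# Values copied from the yarn-site.xml and mapred-site.xml fragments in this article.
node_memory_mb=4096   # yarn.nodemanager.resource.memory-mb
node_vcores=4         # yarn.nodemanager.resource.cpu-vcores
task_memory_mb=1024   # mapreduce.map.memory.mb
task_vcores=1         # mapreduce.map.cpu.vcores

# max_containers (hypothetical helper): the smaller of the memory-based
# and the vcore-based limit for one node.
# Arguments: node memory, node vcores, task memory, task vcores.
max_containers() {
  local by_mem=$(( $1 / $3 ))
  local by_cpu=$(( $2 / $4 ))
  if [ "$by_mem" -lt "$by_cpu" ]; then
    echo "$by_mem"
  else
    echo "$by_cpu"
  fi
}

max_containers "$node_memory_mb" "$node_vcores" "$task_memory_mb" "$task_vcores"
```

With 4096 MB and 4 vcores per node and 1024 MB and 1 vcore per map task, both limits give 4, so each NodeManager can hold at most four such tasks at once.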
9.slaves
c9test93
c9test94
# Note: when HDFS is started on 91, the slaves listed here are the datanodes; when YARN (resourcemanager) is started on 93, they are the nodemanagers.
10.Copy the files to all machines
scp -r /home/hadoop/hadoop/ c9test92:/home/hadoop/
scp -r /home/hadoop/hadoop/ c9test93:/home/hadoop/
scp -r /home/hadoop/hadoop/ c9test94:/home/hadoop/
scp ~/.bash_profile c9test92:~/
scp ~/.bash_profile c9test93:~/
scp ~/.bash_profile c9test94:~/
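The same copies can be written as a loop over the target hosts, which is easier to extend when nodes are added. This is a minimal sketch: distribute and run are hypothetical helpers, and DRY_RUN is an assumed switch that defaults to printing the commands instead of running them.

```shell
#!/usr/bin/env bash
# DRY_RUN=1 (the default here) only prints the commands; set DRY_RUN=0 to really copy.
DRY_RUN="${DRY_RUN:-1}"
TARGET_HOSTS="c9test92 c9test93 c9test94"

# run (hypothetical helper): print the command in dry-run mode, otherwise execute it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# distribute (hypothetical helper): copy the Hadoop tree and the profile to every host.
distribute() {
  local host
  for host in $TARGET_HOSTS; do
    run scp -r /home/hadoop/hadoop/ "$host:/home/hadoop/"
    run scp "$HOME/.bash_profile" "$host:~/"
  done
}

distribute
```

Running it with DRY_RUN=0 on c9test91 performs the six copies; passwordless SSH for the hadoop user is assumed, as it is for the sshfence setting in hdfs-site.xml.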
11.Plan and create the directories
#91: namenode journalnode
/home/hadoop/hadoop/tmpdir
/home/hadoop/hadoop/pids
/home/hadoop/hadoop/data/hadoop/dfs/name
/home/hadoop/hadoop/data/hadoop/namesecondary
/home/hadoop/hadoop/data/hadoop/journal
echo > /home/hadoop/hadoop/etc/hadoop/excludes
#92: namenode journalnode
/home/hadoop/hadoop/tmpdir
/home/hadoop/hadoop/pids
/home/hadoop/hadoop/data/hadoop/dfs/name
/home/hadoop/hadoop/data/hadoop/namesecondary
/home/hadoop/hadoop/data/hadoop/journal
echo > /home/hadoop/hadoop/etc/hadoop/excludes
#93: datanode resourcemanager journalnode nodemanager
/home/hadoop/hadoop/tmpdir
/home/hadoop/hadoop/pids
/home/hadoop/hadoop/data/hadoop/dfs/data
/home/hadoop/hadoop/data/hadoop/mapred/system
/home/hadoop/hadoop/data/hadoop/mapred/local
/home/hadoop/hadoop/data/hadoop/journal
echo > /home/hadoop/hadoop/etc/hadoop/excludes
#94: datanode nodemanager
/home/hadoop/hadoop/tmpdir
/home/hadoop/hadoop/pids
/home/hadoop/hadoop/data/hadoop/dfs/data
/home/hadoop/hadoop/data/hadoop/mapred/system
/home/hadoop/hadoop/data/hadoop/mapred/local
/home/hadoop/hadoop/data/hadoop/journal
echo > /home/hadoop/hadoop/etc/hadoop/excludes
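The directory plan above can be turned into a small script so that each host only needs to be told its role. This is a sketch under assumptions: make_dirs is a hypothetical helper, the two role names are invented for this illustration, and HADOOP_BASE defaults to the install path used in this article.

```shell
#!/usr/bin/env bash
# HADOOP_BASE defaults to the install path used throughout this article.
HADOOP_BASE="${HADOOP_BASE:-/home/hadoop/hadoop}"

# make_dirs (hypothetical helper): create the planned directories for one role.
# Usage: make_dirs <namenode|datanode> <base directory>
make_dirs() {
  local role="$1" base="$2"
  # Directories that the plan lists for every host.
  mkdir -p "$base/tmpdir" "$base/pids" "$base/data/hadoop/journal" "$base/etc/hadoop"
  case "$role" in
    namenode)
      mkdir -p "$base/data/hadoop/dfs/name" "$base/data/hadoop/namesecondary"
      ;;
    datanode)
      mkdir -p "$base/data/hadoop/dfs/data" \
               "$base/data/hadoop/mapred/system" \
               "$base/data/hadoop/mapred/local"
      ;;
    *)
      echo "unknown role: $role" >&2
      return 1
      ;;
  esac
  # Create (or truncate) the excludes file referenced by dfs.hosts.exclude.
  : > "$base/etc/hadoop/excludes"
}
```

On 91 and 92 this would be called as make_dirs namenode "$HADOOP_BASE", and on 93 and 94 as make_dirs datanode "$HADOOP_BASE".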
12.Initialize and start the services
12.1 Start the zookeeper cluster
zkServer.sh start #or use a self-written zkrun.sh; check the status after starting
12.2 Format HDFS
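#First start the journalnode individually on each of 91, 92 and 93; do not use hadoop-daemons.sh
sbin/hadoop-daemon.sh start journalnode
#jps now shows an additional JournalNode process
#On namenode 91 run:
hdfs namenode -format
#When this has finished on 91, scp /home/hadoop/hadoop/data/hadoop/dfs/name to 92
#(alternatively, sync the metadata of the active NN from the standby NN on 92 with
# hdfs namenode -bootstrapStandby
# note that this command requires the NN on 91 to be running)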
12.3 Format ZK
hdfs zkfc -formatZK
12.4 Start hadoop
There are several ways to start hadoop:
1)Method 1: zk, then dfs, then yarn
91:zkrun.sh start
91:start-dfs.sh
93:start-yarn.sh
Shutdown:
93:stop-yarn.sh
91:stop-dfs.sh
91:zkrun.sh stop
2)Method 2 (for production systems, to see exactly which step fails):
Startup:
(1) zk
(2) hadoop-daemon.sh start journalnode (91 92 93)
(3) hadoop-daemon.sh start namenode (91 92)
(4) hadoop-daemon.sh start datanode (93 94)
(5) yarn-daemon.sh start resourcemanager (93)
(6) yarn-daemon.sh start nodemanager (93 94)
(7) hadoop-daemon.sh start zkfc (91 92)
(8) hbase: start hmaster (active and standby) start-hbase.sh
(9) hbase: start HRegionServer hbase-daemon.sh start regionserver
Shutdown:
(1) hbase: stop HRegionServer hbase-daemon.sh stop regionserver
(2) hbase: stop hmaster (active and standby) stop-hbase.sh
(3) hadoop-daemon.sh stop zkfc (91 92)
(4) yarn-daemon.sh stop nodemanager (93 94)
(5) yarn-daemon.sh stop resourcemanager (93)
(6) hadoop-daemon.sh stop datanode (93 94)
(7) hadoop-daemon.sh stop namenode (91 92)
(8) hadoop-daemon.sh stop journalnode (91 92 93)
(9) zk
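The shutdown list is the startup list in reverse. The sketch below makes that relationship explicit by keeping a single ordered list and deriving both sequences from it. It only prints the commands; print_plan is a hypothetical helper, and the zk and hbase steps are left out to keep it short.

```shell
#!/usr/bin/env bash
# Hadoop daemons in start order, one "<script> <daemon>" pair per line.
START_STEPS="hadoop-daemon.sh journalnode
hadoop-daemon.sh namenode
hadoop-daemon.sh datanode
yarn-daemon.sh resourcemanager
yarn-daemon.sh nodemanager
hadoop-daemon.sh zkfc"

# print_plan (hypothetical helper): print the commands for "start" or "stop".
# For "stop" the list is walked in reverse order.
print_plan() {
  local action="$1" steps="$START_STEPS"
  if [ "$action" = "stop" ]; then
    # Reverse the lines with awk, since tac is not available on every system.
    steps=$(printf '%s\n' "$START_STEPS" |
      awk '{ a[NR] = $0 } END { for (i = NR; i >= 1; i--) print a[i] }')
  fi
  printf '%s\n' "$steps" | while read -r script daemon; do
    echo "$script $action $daemon"
  done
}

print_plan start
print_plan stop
```

Each printed command still has to be run on the hosts named in the list above, for example the journalnode line on 91, 92 and 93.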
12.5 Verification
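#1) Web check
#namenode:
http://c9test91:50070
http://c9test92:50070
#kill one of the namenodes and check that the other one becomes active
#ResourceManager:
http://c9test93:8088
#2) Upload a file
hdfs dfs -put jdk-6u45-linux-x64.bin /jdk.bin
hdfs dfs -ls /
#3) MR test
#Create a file word.txt with content such as: ABC bdc ABC dfe def def
#Upload it to HDFS
hdfs dfs -put word.txt /word
#Run the MR job
hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.2.0.jar wordcount /word /out
#Note: because snappy compression is configured for MR, running wordcount directly fails at this point. The next article covers installing and deploying snappy; once it is installed, the MR job succeeds.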
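The web check can also be done from the command line with hdfs haadmin -getServiceState, which prints active or standby for a given namenode ID. A minimal sketch of the check follows; check_ha is a hypothetical helper that only compares two state strings, so it can be tried without a running cluster.

```shell
#!/usr/bin/env bash
# check_ha (hypothetical helper): succeeds only if exactly one of the
# two reported namenode states is "active".
check_ha() {
  local active=0 state
  for state in "$1" "$2"; do
    if [ "$state" = "active" ]; then
      active=$(( active + 1 ))
    fi
  done
  [ "$active" -eq 1 ]
}

# On a running cluster (not executed here), using the namenode IDs
# from dfs.ha.namenodes.bvdata:
# check_ha "$(hdfs haadmin -getServiceState c9test91)" \
#          "$(hdfs haadmin -getServiceState c9test92)" && echo "HA OK"
```

After killing the active namenode, the query for that ID fails, and the other ID should report active once the zkfc has completed the failover.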
13.The snappy problem
io.compression.codecs in core-site.xml includes org.apache.hadoop.io.compress.SnappyCodec,
mapreduce.map.output.compress in mapred-site.xml is set to true,
and mapreduce.map.output.compress.codec in mapred-site.xml is set to org.apache.hadoop.io.compress.SnappyCodec.
MR jobs therefore compress their map output with snappy, so snappy has to be installed separately on the system.
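A quick first check before running a job is whether a snappy shared library is present in Hadoop's native library directory. This is only a sketch: has_snappy is a hypothetical helper, the lib/native location is taken from the JAVA_LIBRARY_PATH setting in hadoop-env.sh, and finding the file is necessary but not sufficient, because Hadoop's own native library must also have been built with snappy support.

```shell
#!/usr/bin/env bash
# has_snappy (hypothetical helper): succeeds if a libsnappy shared library
# exists in the given directory.
has_snappy() {
  local dir="$1" f
  for f in "$dir"/libsnappy.so*; do
    # If nothing matches, the pattern stays literal and -e is false.
    if [ -e "$f" ]; then
      return 0
    fi
  done
  return 1
}

# lib/native under HADOOP_HOME, as referenced by JAVA_LIBRARY_PATH in hadoop-env.sh.
NATIVE_DIR="${HADOOP_HOME:-/home/hadoop/hadoop}/lib/native"
if has_snappy "$NATIVE_DIR"; then
  echo "libsnappy found in $NATIVE_DIR"
else
  echo "libsnappy not found in $NATIVE_DIR"
fi
```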