Hadoop Quick Setup (Reference)
2015-07-16 15:11
While setting up the cluster I mainly followed the blog post below, which I found fairly detailed. Note, however, that after formatting HDFS several times, the datanodes may fail to start. This is usually caused by mismatched version numbers (namespaceID) between the VERSION files in the current folders under the data and name (system) directories. You can either edit the numbers by hand so they match, or simply delete the data directory and format once more so it is regenerated.
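For reference, a minimal shell sketch of diagnosing and repairing that mismatch. It assumes the default /tmp/hadoop-hadoop/dfs layout used later in this article; adjust the paths if dfs.name.dir or dfs.data.dir point elsewhere, and note that the second option destroys all HDFS data:

# On the namenode, read the authoritative namespaceID:
grep namespaceID /tmp/hadoop-hadoop/dfs/name/current/VERSION
# On each datanode, compare against the local copy:
grep namespaceID /tmp/hadoop-hadoop/dfs/data/current/VERSION
# Option 1: edit the datanode's VERSION file so its namespaceID matches the namenode's.
# Option 2 (wipes HDFS data): delete the data directory and format exactly once:
# rm -rf /tmp/hadoop-hadoop/dfs/data && bin/hadoop namenode -format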
When editing the Hadoop configuration files, it is worth consulting other articles as well.
Tags: original work; reprinting permitted, but reprints must credit the original source, the author information, and this notice via hyperlink, otherwise legal action may be taken. Original source: http://dngood.blog.51cto.com/446195/775368
For Hadoop, the two most important pieces are the distributed file system HDFS and the MapReduce computing model. Below I walk through my Hadoop setup process.
Contents:
Hadoop test environment
1. Preparation before deploying Hadoop
2. SSH configuration
3. Java environment configuration
4. Hadoop configuration (run as the hadoop user)
5. Basic HDFS verification
Hadoop test environment
Four test machines in total: one namenode and three datanodes. OS: RHEL 5.5 x86_64; Hadoop: 0.20.203.0; JDK: jdk1.7.0

Role        IP address
namenode    192.168.57.75
datanode1   192.168.57.76
datanode2   192.168.57.78
datanode3   192.168.57.79
1. Preparation before deploying Hadoop
1) Hadoop depends on Java and SSH. Java 1.5.x or later must be installed, and sshd must be installed and kept running so that the Hadoop scripts can manage the remote Hadoop daemons.

2) Create a common Hadoop account. All nodes should use the same username; it can be added with:

useradd hadoop
passwd hadoop

3) Configure hostnames in /etc/hosts:

tail -n 4 /etc/hosts
192.168.57.75 namenode
192.168.57.76 datanode1
192.168.57.78 datanode2
192.168.57.79 datanode3

4) All of the above must be configured identically on every node (namenode and datanodes); a sketch of automating this is shown below.
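Since every node needs the same account and hosts entries, the prep can be pushed out from the namenode in one loop. A minimal sketch, assuming root can already reach each node over SSH (password prompts are fine at this stage) and that the hostnames match the table above; set the hadoop password interactively on each node afterwards:

#!/bin/bash
# Replicate the shared /etc/hosts and create the hadoop account on every node.
for node in datanode1 datanode2 datanode3; do
  scp /etc/hosts root@"$node":/etc/hosts
  ssh root@"$node" 'id hadoop >/dev/null 2>&1 || useradd hadoop'
done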
2. SSH configuration
1) Generate the private key id_rsa and the public key id_rsa.pub:

[hadoop@hadoop1 ~]$ ssh-keygen -t rsa
Generating public/private rsa key pair.
Enter file in which to save the key (/home/hadoop/.ssh/id_rsa):
Enter passphrase (empty for no passphrase):
Enter same passphrase again:
Your identification has been saved in /home/hadoop/.ssh/id_rsa.
Your public key has been saved in /home/hadoop/.ssh/id_rsa.pub.
The key fingerprint is:
d6:63:76:43:e2:5b:8e:85:ab:67:a2:7c:a6:8f:23:f9 hadoop@hadoop1.test.com

2) The resulting key and configuration files:

[hadoop@hadoop1 ~]$ ls .ssh/
authorized_keys  id_rsa  id_rsa.pub  known_hosts

3) Upload the public key to the datanode servers (and to localhost):

[hadoop@hadoop1 ~]$ ssh-copy-id -i ~/.ssh/id_rsa.pub hadoop@datanode1
hadoop@datanode1's password:
Now try logging into the machine, with "ssh 'hadoop@datanode1'", and check in:
.ssh/authorized_keys
to make sure we haven't added extra keys that you weren't expecting.

[hadoop@hadoop1 ~]$ ssh-copy-id -i ~/.ssh/id_rsa.pub hadoop@datanode2
hadoop@datanode2's password:
Now try logging into the machine, with "ssh 'hadoop@datanode2'", and check in:
.ssh/authorized_keys
to make sure we haven't added extra keys that you weren't expecting.

[hadoop@hadoop1 ~]$ ssh-copy-id -i ~/.ssh/id_rsa.pub hadoop@datanode3
hadoop@datanode3's password:
Now try logging into the machine, with "ssh 'hadoop@datanode3'", and check in:
.ssh/authorized_keys
to make sure we haven't added extra keys that you weren't expecting.

[hadoop@hadoop1 ~]$ ssh-copy-id -i ~/.ssh/id_rsa.pub hadoop@localhost
hadoop@localhost's password:
Now try logging into the machine, with "ssh 'hadoop@localhost'", and check in:
.ssh/authorized_keys
to make sure we haven't added extra keys that you weren't expecting.

4) Verify passwordless login:

[hadoop@hadoop1 ~]$ ssh datanode1
Last login: Thu Feb 2 09:01:16 2012 from 192.168.57.71
[hadoop@hadoop2 ~]$ exit
logout
[hadoop@hadoop1 ~]$ ssh datanode2
Last login: Thu Feb 2 09:01:18 2012 from 192.168.57.71
[hadoop@hadoop3 ~]$ exit
logout
[hadoop@hadoop1 ~]$ ssh datanode3
Last login: Thu Feb 2 09:01:20 2012 from 192.168.57.71
[hadoop@hadoop4 ~]$ exit
logout
[hadoop@hadoop1 ~]$ ssh localhost
Last login: Thu Feb 2 09:01:24 2012 from 192.168.57.71
[hadoop@hadoop1 ~]$ exit
logout
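The four ssh-copy-id invocations above are identical except for the host, so they can be collapsed into a loop; a small sketch, with the host list taken from /etc/hosts above:

#!/bin/bash
# Distribute the hadoop user's public key to every node, then prove the login works.
for host in datanode1 datanode2 datanode3 localhost; do
  ssh-copy-id -i ~/.ssh/id_rsa.pub hadoop@"$host"
  ssh hadoop@"$host" true && echo "passwordless login to $host OK"
done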
3. Java environment configuration
1) Download a suitable JDK (this is the RPM package for 64-bit Linux):

wget http://download.oracle.com/otn-pub/java/jdk/7/jdk-7-linux-x64.rpm

2) Install the JDK:

rpm -ivh jdk-7-linux-x64.rpm

3) Verify Java:

[root@hadoop1 ~]# java -version
java version "1.7.0"
Java(TM) SE Runtime Environment (build 1.7.0-b147)
Java HotSpot(TM) 64-Bit Server VM (build 21.0-b17, mixed mode)
[root@hadoop1 ~]# ls /usr/java/
default  jdk1.7.0  latest

4) Configure the Java environment variables:

# vim /etc/profile
// add the following for hadoop:
export JAVA_HOME=/usr/java/jdk1.7.0
export CLASSPATH=.:$JAVA_HOME/jre/lib/rt.jar:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar
export PATH=$PATH:$JAVA_HOME/bin
// make the variables take effect:
source /etc/profile

5) Copy /etc/profile to the datanodes:

[root@hadoop1 src]# scp /etc/profile root@datanode1:/etc/
The authenticity of host 'datanode1 (192.168.57.86)' can't be established.
RSA key fingerprint is b5:00:d1:df:73:4c:94:f1:ea:1f:b5:cd:ed:3a:cc:e1.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added 'datanode1,192.168.57.86' (RSA) to the list of known hosts.
root@datanode1's password:
profile 100% 1624 1.6KB/s 00:00
[root@hadoop1 src]# scp /etc/profile root@datanode2:/etc/
The authenticity of host 'datanode2 (192.168.57.87)' can't be established.
RSA key fingerprint is 57:cf:96:15:78:a3:94:93:30:16:8e:66:47:cd:f9:cd.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added 'datanode2,192.168.57.87' (RSA) to the list of known hosts.
root@datanode2's password:
profile 100% 1624 1.6KB/s 00:00
[root@hadoop1 src]# scp /etc/profile root@datanode3:/etc/
The authenticity of host 'datanode3 (192.168.57.88)' can't be established.
RSA key fingerprint is 31:73:e8:3c:20:0c:1e:b2:59:5c:d1:01:4b:26:41:70.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added 'datanode3,192.168.57.88' (RSA) to the list of known hosts.
root@datanode3's password:
profile 100% 1624 1.6KB/s 00:00

6) Copy the JDK package to each datanode, then install it on every node:

[root@hadoop1 ~]# scp -r /home/hadoop/src/ hadoop@datanode1:/home/hadoop/
hadoop@datanode1's password:
hadoop-0.20.203.0rc1.tar.gz 100% 58MB 57.8MB/s 00:01
jdk-7-linux-x64.rpm 100% 78MB 77.9MB/s 00:01
[root@hadoop1 ~]# scp -r /home/hadoop/src/ hadoop@datanode2:/home/hadoop/
hadoop@datanode2's password:
hadoop-0.20.203.0rc1.tar.gz 100% 58MB 57.8MB/s 00:01
jdk-7-linux-x64.rpm 100% 78MB 77.9MB/s 00:01
[root@hadoop1 ~]# scp -r /home/hadoop/src/ hadoop@datanode3:/home/hadoop/
hadoop@datanode3's password:
hadoop-0.20.203.0rc1.tar.gz 100% 58MB 57.8MB/s 00:01
jdk-7-linux-x64.rpm 100% 78MB 77.9MB/s 00:01
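Step 6 only copies the packages; the RPM still has to be installed on each datanode. A hedged sketch of doing that remotely, assuming root SSH access and the paths used above:

#!/bin/bash
# Install the copied JDK RPM on every datanode and confirm the resulting version.
for node in datanode1 datanode2 datanode3; do
  ssh root@"$node" 'rpm -ivh /home/hadoop/src/jdk-7-linux-x64.rpm && java -version'
done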
4. Hadoop configuration
// Note: perform the following steps as the hadoop user
1) Layout of the configuration directory:

[hadoop@hadoop1 ~]$ pwd
/home/hadoop
[hadoop@hadoop1 ~]$ ll
total 59220
lrwxrwxrwx 1 hadoop hadoop 17 Feb 1 16:59 hadoop -> hadoop-0.20.203.0
drwxr-xr-x 12 hadoop hadoop 4096 Feb 1 17:31 hadoop-0.20.203.0
-rw-r--r-- 1 hadoop hadoop 60569605 Feb 1 14:24 hadoop-0.20.203.0rc1.tar.gz

2) Configure hadoop-env.sh and point it at the Java installation:

vim hadoop/conf/hadoop-env.sh
export JAVA_HOME=/usr/java/jdk1.7.0

3) Configure core-site.xml // locates the filesystem namenode

[hadoop@hadoop1 ~]$ cat hadoop/conf/core-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://namenode:9000</value>
  </property>
</configuration>

4) Configure mapred-site.xml // locates the master node running the jobtracker

[hadoop@hadoop1 ~]$ cat hadoop/conf/mapred-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>namenode:9001</value>
  </property>
</configuration>

5) Configure hdfs-site.xml // sets the HDFS replication factor

[hadoop@hadoop1 ~]$ cat hadoop/conf/hdfs-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>3</value>
  </property>
</configuration>

6) Configure the masters and slaves files (the slaves file must list all three datanodes, as the start-up output below confirms):

[hadoop@hadoop1 ~]$ cat hadoop/conf/masters
namenode
[hadoop@hadoop1 ~]$ cat hadoop/conf/slaves
datanode1
datanode2
datanode3

7) Copy the hadoop directory to all the datanodes:

[hadoop@hadoop1 ~]$ scp -r hadoop hadoop@datanode1:/home/hadoop/
[hadoop@hadoop1 ~]$ scp -r hadoop hadoop@datanode2:/home/hadoop/
[hadoop@hadoop1 ~]$ scp -r hadoop hadoop@datanode3:/home/hadoop/

8) Format HDFS:

[hadoop@hadoop1 hadoop]$ bin/hadoop namenode -format
12/02/02 11:31:15 INFO namenode.NameNode: STARTUP_MSG:
/************************************************************
STARTUP_MSG: Starting NameNode
STARTUP_MSG: host = hadoop1.test.com/127.0.0.1
STARTUP_MSG: args = [-format]
STARTUP_MSG: version = 0.20.203.0
STARTUP_MSG: build = http://svn.apache.org/repos/asf/hadoop/common/branches/branch-0.20-security-203 -r 1099333; compiled by 'oom' on Wed May 4 07:57:50 PDT 2011
************************************************************/
Re-format filesystem in /tmp/hadoop-hadoop/dfs/name ? (Y or N) Y    // enter Y here
12/02/02 11:31:17 INFO util.GSet: VM type = 64-bit
12/02/02 11:31:17 INFO util.GSet: 2% max memory = 19.33375 MB
12/02/02 11:31:17 INFO util.GSet: capacity = 2^21 = 2097152 entries
12/02/02 11:31:17 INFO util.GSet: recommended=2097152, actual=2097152
12/02/02 11:31:17 INFO namenode.FSNamesystem: fsOwner=hadoop
12/02/02 11:31:18 INFO namenode.FSNamesystem: supergroup=supergroup
12/02/02 11:31:18 INFO namenode.FSNamesystem: isPermissionEnabled=true
12/02/02 11:31:18 INFO namenode.FSNamesystem: dfs.block.invalidate.limit=100
12/02/02 11:31:18 INFO namenode.FSNamesystem: isAccessTokenEnabled=false accessKeyUpdateInterval=0 min(s), accessTokenLifetime=0 min(s)
12/02/02 11:31:18 INFO namenode.NameNode: Caching file names occuring more than 10 times
12/02/02 11:31:18 INFO common.Storage: Image file of size 112 saved in 0 seconds.
12/02/02 11:31:18 INFO common.Storage: Storage directory /tmp/hadoop-hadoop/dfs/name has been successfully formatted.
12/02/02 11:31:18 INFO namenode.NameNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at hadoop1.test.com/127.0.0.1
************************************************************/
[hadoop@hadoop1 hadoop]$

9) Start the hadoop daemons:

[hadoop@hadoop1 hadoop]$ bin/start-all.sh
starting namenode, logging to /home/hadoop/hadoop/bin/../logs/hadoop-hadoop-namenode-hadoop1.test.com.out
datanode1: starting datanode, logging to /home/hadoop/hadoop/bin/../logs/hadoop-hadoop-datanode-hadoop2.test.com.out
datanode2: starting datanode, logging to /home/hadoop/hadoop/bin/../logs/hadoop-hadoop-datanode-hadoop3.test.com.out
datanode3: starting datanode, logging to /home/hadoop/hadoop/bin/../logs/hadoop-hadoop-datanode-hadoop4.test.com.out
starting jobtracker, logging to /home/hadoop/hadoop/bin/../logs/hadoop-hadoop-jobtracker-hadoop1.test.com.out
datanode1: starting tasktracker, logging to /home/hadoop/hadoop/bin/../logs/hadoop-hadoop-tasktracker-hadoop2.test.com.out
datanode2: starting tasktracker, logging to /home/hadoop/hadoop/bin/../logs/hadoop-hadoop-tasktracker-hadoop3.test.com.out
datanode3: starting tasktracker, logging to /home/hadoop/hadoop/bin/../logs/hadoop-hadoop-tasktracker-hadoop4.test.com.out

10) Verify:

// on the namenode
[hadoop@hadoop1 logs]$ jps
2883 JobTracker
3002 Jps
2769 NameNode
// on the datanodes
[hadoop@hadoop2 ~]$ jps
2743 TaskTracker
2670 DataNode
2857 Jps
[hadoop@hadoop3 ~]$ jps
2742 TaskTracker
2856 Jps
2669 DataNode
[hadoop@hadoop4 ~]$ jps
2742 TaskTracker
2852 Jps
2659 DataNode

Hadoop monitoring web page:
http://192.168.57.75:50070/dfshealth.jsp
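Besides jps and the web page, the namenode's view of the cluster can be checked from the command line with the stock dfsadmin tool; run from the hadoop directory as the hadoop user, it should report three live datanodes:

# Summarize configured capacity and the live datanodes known to the namenode.
bin/hadoop dfsadmin -report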
5. Basic HDFS verification
Hadoop file system commands have the form: hadoop fs -cmd <args>

// create a directory
[hadoop@hadoop1 hadoop]$ bin/hadoop fs -mkdir /test-hadoop

// list a directory
[hadoop@hadoop1 hadoop]$ bin/hadoop fs -ls /
Found 2 items
drwxr-xr-x - hadoop supergroup 0 2012-02-02 13:32 /test-hadoop
drwxr-xr-x - hadoop supergroup 0 2012-02-02 11:32 /tmp

// list a directory recursively, including subdirectories
[hadoop@hadoop1 hadoop]$ bin/hadoop fs -lsr /
drwxr-xr-x - hadoop supergroup 0 2012-02-02 13:32 /test-hadoop
drwxr-xr-x - hadoop supergroup 0 2012-02-02 11:32 /tmp
drwxr-xr-x - hadoop supergroup 0 2012-02-02 11:32 /tmp/hadoop-hadoop
drwxr-xr-x - hadoop supergroup 0 2012-02-02 11:32 /tmp/hadoop-hadoop/mapred
drwx------ - hadoop supergroup 0 2012-02-02 11:32 /tmp/hadoop-hadoop/mapred/system
-rw------- 2 hadoop supergroup 4 2012-02-02 11:32 /tmp/hadoop-hadoop/mapred/system/jobtracker.info

// upload a file
[hadoop@hadoop1 hadoop]$ bin/hadoop fs -put /home/hadoop/hadoop-0.20.203.0rc1.tar.gz /test-hadoop
[hadoop@hadoop1 hadoop]$ bin/hadoop fs -lsr /
drwxr-xr-x - hadoop supergroup 0 2012-02-02 13:34 /test-hadoop
-rw-r--r-- 2 hadoop supergroup 60569605 2012-02-02 13:34 /test-hadoop/hadoop-0.20.203.0rc1.tar.gz
drwxr-xr-x - hadoop supergroup 0 2012-02-02 11:32 /tmp
drwxr-xr-x - hadoop supergroup 0 2012-02-02 11:32 /tmp/hadoop-hadoop
drwxr-xr-x - hadoop supergroup 0 2012-02-02 11:32 /tmp/hadoop-hadoop/mapred
drwx------ - hadoop supergroup 0 2012-02-02 11:32 /tmp/hadoop-hadoop/mapred/system
-rw------- 2 hadoop supergroup 4 2012-02-02 11:32 /tmp/hadoop-hadoop/mapred/system/jobtracker.info

// fetch a file
[hadoop@hadoop1 hadoop]$ bin/hadoop fs -get /test-hadoop/hadoop-0.20.203.0rc1.tar.gz /tmp/
[hadoop@hadoop1 hadoop]$ ls /tmp/*.tar.gz
/tmp/1.tar.gz /tmp/hadoop-0.20.203.0rc1.tar.gz

// delete a file
[hadoop@hadoop1 hadoop]$ bin/hadoop fs -rm /test-hadoop/hadoop-0.20.203.0rc1.tar.gz
Deleted hdfs://namenode:9000/test-hadoop/hadoop-0.20.203.0rc1.tar.gz
[hadoop@hadoop1 hadoop]$ bin/hadoop fs -lsr /
drwxr-xr-x - hadoop supergroup 0 2012-02-02 13:57 /test-hadoop
drwxr-xr-x - hadoop supergroup 0 2012-02-02 11:32 /tmp
drwxr-xr-x - hadoop supergroup 0 2012-02-02 11:32 /tmp/hadoop-hadoop
drwxr-xr-x - hadoop supergroup 0 2012-02-02 11:32 /tmp/hadoop-hadoop/mapred
drwx------ - hadoop supergroup 0 2012-02-02 11:32 /tmp/hadoop-hadoop/mapred/system
-rw------- 2 hadoop supergroup 4 2012-02-02 11:32 /tmp/hadoop-hadoop/mapred/system/jobtracker.info
drwxr-xr-x - hadoop supergroup 0 2012-02-02 13:36 /user
-rw-r--r-- 2 hadoop supergroup 321 2012-02-02 13:36 /user/hadoop

// delete a directory
[hadoop@hadoop1 hadoop]$ bin/hadoop fs -rmr /test-hadoop
Deleted hdfs://namenode:9000/test-hadoop
[hadoop@hadoop1 hadoop]$ bin/hadoop fs -lsr /
drwxr-xr-x - hadoop supergroup 0 2012-02-02 11:32 /tmp
drwxr-xr-x - hadoop supergroup 0 2012-02-02 11:32 /tmp/hadoop-hadoop
drwxr-xr-x - hadoop supergroup 0 2012-02-02 11:32 /tmp/hadoop-hadoop/mapred
drwx------ - hadoop supergroup 0 2012-02-02 11:32 /tmp/hadoop-hadoop/mapred/system
-rw------- 2 hadoop supergroup 4 2012-02-02 11:32 /tmp/hadoop-hadoop/mapred/system/jobtracker.info
drwxr-xr-x - hadoop supergroup 0 2012-02-02 13:36 /user
-rw-r--r-- 2 hadoop supergroup 321 2012-02-02 13:36 /user/hadoop

// hadoop fs help (excerpt)
[hadoop@hadoop1 hadoop]$ bin/hadoop fs -help
hadoop fs is the command to execute fs commands. The full syntax is:
hadoop fs [-fs <local | file system URI>] [-conf <configuration file>]
[-D <property=value>] [-ls <path>] [-lsr <path>] [-du <path>] [-dus <path>]
[-mv <src> <dst>] [-cp <src> <dst>] [-rm [-skipTrash] <src>] [-rmr [-skipTrash] <src>]
[-put <localsrc> ... <dst>] [-copyFromLocal <localsrc> ... <dst>]
[-moveFromLocal <localsrc> ... <dst>] [-get [-ignoreCrc] [-crc] <src> <localdst>]
[-getmerge <src> <localdst> [addnl]] [-cat <src>]
[-copyToLocal [-ignoreCrc] [-crc] <src> <localdst>] [-moveToLocal <src> <localdst>]
[-mkdir <path>] [-report] [-setrep [-R] [-w] <rep> <path/file>] [-touchz <path>]
[-test -[ezd] <path>] [-stat [format] <path>] [-tail [-f] <path>] [-text <path>]
[-chmod [-R] <MODE[,MODE]... | OCTALMODE> PATH...] [-chown [-R] [OWNER][:[GROUP]] PATH...]
[-chgrp [-R] GROUP PATH...] [-count[-q] <path>] [-help [cmd]]
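HDFS is only half of what the introduction promised; a quick way to exercise MapReduce as well is the bundled wordcount example. A sketch under the assumption that the examples jar in a 0.20.203.0 tarball is named hadoop-examples-0.20.203.0.jar (check your hadoop directory for the exact name):

# Count words across the XML config files as a smoke test for MapReduce.
bin/hadoop fs -mkdir /wc-in
bin/hadoop fs -put conf/*.xml /wc-in
bin/hadoop jar hadoop-examples-0.20.203.0.jar wordcount /wc-in /wc-out
bin/hadoop fs -cat /wc-out/part-r-00000 | head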