
Learning Hadoop (1): Installing Hadoop on Ubuntu

2016-06-21 13:45

1. Install SSH

$ sudo apt-get install openssh-client
$ sudo apt-get install openssh-server


2. Check the value of the JAVA_HOME variable

/opt/jdk1.8.0_91
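
If you are not sure where your JDK lives, the path can be recovered from the `java` binary on the PATH. This is a sketch; the `/opt/jdk1.8.0_91` location is just this post's install directory, and yours may differ:

```shell
# Print JAVA_HOME if it is already set:
echo "$JAVA_HOME"

# Otherwise derive it by resolving the java binary's symlinks
# and stripping the trailing /bin/java from the resolved path:
readlink -f "$(which java)" | sed 's:/bin/java$::'
```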


3. Install hadoop-2.7.2

Download it from an Apache mirror (http://mirror.bit.edu.cn/apache/hadoop/common/hadoop-2.7.2/) and extract it into a hadoop-2.7.2 directory.

4. Edit etc/hadoop/hadoop-env.sh under hadoop-2.7.2 and set JAVA_HOME

export JAVA_HOME=/opt/jdk1.8.0_91


Run the following command; if it prints Hadoop's usage message, the configuration is working:

$ bin/hadoop


Hadoop supports three modes: standalone, pseudo-distributed, and fully distributed. This post covers the first two:

5. Standalone Operation (single-machine mode)

Start the SSH service:

$ sudo /etc/init.d/ssh start


Set up passwordless login:

# Generate a key pair on the client (press Enter at each prompt):
$ ssh-keygen -t rsa
# On the server, append the public key to the authorized list (run inside ~/.ssh);
# appending instead of copying avoids clobbering any keys already there:
$ cat id_rsa.pub >> authorized_keys
$ chmod 600 authorized_keys
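
After copying the key you can sanity-check the file permissions, since sshd refuses key authentication when authorized_keys is readable by others. A sketch, assuming GNU `stat` (the `-c %a` option prints the octal mode):

```shell
# Should print 600: readable and writable only by the owner.
stat -c %a ~/.ssh/authorized_keys
```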


Test: The following example copies the unpacked conf directory to use as input, then finds and displays every match of the given regular expression. Output is written to the given output directory.

$ mkdir input
$ cp etc/hadoop/*.xml input
$ bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.2.jar grep input output 'dfs[a-z.]+'
$ cat output/*


6. Pseudo-Distributed Operation (single-machine pseudo-distributed mode)

Edit two configuration files:

etc/hadoop/core-site.xml:

<configuration>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://localhost:9000</value>
    </property>
</configuration>


etc/hadoop/hdfs-site.xml:

<configuration>
    <property>
        <name>dfs.replication</name>
        <value>1</value>
    </property>
</configuration>


Log in via SSH to confirm it works without a password:

$ ssh localhost


Run:

# Format the filesystem:
$ bin/hdfs namenode -format

# Start the NameNode and DataNode daemons:
$ sbin/start-dfs.sh


Browse the NameNode's web interface; by default it is available at http://localhost:50070/.

# Make the HDFS directories required to execute MapReduce jobs:
$ bin/hdfs dfs -mkdir /user
$ bin/hdfs dfs -mkdir /user/<username>

# Copy the input files into the distributed filesystem:
$ bin/hdfs dfs -put etc/hadoop input

# Run some of the examples provided:
$ bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.2.jar grep input output 'dfs[a-z.]+'

# Examine the output files: copy them from the distributed filesystem
# to the local filesystem and inspect them:
$ bin/hdfs dfs -get output output
$ cat output/*
# Or view the output files directly on the distributed filesystem:
$ bin/hdfs dfs -cat output/*

# When you're done, stop the daemons with:
$ sbin/stop-dfs.sh
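
The two mkdir calls can also be collapsed into one, using `-p` to create parent directories and `$(whoami)` to fill in the current user. A sketch, run from the hadoop-2.7.2 directory:

```shell
# Create the user's HDFS home directory in one step;
# -p also creates /user if it does not exist yet:
bin/hdfs dfs -mkdir -p /user/$(whoami)
```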
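
After start-dfs.sh you can confirm the daemons actually came up with `jps`, which ships with the JDK and lists running JVM processes; the grep below is a sketch:

```shell
# Keep only the HDFS daemons from the JVM process list;
# a healthy pseudo-distributed node shows a NameNode, a DataNode
# and a SecondaryNameNode:
jps | grep -E 'NameNode|DataNode'
```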