Hadoop 2.4.1 Standalone / Pseudo-Distributed Installation Guide
2014-07-08 21:16
Reposted from the official documentation; for the latest version, see: http://hadoop.apache.org/docs/r2.4.1/hadoop-project-dist/hadoop-common/SingleCluster.html
Addendum: adding the following environment variables is recommended.
#hadoop configuration
export PATH=$PATH:/home/jediael/hadoop-2.4.1/bin:/home/jediael/hadoop-2.4.1/sbin
export HADOOP_HOME=/home/jediael/hadoop-2.4.1
export HADOOP_COMMON_HOME=$HADOOP_HOME
export HADOOP_HDFS_HOME=$HADOOP_HOME
export HADOOP_MAPRED_HOME=$HADOOP_HOME
export HADOOP_YARN_HOME=$HADOOP_HOME
export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
export HADOOP_OPTS="-Djava.library.path=$HADOOP_HOME/lib"
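One way to confirm that these variables took effect is to check that each one is set and names an existing directory. The following is a minimal sketch; the function name check_hadoop_env is hypothetical and not part of Hadoop.

```shell
# Hypothetical helper (not part of Hadoop): report whether each variable
# from the list above is set and names an existing directory.
check_hadoop_env() {
    status=0
    for name in HADOOP_HOME HADOOP_COMMON_HOME HADOOP_HDFS_HOME \
                HADOOP_MAPRED_HOME HADOOP_YARN_HOME HADOOP_CONF_DIR; do
        # Indirect lookup: read the value of the variable whose name is in $name.
        eval "value=\${$name:-}"
        if [ -z "$value" ]; then
            echo "$name is not set"
            status=1
        elif [ ! -d "$value" ]; then
            echo "$name points to a missing directory: $value"
            status=1
        else
            echo "$name=$value"
        fi
    done
    return $status
}
```

Running check_hadoop_env in a fresh shell, after the exports have been added to your profile, prints one line per variable and returns non-zero if any of them is missing.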
Hadoop MapReduce Next Generation - Setting up a Single Node Cluster
Purpose
Prerequisites
Supported Platforms
Required Software
Installing Software
Download
Prepare to Start the Hadoop Cluster
Standalone Operation
Pseudo-Distributed Operation
Configuration
Setup passphraseless ssh
Execution
YARN on Single Node
Fully-Distributed Operation
Purpose
This document describes how to set up and configure a single-node Hadoop installation so that you can quickly perform simple operations using Hadoop MapReduce and the Hadoop Distributed File System (HDFS).
Prerequisites
Supported Platforms
GNU/Linux is supported as a development and production platform. Hadoop has been demonstrated on GNU/Linux clusters with 2000 nodes.
Windows is also a supported platform, but the following steps are for Linux only. To set up Hadoop on Windows, see the wiki page.
Required Software
Required software for Linux includes:
Java™ must be installed. Recommended Java versions are described at HadoopJavaVersions.
ssh must be installed and sshd must be running to use the Hadoop scripts that manage remote Hadoop daemons.
Installing Software
If your cluster doesn't have the requisite software you will need to install it.
For example on Ubuntu Linux:
$ sudo apt-get install ssh
$ sudo apt-get install rsync
Download
To get a Hadoop distribution, download a recent stable release from one of the Apache Download Mirrors.
Prepare to Start the Hadoop Cluster
Unpack the downloaded Hadoop distribution. In the distribution, edit the file etc/hadoop/hadoop-env.sh to define some parameters as follows:
# set to the root of your Java installation
export JAVA_HOME=/usr/java/latest
# Assuming your installation directory is /usr/local/hadoop
export HADOOP_PREFIX=/usr/local/hadoop
Note: skipping the second step above does not seem to cause any problems.
Try the following command:
$ bin/hadoop
This will display the usage documentation for the hadoop script.
Now you are ready to start your Hadoop cluster in one of the three supported modes:
Local (Standalone) Mode
Pseudo-Distributed Mode
Fully-Distributed Mode
Standalone Operation
By default, Hadoop is configured to run in a non-distributed mode, as a single Java process. This is useful for debugging.
The following example copies the unpacked conf directory to use as input and then finds and displays every match of the given regular expression. Output is written to the given output directory.
$ mkdir input
$ cp etc/hadoop/*.xml input
$ bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.4.1.jar grep input output 'dfs[a-z.]+'
$ cat output/*
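The grep example reports how often each string matching the regular expression occurs in the input. To preview roughly what the job should find, the same expression can be run with ordinary grep. This is a sketch, not the Hadoop implementation; the function name count_matches is hypothetical, and it assumes a grep that supports the -o option.

```shell
# Rough local equivalent of the Hadoop grep example: count every match of
# the regular expression in the given files, most frequent first.
# Assumes a grep that supports -o (print only the matched text).
count_matches() {
    pattern=$1
    shift
    grep -ohE "$pattern" "$@" | sort | uniq -c | sort -rn
}
```

For example, count_matches 'dfs[a-z.]+' input/*.xml lists each matched string with its count, which can be compared against the job's output directory.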
Pseudo-Distributed Operation
Hadoop can also be run on a single node in a pseudo-distributed mode where each Hadoop daemon runs in a separate Java process.
Configuration
Use the following:
etc/hadoop/core-site.xml:
<configuration>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://localhost:9000</value>
    </property>
</configuration>
etc/hadoop/hdfs-site.xml:
<configuration>
    <property>
        <name>dfs.replication</name>
        <value>1</value>
    </property>
</configuration>
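After editing these files it can help to read a property back and confirm the value that was saved. The sketch below is a plain text match, not a real XML parser. The function name get_property is hypothetical, and it assumes each <name> and <value> element sits on its own line.

```shell
# Hypothetical helper: print the <value> that follows a given <name> in a
# Hadoop *-site.xml file. Assumes <name> and <value> are on separate lines;
# this is a text match, not an XML parser.
get_property() {
    file=$1
    name=$2
    awk -v want="$name" '
        /<name>/ {
            line = $0
            sub(/.*<name>/, "", line)
            sub(/<\/name>.*/, "", line)
            found = (line == want)
            next
        }
        found && /<value>/ {
            line = $0
            sub(/.*<value>/, "", line)
            sub(/<\/value>.*/, "", line)
            print line
            exit
        }
    ' "$file"
}
```

For example, get_property etc/hadoop/core-site.xml fs.defaultFS should print hdfs://localhost:9000 for the file above. With Hadoop on the PATH, bin/hdfs getconf -confKey fs.defaultFS reports the value Hadoop itself resolves.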
Setup passphraseless ssh
Now check that you can ssh to the localhost without a passphrase:
$ ssh localhost
If you cannot ssh to localhost without a passphrase, execute the following commands:
$ ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa
$ cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys
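If ssh still asks for a password after these commands, one common cause is file permissions: with sshd's default StrictModes setting, an authorized_keys file or .ssh directory that is writable by others can be ignored. The sketch below tightens them. The function name fix_ssh_perms and its optional directory argument are hypothetical, added only for illustration.

```shell
# Tighten permissions on the SSH directory and key list so that sshd's
# StrictModes check does not reject them.
# The directory argument is for illustration only; it defaults to ~/.ssh.
fix_ssh_perms() {
    dir=${1:-"$HOME/.ssh"}
    chmod 700 "$dir"
    chmod 600 "$dir/authorized_keys"
}
```

After running fix_ssh_perms, the command ssh -o BatchMode=yes localhost true gives a non-interactive check: BatchMode makes ssh fail instead of prompting, so a zero exit status means passphraseless login works.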
Execution
The following instructions are to run a MapReduce job locally. If you want to execute a job on YARN, see YARN on Single Node.
Format the filesystem:
$ bin/hdfs namenode -format
Start the NameNode daemon and DataNode daemon (this step prints many warnings, but they do not affect the result):
$ sbin/start-dfs.sh
The hadoop daemon log output is written to the $HADOOP_LOG_DIR directory (defaults to $HADOOP_HOME/logs).
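The default described here maps directly onto a shell parameter expansion, which is handy when looking for the newest log after a failed start. This is a minimal sketch; both function names are hypothetical, and it assumes the daemons write files ending in .log.

```shell
# Resolve the log directory the way the text describes: HADOOP_LOG_DIR if
# it is set, otherwise $HADOOP_HOME/logs.
hadoop_log_dir() {
    echo "${HADOOP_LOG_DIR:-$HADOOP_HOME/logs}"
}

# Print the most recently modified .log file in that directory, if any.
latest_hadoop_log() {
    ls -t "$(hadoop_log_dir)"/*.log 2>/dev/null | head -n 1
}
```

For example, tail -n 50 "$(latest_hadoop_log)" shows the end of whichever daemon log was written last.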
Browse the web interface for the NameNode; by default it is available at:
NameNode - http://localhost:50070/
Make the HDFS directories required to execute MapReduce jobs:
$ bin/hdfs dfs -mkdir /user
$ bin/hdfs dfs -mkdir /user/<username>
Copy the input files into the distributed filesystem:
$ bin/hdfs dfs -put etc/hadoop input
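The relative paths input and output in these commands work because HDFS resolves relative paths against the user's home directory, which by default is /user/<username>. The sketch below only mimics that rule for illustration and does not call HDFS. The function name resolve_hdfs_path and its optional user argument are hypothetical.

```shell
# Illustration only: mimic how HDFS resolves a relative path against the
# user's home directory, which defaults to /user/<username>.
# The second argument (user name) is optional and defaults to whoami.
resolve_hdfs_path() {
    case $1 in
        /*) echo "$1" ;;                          # absolute paths are unchanged
        *)  echo "/user/${2:-$(whoami)}/$1" ;;    # relative paths go under the home directory
    esac
}
```

So for a user named alice, the path input refers to /user/alice/input, which is why the home directory has to exist before the put. In Hadoop 2.x the directory can also be created in one step with bin/hdfs dfs -mkdir -p "/user/$(whoami)".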
Run some of the examples provided:
$ bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.4.1.jar grep input output 'dfs[a-z.]+'
Examine the output files:
Copy the output files from the distributed filesystem to the local filesystem and examine them:
$ bin/hdfs dfs -get output output
$ cat output/*
or
View the output files on the distributed filesystem:
$ bin/hdfs dfs -cat output/*
When you're done, stop the daemons with:
$ sbin/stop-dfs.sh
YARN on Single Node
You can run a MapReduce job on YARN in a pseudo-distributed mode by setting a few parameters and additionally running the ResourceManager daemon and NodeManager daemon.
The following instructions assume that the first four steps of the instructions above (from formatting the filesystem through creating the HDFS directories) have already been executed.
Configure parameters as follows:
etc/hadoop/mapred-site.xml:
<configuration>
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>
</configuration>
etc/hadoop/yarn-site.xml:
<configuration>
    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>
</configuration>
Start ResourceManager daemon and NodeManager daemon:
$ sbin/start-yarn.sh
Browse the web interface for the ResourceManager; by default it is available at:
ResourceManager - http://localhost:8088/
Run a MapReduce job.
When you're done, stop the daemons with:
$ sbin/stop-yarn.sh