
Hadoop Single Node


Hadoop: Setting up a Single Node Cluster

[First time only] Install ssh and rsync.
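For example, on a Debian/Ubuntu system (package names assumed; adjust for your distribution) the prerequisites can be installed with apt-get, and start-dfs.sh additionally needs passphraseless ssh to localhost:

$ sudo apt-get install ssh rsync

# set up passphraseless ssh (only if "ssh localhost" currently asks for a password)
$ ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
$ cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
$ chmod 0600 ~/.ssh/authorized_keys
$ ssh localhost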

Note: set JAVA_HOME in
$HADOOP_HOME/etc/hadoop/hadoop-env.sh
This step is important; if you skip it, startup will fail with an error.

Unpack the downloaded Hadoop distribution. In the distribution, edit the file etc/hadoop/hadoop-env.sh to define some parameters as follows:

# set to the root of your Java installation
export JAVA_HOME=/usr/java/latest
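If you are unsure where Java is installed, one way on Linux (assuming readlink is available) is to resolve the java binary on the PATH, then confirm the setup:

$ readlink -f $(which java)   # JAVA_HOME is this path minus the trailing /bin/java (or /jre/bin/java)
$ bin/hadoop version          # should print the Hadoop version if JAVA_HOME is picked up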


Standalone operation

By default, Hadoop is configured to run in a non-distributed mode, as a single Java process. This is useful for debugging.
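As a quick smoke test, the grep example from the official guide runs entirely on the local filesystem: it copies the unpacked config files into an input directory, searches them for strings matching a regex, and writes the matches to output:

$ mkdir input
$ cp etc/hadoop/*.xml input
$ bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.2.jar grep input output 'dfs[a-z.]+'
$ cat output/*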

Execution

The following instructions are to run a MapReduce job locally. If you want to execute a job on YARN, see YARN on Single Node.
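These steps assume pseudo-distributed mode, which requires two configuration files to be edited first. The values below are the ones from the official single-node guide:

etc/hadoop/core-site.xml:

<configuration>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://localhost:9000</value>
    </property>
</configuration>

etc/hadoop/hdfs-site.xml:

<configuration>
    <property>
        <name>dfs.replication</name>
        <value>1</value>
    </property>
</configuration>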

Format the filesystem:

$ bin/hdfs namenode -format

Start NameNode daemon and DataNode daemon:

$ sbin/start-dfs.sh

The hadoop daemon log output is written to the $HADOOP_LOG_DIR directory (defaults to $HADOOP_HOME/logs).
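If startup succeeded, jps (shipped with the JDK) should list the HDFS daemons:

$ jps
# expect NameNode, DataNode and SecondaryNameNode among the output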

Browse the web interface for the NameNode; by default it is available at:

NameNode - http://localhost:50070/

Make the HDFS directories required to execute MapReduce jobs:

$ bin/hdfs dfs -mkdir /user
$ bin/hdfs dfs -mkdir /user/<username>
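Here <username> is your login user; with shell expansion (assuming a POSIX shell) this is:

$ bin/hdfs dfs -mkdir /user/$(whoami)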

Copy the input files into the distributed filesystem:

$ bin/hdfs dfs -put etc/hadoop input

Run some of the examples provided:

Note: before running this example, make sure there is no existing output directory (under the current directory in standalone mode, or in HDFS in pseudo-distributed mode); otherwise the job fails, because Hadoop will not overwrite an existing output directory.
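If a previous run left an output directory behind, remove it before rerunning; the second command assumes HDFS is up in pseudo-distributed mode:

$ rm -r output                 # local output from a standalone run
$ bin/hdfs dfs -rm -r output   # output directory in HDFS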

$ bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.2.jar grep input output 'dfs[a-z.]+'

Examine the output files: Copy the output files from the distributed filesystem to the local filesystem and examine them:

$ bin/hdfs dfs -get output output
$ cat output/*

or

View the output files on the distributed filesystem:

$ bin/hdfs dfs -cat output/*

When you’re done, stop the daemons with:

$ sbin/stop-dfs.sh

Reference: http://www.powerxing.com/install-hadoop/

Common problems:

Before running hadoop namenode -format again, delete the old cache files first; otherwise the format reports errors and the DataNode cannot be found afterwards (typically because the DataNode's stored cluster ID no longer matches the freshly formatted NameNode).

The cache lives under the /tmp directory by default.
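Assuming the default hadoop.tmp.dir of /tmp/hadoop-${user.name}, clearing the cache and reformatting looks like:

$ sbin/stop-dfs.sh
$ rm -rf /tmp/hadoop-$(whoami)
$ bin/hdfs namenode -format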