Fully Distributed Hadoop Configuration with Cygwin
2012-03-30 23:31
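I have been studying Hadoop recently.
After half a month of effort, I finally got a fully distributed configuration working on Windows using Cygwin.
Environment
1. Hadoop 0.20.2
2. One Vista machine, 192.168.0.102, hostname: ken-PC (master)
One Windows XP virtual machine, 192.168.0.222, hostname: winxp (slave)
Steps
1. Install Cygwin on both Vista and XP, and create a user with the same name, ken, on each system. Configure SSH and make sure the master can log in to the slave from Cygwin without a password; many articles online describe how to do this. Remember to turn off the Windows Firewall, otherwise the two machines cannot connect to each other.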
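The passwordless SSH part of step 1 can be sketched roughly as follows. This is a minimal sketch under assumptions, not a record of the exact commands used here: it assumes the Cygwin `openssh` package is installed on both machines and that `sshd` is already running on the slave. The function name `setup_passwordless_ssh` is a hypothetical helper introduced only for illustration.

```shell
# Hypothetical helper: let the current user log in to a remote host without a password.
# Assumes the Cygwin openssh package is installed and sshd is already running remotely.
setup_passwordless_ssh() {
    remote_user="$1"   # e.g. ken
    remote_host="$2"   # e.g. 192.168.0.222
    key="$HOME/.ssh/id_rsa"

    # Generate an RSA key pair with an empty passphrase if none exists yet.
    mkdir -p "$HOME/.ssh"
    [ -f "$key" ] || ssh-keygen -t rsa -N "" -f "$key"

    # Append the public key to the remote authorized_keys file.
    # This asks for the remote password one last time.
    cat "$key.pub" | ssh "$remote_user@$remote_host" \
        'mkdir -p ~/.ssh && chmod 700 ~/.ssh && cat >> ~/.ssh/authorized_keys && chmod 600 ~/.ssh/authorized_keys'

    # Verify: BatchMode forbids password prompts, so this fails if key login does not work.
    ssh -o BatchMode=yes "$remote_user@$remote_host" hostname
}
```

On the master it would be called as `setup_passwordless_ssh ken 192.168.0.222`. If the last command prints the slave's hostname without asking for a password, the Hadoop start scripts should be able to reach the slave as well.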
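2. Install Hadoop on Vista. The configuration is shown below; pay attention to the site-specific values such as JAVA_HOME, the IP addresses, and the directory paths.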
hadoop-env.sh
# Set Hadoop-specific environment variables here.
# The only required environment variable is JAVA_HOME. All others are
# optional. When running a distributed configuration it is best to
# set JAVA_HOME in this file, so that it is correctly defined on
# remote nodes.
# The java implementation to use. Required.
export JAVA_HOME=/cygdrive/C/soft/Java/jdk1.6.0_12
# Extra Java CLASSPATH elements. Optional.
# export HADOOP_CLASSPATH=
# The maximum amount of heap to use, in MB. Default is 1000.
# export HADOOP_HEAPSIZE=2000
# Extra Java runtime options. Empty by default.
# export HADOOP_OPTS=-server
# Command specific options appended to HADOOP_OPTS when specified
export HADOOP_NAMENODE_OPTS="-Dcom.sun.management.jmxremote $HADOOP_NAMENODE_OPTS"
export HADOOP_SECONDARYNAMENODE_OPTS="-Dcom.sun.management.jmxremote $HADOOP_SECONDARYNAMENODE_OPTS"
export HADOOP_DATANODE_OPTS="-Dcom.sun.management.jmxremote $HADOOP_DATANODE_OPTS"
export HADOOP_BALANCER_OPTS="-Dcom.sun.management.jmxremote $HADOOP_BALANCER_OPTS"
export HADOOP_JOBTRACKER_OPTS="-Dcom.sun.management.jmxremote $HADOOP_JOBTRACKER_OPTS"
# export HADOOP_TASKTRACKER_OPTS=
# The following applies to multiple commands (fs, dfs, fsck, distcp etc)
# export HADOOP_CLIENT_OPTS
# Extra ssh options. Empty by default.
# export HADOOP_SSH_OPTS="-o ConnectTimeout=1 -o SendEnv=HADOOP_CONF_DIR"
# Where log files are stored. $HADOOP_HOME/logs by default.
# export HADOOP_LOG_DIR=${HADOOP_HOME}/logs
# File naming remote slave hosts. $HADOOP_HOME/conf/slaves by default.
# export HADOOP_SLAVES=${HADOOP_HOME}/conf/slaves
# host:path where hadoop code should be rsync'd from. Unset by default.
# export HADOOP_MASTER=master:/home/$USER/src/hadoop
# Seconds to sleep between slave commands. Unset by default. This
# can be useful in large clusters, where, e.g., slave rsyncs can
# otherwise arrive faster than the master can service them.
# export HADOOP_SLAVE_SLEEP=0.1
# The directory where pid files are stored. /tmp by default.
# export HADOOP_PID_DIR=/var/hadoop/pids
# A string representing this instance of hadoop. $USER by default.
# export HADOOP_IDENT_STRING=$USER
# The scheduling priority for daemon processes. See 'man nice'.
# export HADOOP_NICENESS=10
core-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
<property><name>fs.default.name</name><value>hdfs://192.168.0.102:9991</value></property>
<property><name>hadoop.tmp.dir</name><value>/root/hadoopfile/202/coretmp/</value></property>
</configuration>
hdfs-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
<property><name>dfs.replication</name><value>1</value></property>
<property><name>dfs.name.dir</name><value>hadoopfile/name/</value></property>
<property><name>dfs.data.dir</name><value>hadoopfile/data/</value></property>
</configuration>
mapred-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
<property><name>mapred.job.tracker</name><value>192.168.0.102:9992</value></property>
<property><name>mapred.child.tmp</name><value>C:\root\tmp</value></property>
</configuration>
Note: mapred.child.tmp must be an absolute path.
masters
192.168.0.102
slaves
192.168.0.222
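3. Install Hadoop on the Windows XP virtual machine. The configuration files must have the same contents as those on Vista, and Hadoop must be installed at the same path as on Vista.
4. Add the following lines to the hosts file on both Vista and Windows XP: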
192.168.0.102 ken-PC
192.168.0.222 winxp
The hostnames are case-sensitive.
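One way to add these entries without creating duplicates is sketched below. The helper name `add_host_entry` and the scratch file `hosts.sample` are assumptions made for illustration. The match is deliberately case-sensitive, in line with the note above.

```shell
# Hypothetical helper: append "IP NAME" to a hosts file unless NAME is already listed.
add_host_entry() {
    file="$1"; ip="$2"; name="$3"
    # -F: fixed string, -w: whole word; no -i, so ken-PC and ken-pc count as different names.
    if ! grep -qwF -- "$name" "$file" 2>/dev/null; then
        printf '%s %s\n' "$ip" "$name" >> "$file"
    fi
}

# Try it on a scratch file first. On a real machine, the Windows hosts file is
# visible from Cygwin as /cygdrive/c/Windows/System32/drivers/etc/hosts.
HOSTS_FILE="${HOSTS_FILE:-./hosts.sample}"
add_host_entry "$HOSTS_FILE" 192.168.0.102 ken-PC
add_host_entry "$HOSTS_FILE" 192.168.0.222 winxp
cat "$HOSTS_FILE"
```

Editing the real hosts file usually requires a Cygwin shell started with administrator rights.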
5. Format HDFS (the NameNode) on Vista
![](http://my.csdn.net/uploads/201203/30/1333121804_5652.png)
6. Start Hadoop on Vista
![](http://my.csdn.net/uploads/201203/30/1333121834_6275.png)
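In text form, steps 5 and 6 correspond roughly to the commands below. This is a sketch based on the standard Hadoop 0.20.2 scripts rather than a transcript of the screenshots. The install path in `HADOOP_HOME` and the wrapper name `format_and_start` are assumptions.

```shell
# Assumed install location; change it to wherever Hadoop 0.20.2 was unpacked.
HADOOP_HOME="${HADOOP_HOME:-$HOME/hadoop-0.20.2}"

# Hypothetical wrapper around steps 5 and 6; run it on the master only.
format_and_start() {
    cd "$HADOOP_HOME" || return 1

    # Step 5: initialize the NameNode metadata directory (dfs.name.dir).
    # Formatting discards any existing HDFS metadata.
    bin/hadoop namenode -format || return 1

    # Step 6: start the HDFS and MapReduce daemons on the master and,
    # over SSH, on every host listed in conf/slaves.
    bin/start-all.sh
}
```

Calling `format_and_start` from a Cygwin shell on the master runs both steps. Formatting is only needed once; later restarts need only `bin/start-all.sh`.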
7. On Vista, check the cluster status at http://localhost:50070
![](http://my.csdn.net/uploads/201203/30/1333121636_6039.jpg)
8. Run the wordcount example on Vista
First, create a directory named input in the Hadoop directory.
Then create two text files inside input; their contents can be anything.
![](http://my.csdn.net/uploads/201203/30/1333121873_4772.png)
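The same preparation can be done from the Cygwin shell. The file names and contents below are arbitrary examples, not the ones from the screenshot.

```shell
# Create the local input directory with two small text files.
mkdir -p input
echo "hello hadoop hello cygwin" > input/file1.txt
echo "hello world" > input/file2.txt

# List the files to confirm that both exist.
ls input
```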
In Cygwin, enter:
![](http://my.csdn.net/uploads/201203/30/1333121906_5811.png)
Run wordcount:
![](http://my.csdn.net/uploads/201203/30/1333121664_6517.jpg)
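For reference, uploading the input and running the example typically looks like the sketch below on Hadoop 0.20.2. The HDFS directory names `in` and `out`, the install path, and the wrapper name `run_wordcount` are assumptions and may differ from what the screenshots show.

```shell
# Assumed install location; change it to wherever Hadoop 0.20.2 was unpacked.
HADOOP_HOME="${HADOOP_HOME:-$HOME/hadoop-0.20.2}"

# Hypothetical wrapper around the upload and the wordcount run.
run_wordcount() {
    cd "$HADOOP_HOME" || return 1

    # Copy the local input directory into HDFS under the name "in".
    bin/hadoop fs -put input in || return 1

    # Run the bundled example job; "out" must not already exist in HDFS.
    bin/hadoop jar hadoop-0.20.2-examples.jar wordcount in out || return 1

    # Print the result files; the quotes let Hadoop expand the pattern, not the shell.
    bin/hadoop fs -cat 'out/part-*'
}
```

Running `run_wordcount` on the master should end by printing each word with its count. To run the job again, remove the old output first with `bin/hadoop fs -rmr out`.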