Setting Up a Personal Hadoop Development Environment on Fedora 18
2014-10-28 17:54
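1. Background
This article describes a "personal hadoop" configuration, similar in spirit to "personal condor". The main goal is to keep the configuration files and log files in a single location, with a symbolic link pointing to the binaries produced by a development build. That way the data and configuration are preserved whenever the built binaries are updated.
2. Use cases
1. Testing in a local sandbox without changing the software already installed on the system
2. A single source for configuration files and log files
3. References
Web pages:
http://wiki.apache.org/hadoop/HowToSetupYourDevelopmentEnvironment
http://vichargrave.com/create-a-hadoop-build-and-development-environment-for-hadoop/
http://www.michael-noll.com/tutorials/running-hadoop-on-ubuntu-linux-single-node-cluster/
http://wiki.apache.org/hadoop/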
http://docs.hortonworks.com/CURRENT/index.htm#Appendix/Configuring_Ports/HDFS_Ports.htm
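Books:
"Hadoop: The Definitive Guide"
4. Disclaimer
1. This post currently describes a non-native development setup that pulls in its dependencies through Maven. For details on native packaging, see: https://fedoraproject.org/wiki/Features/Hadoop
2. The steps listed below are for a single-node setup.
5. Prerequisites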
1. Set up passwordless ssh
yum install openssh openssh-clients openssh-server
# generate a public/private key, if you don't already have one
ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa
cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys
chmod 600 ~/.ssh/*
# testing ssh:
ps -ef | grep sshd   # verify sshd is running
ssh localhost        # accept the certification when prompted
sudo passwd root     # Make sure the root has a password
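The key-authorization command above appends the public key every time it runs, so repeating the step leaves duplicate entries in authorized_keys. A minimal sketch of a re-runnable variant follows; the function name authorize_key is hypothetical and not part of OpenSSH.

```shell
#!/bin/sh
# Hypothetical helper: append a public key to an authorized_keys file only
# if that exact line is not already present.
authorize_key() {
    pubkey="$1"
    authfile="$2"
    mkdir -p "$(dirname "$authfile")"
    touch "$authfile"
    # -q quiet, -x whole-line match, -F fixed strings, -f read patterns from file
    if ! grep -qxFf "$pubkey" "$authfile"; then
        cat "$pubkey" >> "$authfile"
    fi
    chmod 600 "$authfile"
}

# Example call (commented out so the snippet has no side effects):
# authorize_key ~/.ssh/id_dsa.pub ~/.ssh/authorized_keys
```

Once the host key for localhost has been accepted, running ssh -o BatchMode=yes localhost true is one way to confirm that no password prompt is needed, because BatchMode makes ssh fail instead of asking for a password.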
2. Install the build tools and other dependencies
yum install cmake git subversion dh-make ant autoconf automake sharutils libtool asciidoc xmlto curl protobuf-compiler gcc-c++
3. Install Java and the development environment
yum install java-1.7.0-openjdk java-1.7.0-openjdk-devel java-1.7.0-openjdk-javadoc *maven*
Add the following to your .bashrc:
export JVM_ARGS="-Xmx1024m -XX:MaxPermSize=512m"
export MAVEN_OPTS="-Xmx1024m -XX:MaxPermSize=512m"
Note: the settings above are for OpenJDK 7 on F18. You can test whether the environment is configured correctly with the following command:
mvn install -Dmaven.test.failure.ignore=true
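Before building, it can help to confirm that the tools installed in this section are actually on the PATH. The following is a minimal sketch; missing_tools is a hypothetical helper name, and the command names in the example call (such as svn for subversion, protoc for protobuf-compiler, and g++ for gcc-c++) are the binaries those packages normally provide.

```shell
#!/bin/sh
# Hypothetical helper: print each named command that cannot be found on PATH.
missing_tools() {
    for tool in "$@"; do
        # command -v exits non-zero when the name is not found
        command -v "$tool" >/dev/null 2>&1 || echo "$tool"
    done
}

# Example call (commented out so the snippet has no side effects):
# missing_tools ssh cmake git svn ant autoconf automake libtool protoc g++ java mvn
```

An empty result means every listed command was found; any name that is printed still needs to be installed.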
6. Setting up "personal-hadoop"
1. Download and build Hadoop
git clone git://git.apache.org/hadoop-common.git
cd hadoop-common
git checkout -b branch-2.0.4-alpha origin/branch-2.0.4-alpha
mvn clean package -Pdist -DskipTests
2. Create the sandbox environment
This setup assumes the home directory is /home/tstclair.
cd ~
mkdir personal-hadoop
cd personal-hadoop
mkdir -p conf data name logs/yarn
ln -sf <your-git-loc>/hadoop-dist/target/hadoop-2.0.4-alpha home
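Because the sandbox is meant to survive rebuilds, the home link will eventually be pointed at a newer build directory. With plain ln -sf, if home already exists as a link to a directory, the new link is created inside that directory instead of replacing home. A sketch of a re-runnable version follows, assuming GNU ln (for the -n flag); make_sandbox is a hypothetical function name.

```shell
#!/bin/sh
# Hypothetical helper: create the sandbox layout and (re)point the "home"
# link at a build directory.
#   $1 = sandbox directory, e.g. ~/personal-hadoop
#   $2 = build output directory to link to
make_sandbox() {
    base="$1"
    dist="$2"
    mkdir -p "$base/conf" "$base/data" "$base/name" "$base/logs/yarn"
    # -n: treat an existing "home" symlink as a file, so it is replaced
    #     rather than followed into the directory it points to
    ln -sfn "$dist" "$base/home"
}
```

Calling it again with a different build directory replaces the link and leaves conf, data, name, and logs untouched.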
3. Override your environment variables
Append the following to the .bashrc file in your home directory:
# Hadoop env override:
export HADOOP_BASE_DIR=${HOME}/personal-hadoop
export HADOOP_LOG_DIR=${HOME}/personal-hadoop/logs
export HADOOP_PID_DIR=${HADOOP_BASE_DIR}
export HADOOP_CONF_DIR=${HOME}/personal-hadoop/conf
export HADOOP_COMMON_HOME=${HOME}/personal-hadoop/home
export HADOOP_HDFS_HOME=${HADOOP_COMMON_HOME}
export HADOOP_MAPRED_HOME=${HADOOP_COMMON_HOME}
# Yarn env override:
export HADOOP_YARN_HOME=${HADOOP_COMMON_HOME}
export YARN_LOG_DIR=${HADOOP_LOG_DIR}/yarn
# classpath override to search hadoop loc
export CLASSPATH=/usr/share/java/:${HADOOP_COMMON_HOME}/share
# Finally update your PATH
export PATH=${HADOOP_COMMON_HOME}/bin:${HADOOP_COMMON_HOME}/sbin:${HADOOP_COMMON_HOME}/libexec:${PATH}
4. Reload the environment and verify it
source ~/.bashrc
which hadoop   # should print ${HOME}/personal-hadoop/home/bin/hadoop
hadoop -help   # verify classpath is correct.
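Beyond checking which hadoop is picked up, it can be useful to confirm that each override points at a directory that really exists. The following is a minimal sketch; check_hadoop_env is a hypothetical helper, and the list of variables it inspects is just the directory-valued ones from the .bashrc overrides.

```shell
#!/bin/sh
# Hypothetical helper: report any sandbox variable that is unset or that
# does not point at an existing directory. Returns non-zero on any problem.
check_hadoop_env() {
    status=0
    for var in HADOOP_BASE_DIR HADOOP_LOG_DIR HADOOP_CONF_DIR \
               HADOOP_COMMON_HOME YARN_LOG_DIR; do
        # Look up the value of the variable whose name is stored in $var
        eval "val=\${$var:-}"
        if [ -z "$val" ]; then
            echo "unset: $var"
            status=1
        elif [ ! -d "$val" ]; then
            # -d follows symlinks, so the "home" link counts as a directory
            echo "not a directory: $var=$val"
            status=1
        fi
    done
    return "$status"
}
```

Running check_hadoop_env after sourcing .bashrc prints nothing when all five variables are set and valid.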
5. Create the initial single-source configuration files
Copy the default configuration files:
cp ${HADOOP_COMMON_HOME}/etc/hadoop/* ${HADOOP_BASE_DIR}/conf
Update your hdfs-site.xml file:
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
  Licensed under the Apache License, Version 2.0 (the "License");
  you may not use this file except in compliance with the License.
  You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

  Unless required by applicable law or agreed to in writing, software
  distributed under the License is distributed on an "AS IS" BASIS,
  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
  See the License for the specific language governing permissions and
  limitations under the License. See accompanying LICENSE file.
-->
<!-- Override tstclair with your home directory -->
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost/</value>
  </property>
  <property>
    <name>dfs.name.dir</name>
    <value>file:///home/tstclair/personal-hadoop/name</value>
  </property>
  <property>
    <name>dfs.http.address</name>
    <value>0.0.0.0:50070</value>
  </property>
  <property>
    <name>dfs.data.dir</name>
    <value>file:///home/tstclair/personal-hadoop/data</value>
  </property>
  <property>
    <name>dfs.datanode.address</name>
    <value>0.0.0.0:50010</value>
  </property>
  <property>
    <name>dfs.datanode.http.address</name>
    <value>0.0.0.0:50075</value>
  </property>
  <property>
    <name>dfs.datanode.ipc.address</name>
    <value>0.0.0.0:50020</value>
  </property>
</configuration>
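The hdfs-site.xml above hard-codes /home/tstclair in its file:// paths. As a sketch of how to swap in your own home directory without editing by hand, the helper below rewrites the file in place; retarget_home is a hypothetical name, and the sed expression assumes the new path contains no | or & characters.

```shell
#!/bin/sh
# Hypothetical helper: replace every occurrence of /home/tstclair in a
# config file with another home directory.
#   $1 = file to rewrite, $2 = replacement home directory
retarget_home() {
    file="$1"
    new_home="$2"
    # Write to a temporary file first so a failed sed never truncates the config
    sed "s|/home/tstclair|${new_home}|g" "$file" > "$file.tmp" &&
        mv "$file.tmp" "$file"
}

# Example call (commented out so the snippet has no side effects):
# retarget_home "${HADOOP_CONF_DIR}/hdfs-site.xml" "${HOME}"
```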
Update your mapred-site.xml file:
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
  Licensed under the Apache License, Version 2.0 (the "License");
  you may not use this file except in compliance with the License.
  You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

  Unless required by applicable law or agreed to in writing, software
  distributed under the License is distributed on an "AS IS" BASIS,
  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
  See the License for the specific language governing permissions and
  limitations under the License. See accompanying LICENSE file.
-->
<!-- Update or append these vars -->
<configuration>
  <property>
    <name>mapreduce.cluster.temp.dir</name>
    <value> </value>
    <description>No description</description>
    <final>true</final>
  </property>
  <property>
    <name>mapreduce.cluster.local.dir</name>
    <value> </value>
    <description>No description</description>
    <final>true</final>
  </property>
</configuration>
Update your yarn-site.xml file:
<?xml version="1.0"?>
<!--
  Licensed under the Apache License, Version 2.0 (the "License");
  you may not use this file except in compliance with the License.
  You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

  Unless required by applicable law or agreed to in writing, software
  distributed under the License is distributed on an "AS IS" BASIS,
  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
  See the License for the specific language governing permissions and
  limitations under the License. See accompanying LICENSE file.
-->
<configuration>
  <!-- Site specific YARN configuration properties -->
  <property>
    <name>yarn.resourcemanager.resource-tracker.address</name>
    <value>localhost:8031</value>
    <description>host is the hostname of the resource manager and port is the port on which the NodeManagers contact the Resource Manager.</description>
  </property>
  <property>
    <name>yarn.resourcemanager.scheduler.address</name>
    <value>localhost:8030</value>
    <description>host is the hostname of the resourcemanager and port is the port on which the Applications in the cluster talk to the Resource Manager.</description>
  </property>
  <property>
    <name>yarn.resourcemanager.scheduler.class</name>
    <value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler</value>
    <description>In case you do not want to use the default scheduler</description>
  </property>
  <property>
    <name>yarn.resourcemanager.address</name>
    <value>localhost:8032</value>
    <description>the host is the hostname of the ResourceManager and the port is the port on which the clients can talk to the Resource Manager.</description>
  </property>
  <property>
    <name>yarn.nodemanager.local-dirs</name>
    <value> </value>
    <description>the local directories used by the nodemanager</description>
  </property>
  <property>
    <name>yarn.nodemanager.address</name>
    <value>localhost:8034</value>
    <description>the nodemanagers bind to this port</description>
  </property>
  <property>
    <name>yarn.nodemanager.resource.memory-mb</name>
    <value>10240</value>
    <description>the amount of memory on the NodeManager in MB</description>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce.shuffle</value>
    <description>shuffle service that needs to be set for Map Reduce to run</description>
  </property>
</configuration>
7. Start the single-node Hadoop cluster
Format the namenode:
hadoop namenode -format
#verify output is correct.
Start HDFS:
start-dfs.sh
Open http://localhost:50070 in a browser and check that one node is reported as started.
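The NameNode web UI may take a few seconds to come up after start-dfs.sh returns. As a sketch, a small retry loop can poll it from the command line instead of reloading the browser. retry is a hypothetical helper; the curl call in the comment assumes curl from the prerequisites and the port 50070 set in hdfs-site.xml.

```shell
#!/bin/sh
# Hypothetical helper: run a command until it succeeds or the attempts run out.
#   $1 = maximum attempts, $2 = seconds to sleep between attempts,
#   remaining arguments = the command to run
retry() {
    attempts="$1"
    delay="$2"
    shift 2
    i=1
    while [ "$i" -le "$attempts" ]; do
        if "$@"; then
            return 0
        fi
        i=$((i + 1))
        # Only sleep when another attempt is still to come
        if [ "$i" -le "$attempts" ]; then
            sleep "$delay"
        fi
    done
    return 1
}

# Example call (commented out so the snippet has no side effects):
# retry 10 2 curl -sf -o /dev/null http://localhost:50070/
```

The curl flags keep the check quiet: -s suppresses progress output, -f makes HTTP errors count as failures, and -o /dev/null discards the page body.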
Next, start YARN:
start-yarn.sh
Check the log files to verify that everything started correctly.
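One way to check the logs is to list the files that contain ERROR or FATAL entries. This sketch assumes the daemons write *.log files under the log directory and use a log4j layout in which the level appears as a space-delimited word; logs_with_errors is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: print the *.log files under a directory that contain
# at least one ERROR or FATAL entry.
#   $1 = log directory to search (subdirectories included)
logs_with_errors() {
    # grep -l prints only the names of matching files; -E enables alternation.
    # "|| true" keeps the exit status at zero when nothing matches.
    find "$1" -type f -name '*.log' \
        -exec grep -lE ' (ERROR|FATAL) ' {} + 2>/dev/null || true
}

# Example call (commented out so the snippet has no side effects):
# logs_with_errors "${HADOOP_LOG_DIR}"
```

Because YARN_LOG_DIR sits under HADOOP_LOG_DIR in this setup, a single call covers both the HDFS and YARN daemons.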
Finally, run a MapReduce job to check that Hadoop is working:
cd ${HADOOP_COMMON_HOME}/share/hadoop/mapreduce
hadoop jar hadoop-mapreduce-examples-2.0.4-alpha.jar randomwriter out
Source: http://timothysc.github.io/blog/2013/04/22/personalhadoop/