
Hadoop Study Notes 1: Building a Pseudo-Distributed Hadoop Installation on Ubuntu

2014-03-01 09:37
(1) Setting up the Hadoop runtime environment

i) Hadoop is written in Java, so a Java environment must be installed before installing Hadoop.

1. Install the JDK: download the latest JDK from http://www.oracle.com/technetwork/java/javase/downloads/index.html and install it.

2. Set up the Java environment: define the JAVA_HOME, CLASSPATH, and PATH environment variables in /etc/environment, /etc/profile, or ~/.bashrc in your home directory.

Note the differences among these three files: /etc/environment is a plain key-value file read at login (not a shell script), /etc/profile is executed for every user's login shell, and ~/.bashrc applies only to the current user's interactive bash sessions.
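If you take the per-user route, the additions to ~/.bashrc look roughly like this (the JDK path below is an assumption; substitute the directory where your JDK actually lives):

```shell
# Assumed JDK location -- adjust to wherever you installed the JDK.
export JAVA_HOME=/usr/lib/jvm/jdk1.7.0
# Classpath for the JDK's tool jars, plus the current directory.
export CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar
# Put the JDK's binaries first on the PATH.
export PATH=$JAVA_HOME/bin:$PATH
```

After editing, run `source ~/.bashrc` and verify with `java -version`.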

3. Download Hadoop: the tarball is easy to find on the Apache site. Create a dedicated hadoop user and hadoop group, then extract the tarball into the hadoop user's home directory.

4. Set Hadoop's own Java environment: in the conf directory under the Hadoop install directory, edit hadoop-env.sh and add export JAVA_HOME=<your actual JDK install directory>.
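The required line in conf/hadoop-env.sh is just the export itself (the JDK path here is an assumption; use your real directory):

```shell
# conf/hadoop-env.sh -- point Hadoop at the JDK (path assumed, adjust as needed)
export JAVA_HOME=/usr/lib/jvm/jdk1.7.0
```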

ii) Hadoop needs passwordless SSH to localhost, so install and configure SSH:

sudo apt-get install ssh
sudo apt-get install rsync
ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
ssh localhost


iii) Configuring Hadoop's pseudo-distributed mode

Switch to the hadoop user and enter the Hadoop install directory.

1. Edit conf/core-site.xml to read:

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<!-- Put site-specific property overrides in this file. -->

<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:9000</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/home/hadoop/hadoop/tmp</value>
  </property>
</configuration>


2. Edit conf/hdfs-site.xml to read:

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<!-- Put site-specific property overrides in this file. -->

<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
  <property>
    <name>dfs.name.dir</name>
    <value>/home/hadoop/hadoop/hdfs/name</value>
  </property>
  <property>
    <name>dfs.data.dir</name>
    <value>/home/hadoop/hadoop/hdfs/data</value>
  </property>
  <property>
    <name>dfs.permissions</name>
    <value>false</value>
  </property>
</configuration>


3. Edit conf/mapred-site.xml to read:

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<!-- Put site-specific property overrides in this file. -->

<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>localhost:9001</value>
  </property>
</configuration>


4. Format HDFS:

hadoop namenode -format


(2) Starting Hadoop

hadoop@clebeg:~/hadoop$ bin/start-all.sh
Warning: $HADOOP_HOME is deprecated.

starting namenode, logging to /home/hadoop/hadoop/logs/hadoop-hadoop-namenode-clebeg.out
localhost: Warning: $HADOOP_HOME is deprecated.
localhost:
localhost: starting datanode, logging to /home/hadoop/hadoop/logs/hadoop-hadoop-datanode-clebeg.out
localhost: Warning: $HADOOP_HOME is deprecated.
localhost:
localhost: starting secondarynamenode, logging to /home/hadoop/hadoop/logs/hadoop-hadoop-secondarynamenode-clebeg.out
starting jobtracker, logging to /home/hadoop/hadoop/logs/hadoop-hadoop-jobtracker-clebeg.out
localhost: Warning: $HADOOP_HOME is deprecated.
localhost:
localhost: starting tasktracker, logging to /home/hadoop/hadoop/logs/hadoop-hadoop-tasktracker-clebeg.out


The startup output above shows where each daemon writes its log file.

Check that all of the Hadoop daemons have started:

hadoop@clebeg:~/hadoop$ jps
5250 JobTracker
5407 TaskTracker
4816 NameNode
4988 DataNode
5594 Jps
5156 SecondaryNameNode


If all of the processes above appear, Hadoop is installed correctly. You can also browse the NameNode web UI at http://localhost:50070 and the JobTracker UI at http://localhost:50030.
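As a further smoke test, you can push a few files into HDFS and run the bundled wordcount example. This is a sketch: it assumes the daemons above are running, that you are in the Hadoop install directory, and the examples jar name varies by release.

```shell
# Assumes a running pseudo-distributed cluster (see above); run as the hadoop user.
bin/hadoop fs -mkdir /input                    # create an HDFS directory
bin/hadoop fs -put conf/*.xml /input           # use the config files as sample text
bin/hadoop jar hadoop-examples-*.jar wordcount /input /output   # run the example job
bin/hadoop fs -cat /output/part-r-00000        # inspect the word counts
```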

If something goes wrong, check the logs. In my case the DataNode failed to start because one of its directories had the wrong permissions; fixing the permissions and reformatting HDFS solved it.
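A sketch of that recovery, using the paths from the configs above (note that reformatting destroys any data already in HDFS; the 755 requirement comes from the DataNode's default directory-permission check):

```shell
# Paths match the dfs.data.dir / hadoop.tmp.dir values configured earlier.
bin/stop-all.sh
chmod 755 /home/hadoop/hadoop/hdfs/data   # DataNode refuses group/world-writable dirs
rm -rf /home/hadoop/hadoop/tmp/*          # clear stale state (destroys existing HDFS data)
bin/hadoop namenode -format
bin/start-all.sh
```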