
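Installing Hadoop 1.2.1, following the book 《深入理解大数据:大数据处理与编程实践》 (Understanding Big Data: Big Data Processing and Programming Practice)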

2016-11-09 11:08
【Part 1】Source code for the book 《深入理解大数据》

http://download.csdn.net/detail/heming621/9423291

http://hadoop.apache.org/

https://www.zhihu.com/question/19795366

http://mooc.guokr.com/course/2194/%E5%A4%A7%E6%95%B0%E6%8D%AE%E7%B3%BB%E7%BB%9F%E5%9F%BA%E7%A1%80/

http://download.csdn.net/album/detail/3466/1/1

【Part 2】Installing Hadoop 1.2.1

【1】Install Java

Extracting jdk-6u45-linux-i586-rpm.rar yields jdk-6u45-linux-i586-rpm.bin.

Run the installer: ./jdk-6u45-linux-i586-rpm.bin

After a successful installation, the JDK is in /usr/java/jdk1.6.0_45

A22811459:/usr/java/jdk1.6.0_45 # pwd

/usr/java/jdk1.6.0_45

A22811459:/usr/java/jdk1.6.0_45 # ls

COPYRIGHT  LICENSE  README.html  THIRDPARTYLICENSEREADME.txt  bin  include  jre  lib  man  src.zip

【1.2】Add the Java paths to /etc/profile so that the Java tools can be run from any directory

#set java

export JAVA_HOME=/usr/java/jdk1.6.0_45

export JRE_HOME=$JAVA_HOME/jre

export CLASSPATH=.:$JAVA_HOME/lib:$JRE_HOME/lib

export PATH=$PATH:$JAVA_HOME/bin
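
If this step is scripted rather than edited by hand, the lines can be appended idempotently so that rerunning the script does not duplicate them. A minimal sketch, assuming a POSIX shell; append_once is a hypothetical helper name, and the demo writes to a scratch file instead of /etc/profile:

```shell
#!/bin/sh
# append_once FILE LINE: append LINE to FILE unless an identical line is already there.
# (append_once is a hypothetical helper, not part of the JDK or Hadoop.)
append_once() {
    file=$1
    line=$2
    touch "$file"
    # -F fixed string, -x whole-line match, -q quiet
    grep -Fxq -- "$line" "$file" || printf '%s\n' "$line" >> "$file"
}

# Demo against a scratch file; set PROFILE=/etc/profile (as root) for real use.
PROFILE=${PROFILE:-${TMPDIR:-/tmp}/profile.demo}
append_once "$PROFILE" 'export JAVA_HOME=/usr/java/jdk1.6.0_45'
append_once "$PROFILE" 'export JRE_HOME=$JAVA_HOME/jre'
append_once "$PROFILE" 'export CLASSPATH=.:$JAVA_HOME/lib:$JRE_HOME/lib'
append_once "$PROFILE" 'export PATH=$PATH:$JAVA_HOME/bin'
```

The single quotes keep $JAVA_HOME and $PATH unexpanded, so the file receives the same text as the lines shown above.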

【1.3】Apply the configuration

# source /etc/profile

【1.4】Check the Java version to confirm that the installation succeeded

A22811459:/usr/java/jdk1.6.0_45 # java -version

java version "1.6.0_45"

Java(TM) SE Runtime Environment (build 1.6.0_45-b06)

Java HotSpot(TM) Server VM (build 20.45-b01, mixed mode)

【1.5】Optionally, compile and run a simple Java program to further confirm that Java is installed correctly

HelloWel.java

public class HelloWel {

       public static void main(String[] args)

       {

          System.out.println("JAVA OK");    

       }    

}

Compile and run

# javac HelloWel.java

# java HelloWel

JAVA OK

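At this point the Java installation is confirmed to work. The Java path, which is needed again later, is /usr/java/jdk1.6.0_45

【2】Install Hadoop 1.2.1 (following 《深入理解大数据》)

【2.1】Create the hadoop user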

#groupadd hadoop-user

#useradd -g hadoop-user hadoop

#passwd hadoop

【2.2】Configure SSH

#ssh-keygen -t rsa

# cd /root/.ssh/

#cp id_rsa.pub authorized_keys

#ssh localhost

Check the result

# ls

authorized_keys  id_rsa  id_rsa.pub  known_hosts
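
Two details are easy to miss here: cp overwrites any keys that were already authorized, and sshd typically refuses key-based login when the .ssh directory or authorized_keys is writable by other users (its StrictModes check). A sketch that appends the key and tightens permissions, assuming OpenSSH; authorize_key is a hypothetical helper name:

```shell
#!/bin/sh
# authorize_key PUBKEY AUTH_FILE: add the key in PUBKEY to AUTH_FILE once,
# then restrict permissions on AUTH_FILE and its directory.
# (authorize_key is a hypothetical helper, not an OpenSSH command.)
authorize_key() {
    pub=$1
    auth=$2
    dir=$(dirname "$auth")
    mkdir -p "$dir"
    touch "$auth"
    # Append only if this exact key line is not already present.
    grep -Fxq -f "$pub" "$auth" || cat "$pub" >> "$auth"
    chmod 700 "$dir"
    chmod 600 "$auth"
}

# Typical use, as the user that starts Hadoop (root in this walkthrough):
#   ssh-keygen -t rsa -N "" -f "$HOME/.ssh/id_rsa"    # empty passphrase
#   authorize_key "$HOME/.ssh/id_rsa.pub" "$HOME/.ssh/authorized_keys"
#   ssh localhost true                                # should not ask for a password
```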

【2.3】Configure the Hadoop environment

Hadoop release used: hadoop-1.2.1.tar.gz

After extraction, the directory on Linux is /home/longhui/hadoop/hadoop-1.2.1/

【2.3.1】In conf/hadoop-env.sh, set JAVA_HOME to the JDK path

export JAVA_HOME=/usr/java/jdk1.6.0_45
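
A wrong JAVA_HOME here usually only shows up later, when the daemons fail to start. A quick sanity check before going further, as a sketch; check_java_home is a hypothetical helper name:

```shell
#!/bin/sh
# check_java_home DIR: succeed only if DIR contains an executable bin/java.
# (check_java_home is a hypothetical helper, not part of Hadoop.)
check_java_home() {
    [ -x "$1/bin/java" ]
}

if check_java_home "${JAVA_HOME:-/usr/java/jdk1.6.0_45}"; then
    echo "JAVA_HOME looks usable"
else
    echo "JAVA_HOME does not contain bin/java" >&2
fi
```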

【2.3.2】Configure three XML files

【1】core-site.xml

<configuration>

<property>

<name>hadoop.tmp.dir</name>

<value>/tmp/hadoop</value>

</property>

<property>

<name>fs.default.name</name>

<value>hdfs://A22811459:9000</value>

</property>

</configuration>

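【Note】

The temporary directory is /tmp/hadoop. Once Hadoop has been set up and started, two subdirectories, dfs and mapred, appear under it, and several pid files appear under /tmp: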

A22811459:/tmp # ls hadoop

hadoop/                            hadoop-root-jobtracker.pid         hadoop-root-secondarynamenode.pid

hadoop-root-datanode.pid           hadoop-root-namenode.pid           hadoop-root-tasktracker.pid

【2】hdfs-site.xml

<configuration>

<property>

<name>dfs.name.dir</name>

<value>/home/longhui/hadoop/dfs/name</value>

</property>

<property>

<name>dfs.data.dir</name>

<value>/home/longhui/hadoop/dfs/data</value>

</property>

<property>

<name>dfs.replication</name>

<value>1</value>

</property>

</configuration>

【Note】

Once Hadoop has been set up and started, /home/longhui/hadoop/dfs/name contains: current  image  in_use.lock  previous.checkpoint

/home/longhui/hadoop/dfs/data contains: blocksBeingWritten  current  detach  in_use.lock  storage  tmp

【3】mapred-site.xml

<configuration>

<property>

<name>mapred.job.tracker</name>

<value>A22811459:9001</value>

</property>

<property>

<name>mapreduce.cluster.local.dir</name>

<value>/home/longhui/hadoop/mapred/local</value>

</property>

<property>

<name>mapreduce.jobtracker.system.dir</name>

<value>/home/longhui/hadoop/mapred/system</value>

</property>

</configuration>
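
With values spread over three files, it helps to read a setting back and compare it with the machine, for example the host in fs.default.name against the host name. A sketch using awk; get_prop is a hypothetical helper name, and it assumes the simple layout used above, where each <name> and <value> sits on its own line:

```shell
#!/bin/sh
# get_prop FILE NAME: print the <value> that follows <name>NAME</name>
# in a Hadoop *-site.xml file. Not a general XML parser.
# (get_prop is a hypothetical helper, not a Hadoop command.)
get_prop() {
    awk -v want="$2" '
        /<name>/  { n = $0; gsub(/.*<name>|<\/name>.*/, "", n) }
        /<value>/ { v = $0; gsub(/.*<value>|<\/value>.*/, "", v)
                    if (n == want) { print v; exit } }
    ' "$1"
}

# Usage, from the Hadoop directory:
#   get_prop conf/core-site.xml fs.default.name
#   get_prop conf/hdfs-site.xml dfs.replication
```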

【4】Because the host name is A22811459 rather than localhost, /etc/hosts must also be updated with the following entry:

127.0.0.1       A22811459
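
Mapping the host name to 127.0.0.1 is enough for this single-machine setup; on a multi-node cluster the name would normally map to the machine's real address instead. To confirm that the entry is in place, a small check can be used. This is a sketch; has_host_entry is a hypothetical helper name:

```shell
#!/bin/sh
# has_host_entry HOSTS_FILE NAME: succeed if HOSTS_FILE lists NAME as a host
# name or alias (any field after the IP address), ignoring comments.
# (has_host_entry is a hypothetical helper, not a system command.)
has_host_entry() {
    awk -v name="$2" '
        { sub(/#.*/, "")                     # drop comments; fields are re-split
          for (i = 2; i <= NF; i++) if ($i == name) found = 1 }
        END { exit (found ? 0 : 1) }
    ' "$1"
}

# uname -n prints this machine's host name.
has_host_entry /etc/hosts "$(uname -n)" \
    && echo "host name is mapped in /etc/hosts" \
    || echo "host name is missing from /etc/hosts" >&2
```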

【2.3.3】Add the Hadoop paths to /etc/profile, then run source /etc/profile to apply them

#set hadoop

export HADOOP_HOME_WARN_SUPPRESS=1

export HADOOP_HOME=/home/longhui/hadoop/hadoop-1.2.1

export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin

【2.3.4】Format the HDFS file system

Run bin/hadoop namenode -format, or simply hadoop namenode -format, then enter Y at the prompt

# hadoop namenode -format

16/12/15 12:59:50 INFO namenode.NameNode: STARTUP_MSG:

/************************************************************

STARTUP_MSG: Starting NameNode

STARTUP_MSG:   host = A22811459/127.0.0.1

STARTUP_MSG:   args = [-format]

STARTUP_MSG:   version = 1.2.1

STARTUP_MSG:   build = https://svn.apache.org/repos/asf/hadoop/common/branches/branch-1.2 -r 1503152; compiled by 'mattf' on Mon Jul 22 15:23:09 PDT 2013

STARTUP_MSG:   java = 1.6.0_45

************************************************************/

Re-format filesystem in /home/longhui/hadoop/dfs/name ? (Y or N) Y

16/12/15 12:59:52 INFO util.GSet: Computing capacity for map BlocksMap

16/12/15 12:59:52 INFO util.GSet: VM type       = 32-bit

16/12/15 12:59:52 INFO util.GSet: 2.0% max memory = 932118528

16/12/15 12:59:52 INFO util.GSet: capacity      = 2^22 = 4194304 entries

16/12/15 12:59:52 INFO util.GSet: recommended=4194304, actual=4194304

16/12/15 12:59:53 INFO namenode.FSNamesystem: fsOwner=root

16/12/15 12:59:53 INFO namenode.FSNamesystem: supergroup=supergroup

16/12/15 12:59:53 INFO namenode.FSNamesystem: isPermissionEnabled=true

16/12/15 12:59:53 INFO namenode.FSNamesystem: dfs.block.invalidate.limit=100

16/12/15 12:59:53 INFO namenode.FSNamesystem: isAccessTokenEnabled=false accessKeyUpdateInterval=0 min(s), accessTokenLifetime=0 min(s)

16/12/15 12:59:53 INFO namenode.FSEditLog: dfs.namenode.edits.toleration.length = 0

16/12/15 12:59:53 INFO namenode.NameNode: Caching file names occuring more than 10 times

16/12/15 12:59:53 INFO common.Storage: Image file /home/longhui/hadoop/dfs/name/current/fsimage of size 110 bytes saved in 0 seconds.

16/12/15 12:59:53 INFO namenode.FSEditLog: closing edit log: position=4, editlog=/home/longhui/hadoop/dfs/name/current/edits

16/12/15 12:59:53 INFO namenode.FSEditLog: close success: truncate to 4, editlog=/home/longhui/hadoop/dfs/name/current/edits

16/12/15 12:59:53 INFO common.Storage: Storage directory /home/longhui/hadoop/dfs/name has been successfully formatted.

16/12/15 12:59:53 INFO namenode.NameNode: SHUTDOWN_MSG:

/************************************************************

SHUTDOWN_MSG: Shutting down NameNode at A22811459/127.0.0.1

************************************************************/

【Note】If the warning "Warning: $HADOOP_HOME is deprecated." appears:

Fix: add the following line to /etc/profile, apply it with source /etc/profile, and run bin/hadoop namenode -format again; the warning no longer appears

export HADOOP_HOME_WARN_SUPPRESS=1
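
The "Re-format filesystem in ... ? (Y or N)" prompt in the log above means the name directory had already been formatted; answering Y discards the existing HDFS metadata. A script can check first and skip the format step. A sketch, assuming that a formatted dfs.name.dir contains current/VERSION (as it does in Hadoop 1.x); is_formatted is a hypothetical helper name:

```shell
#!/bin/sh
# is_formatted NAME_DIR: succeed if NAME_DIR already holds a formatted NameNode image.
# Assumption: a formatted dfs.name.dir contains the file current/VERSION.
# (is_formatted is a hypothetical helper, not a Hadoop command.)
is_formatted() {
    [ -f "$1/current/VERSION" ]
}

NAME_DIR=${NAME_DIR:-/home/longhui/hadoop/dfs/name}
if is_formatted "$NAME_DIR"; then
    echo "$NAME_DIR is already formatted; skipping format to keep existing HDFS metadata"
else
    echo "$NAME_DIR is not formatted; run: hadoop namenode -format"
fi
```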

【2.3.5】Start Hadoop (to stop it, run stop-all.sh)

# start-all.sh

starting namenode, logging to /home/longhui/hadoop/hadoop-1.2.1/libexec/../logs/hadoop-root-namenode-A22811459.out

localhost: starting datanode, logging to /home/longhui/hadoop/hadoop-1.2.1/libexec/../logs/hadoop-root-datanode-A22811459.out

localhost: starting secondarynamenode, logging to /home/longhui/hadoop/hadoop-1.2.1/libexec/../logs/hadoop-root-secondarynamenode-A22811459.out

starting jobtracker, logging to /home/longhui/hadoop/hadoop-1.2.1/libexec/../logs/hadoop-root-jobtracker-A22811459.out

localhost: starting tasktracker, logging to /home/longhui/hadoop/hadoop-1.2.1/libexec/../logs/hadoop-root-tasktracker-A22811459.out

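【2.3.6】Use jps to check the cluster status. Apart from the Jps process itself, all five of the other processes must be present. Output like the following means that Hadoop started normally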

# jps

2352 TaskTracker

1940 DataNode

1802 NameNode

2465 Jps

2211 JobTracker

2106 SecondaryNameNode
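
Checking the five names by eye is easy to get wrong, since NameNode also appears inside SecondaryNameNode. A sketch that reads jps output and lists whichever daemon is missing; missing_daemons is a hypothetical helper name:

```shell
#!/bin/sh
# missing_daemons: read `jps` output on stdin and print each expected Hadoop 1.x
# daemon that does not appear in it. Prints nothing when all five are running.
# (missing_daemons is a hypothetical helper, not part of the JDK or Hadoop.)
missing_daemons() {
    out=$(cat)
    for d in NameNode DataNode SecondaryNameNode JobTracker TaskTracker; do
        # Compare against the whole second field, so NameNode does not
        # match a line that only contains SecondaryNameNode.
        printf '%s\n' "$out" \
            | awk -v d="$d" '$2 == d { ok = 1 } END { exit (ok ? 0 : 1) }' \
            || echo "$d"
    done
}

# Usage: jps | missing_daemons
```

If a name is printed, the matching log file under the logs directory of the Hadoop installation is the place to look next.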

【3】Run the first bundled example: estimating the value of Pi

A22811459:/home/longhui/hadoop/hadoop-1.2.1 # hadoop jar hadoop-examples-1.2.1.jar pi 2 5

Number of Maps  = 2

Samples per Map = 5

Wrote input for Map #0

Wrote input for Map #1

Starting Job

16/12/15 14:06:04 INFO mapred.FileInputFormat: Total input paths to process : 2

16/12/15 14:06:04 INFO mapred.JobClient: Running job: job_201612151254_0001

16/12/15 14:06:05 INFO mapred.JobClient:  map 0% reduce 0%

16/12/15 14:06:10 INFO mapred.JobClient:  map 100% reduce 0%

16/12/15 14:06:18 INFO mapred.JobClient:  map 100% reduce 33%

16/12/15 14:06:19 INFO mapred.JobClient:  map 100% reduce 100%

16/12/15 14:06:19 INFO mapred.JobClient: Job complete: job_201612151254_0001

16/12/15 14:06:19 INFO mapred.JobClient: Counters: 30

16/12/15 14:06:19 INFO mapred.JobClient:   Job Counters

16/12/15 14:06:19 INFO mapred.JobClient:     Launched reduce tasks=1

16/12/15 14:06:19 INFO mapred.JobClient:     SLOTS_MILLIS_MAPS=6864

16/12/15 14:06:19 INFO mapred.JobClient:     Total time spent by all reduces waiting after reserving slots (ms)=0

16/12/15 14:06:19 INFO mapred.JobClient:     Total time spent by all maps waiting after reserving slots (ms)=0

16/12/15 14:06:19 INFO mapred.JobClient:     Launched map tasks=2

16/12/15 14:06:19 INFO mapred.JobClient:     Data-local map tasks=2

16/12/15 14:06:19 INFO mapred.JobClient:     SLOTS_MILLIS_REDUCES=8661

16/12/15 14:06:19 INFO mapred.JobClient:   File Input Format Counters

16/12/15 14:06:19 INFO mapred.JobClient:     Bytes Read=236

16/12/15 14:06:19 INFO mapred.JobClient:   File Output Format Counters

16/12/15 14:06:19 INFO mapred.JobClient:     Bytes Written=97

16/12/15 14:06:19 INFO mapred.JobClient:   FileSystemCounters

16/12/15 14:06:19 INFO mapred.JobClient:     FILE_BYTES_READ=50

16/12/15 14:06:19 INFO mapred.JobClient:     HDFS_BYTES_READ=478

16/12/15 14:06:19 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=160889

16/12/15 14:06:19 INFO mapred.JobClient:     HDFS_BYTES_WRITTEN=215

16/12/15 14:06:19 INFO mapred.JobClient:   Map-Reduce Framework

16/12/15 14:06:19 INFO mapred.JobClient:     Map output materialized bytes=56

16/12/15 14:06:19 INFO mapred.JobClient:     Map input records=2

16/12/15 14:06:19 INFO mapred.JobClient:     Reduce shuffle bytes=56

16/12/15 14:06:19 INFO mapred.JobClient:     Spilled Records=8

16/12/15 14:06:19 INFO mapred.JobClient:     Map output bytes=36

16/12/15 14:06:19 INFO mapred.JobClient:     Total committed heap usage (bytes)=377028608

16/12/15 14:06:19 INFO mapred.JobClient:     CPU time spent (ms)=3100

16/12/15 14:06:19 INFO mapred.JobClient:     Map input bytes=48

16/12/15 14:06:19 INFO mapred.JobClient:     SPLIT_RAW_BYTES=242

16/12/15 14:06:19 INFO mapred.JobClient:     Combine input records=0

16/12/15 14:06:19 INFO mapred.JobClient:     Reduce input records=4

16/12/15 14:06:19 INFO mapred.JobClient:     Reduce input groups=4

16/12/15 14:06:19 INFO mapred.JobClient:     Combine output records=0

16/12/15 14:06:19 INFO mapred.JobClient:     Physical memory (bytes) snapshot=376963072

16/12/15 14:06:19 INFO mapred.JobClient:     Reduce output records=0

16/12/15 14:06:19 INFO mapred.JobClient:     Virtual memory (bytes) snapshot=1132392448

16/12/15 14:06:19 INFO mapred.JobClient:     Map output records=4

Job Finished in 15.585 seconds

Estimated value of Pi is 3.60000000000000000000
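
The result 3.6 is coarse because the job used only 2 x 5 = 10 sample points. Assuming the example forms its estimate as 4 x (points inside the circle) / (total points), the usual way to estimate Pi by sampling, 10 points can only produce multiples of 0.4, and 3.6 corresponds to 9 of the 10 points falling inside. The arithmetic as a small sketch; pi_estimate is a hypothetical helper, not part of Hadoop:

```shell
#!/bin/sh
# pi_estimate INSIDE TOTAL: print 4 * INSIDE / TOTAL with one decimal place.
pi_estimate() {
    awk -v inside="$1" -v total="$2" 'BEGIN { printf "%.1f\n", 4 * inside / total }'
}

pi_estimate 9 10    # 2 maps x 5 samples, 9 points inside; prints 3.6
```

Larger arguments to the pi example (more maps, more samples per map) would be expected to give a closer estimate, at the cost of a longer job.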

【4】Web interfaces

【4.1】Open port 50070 on the server's IP address in a browser to see the HDFS status. The page looks like this:
http://10.17.35.xxx:50070/dfshealth.jsp

NameNode 'A22811459:9000'

Started: Thu Dec 15 13:00:10 GMT+08:00 2016
Version: 1.2.1, r1503152
Compiled: Mon Jul 22 15:23:09 PDT 2013 by mattf
Upgrades: There are no upgrades in progress.
Browse the filesystem
Namenode Logs

Cluster Summary

11 files and directories, 13 blocks = 24 total. Heap Size is 57.69 MB / 888.94 MB (6%)


Configured Capacity: 273 GB
DFS Used: 40 KB
Non DFS Used: 260.77 GB
DFS Remaining: 12.23 GB
DFS Used%: 0 %
DFS Remaining%: 4.48 %
Live Nodes: 1
Dead Nodes: 0
Decommissioning Nodes: 0
Number of Under-Replicated Blocks: 0

NameNode Storage:

Storage Directory: /home/longhui/hadoop/dfs/name
Type: IMAGE_AND_EDITS
State: Active
This is Apache Hadoop release 1.2.1

【4.2】Port 50030 shows the Map/Reduce status

A22811459 Hadoop Map/Reduce Administration

Quick Links
Scheduling Info
Running Jobs
Retired Jobs
Local Logs

State: RUNNING
Started: Thu Dec 15 12:54:23 GMT+08:00 2016
Version: 1.2.1, r1503152
Compiled: Mon Jul 22 15:23:09 PDT 2013 by mattf
Identifier: 201612151254
SafeMode: OFF

Cluster Summary (Heap Size is 51.56 MB/888.94 MB)

Running Map Tasks: 0
Running Reduce Tasks: 0
Total Submissions: 1
Nodes: 1
Occupied Map Slots: 0
Occupied Reduce Slots: 0
Reserved Map Slots: 0
Reserved Reduce Slots: 0
Map Task Capacity: 2
Reduce Task Capacity: 2
Avg. Tasks/Node: 4.00
Blacklisted Nodes: 0
Graylisted Nodes: 0
Excluded Nodes: 0

Scheduling Information

Queue Name: default
State: running
Scheduling Information: N/A

Running Jobs

Completed Jobs

Jobid: job_201612151254_0001
Started: Thu Dec 15 14:06:04 GMT+08:00 2016
Priority: NORMAL
User: root
Name: PiEstimator
Map % Complete: 100.00%
Map Total: 2
Maps Completed: 2
Reduce % Complete: 100.00%
Reduce Total: 1
Reduces Completed: 1
Job Scheduling Information: NA
Diagnostic Info: NA

Retired Jobs

none

Local Logs

Log directory, Job Tracker History

This is Apache Hadoop release 1.2.1