Hadoop - Standalone Setup - Using Hadoop 2.8.0 and Ubuntu 16.04
2017-04-27 22:58
System version
anliven@Ubuntu1604:~$ uname -a
Linux Ubuntu1604 4.8.0-36-generic #36~16.04.1-Ubuntu SMP Sun Feb 5 09:39:57 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux
anliven@Ubuntu1604:~$
anliven@Ubuntu1604:~$ cat /proc/version
Linux version 4.8.0-36-generic (buildd@lgw01-18) (gcc version 5.4.0 20160609 (Ubuntu 5.4.0-6ubuntu1~16.04.4) ) #36~16.04.1-Ubuntu SMP Sun Feb 5 09:39:57 UTC 2017
anliven@Ubuntu1604:~$
anliven@Ubuntu1604:~$ lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description:    Ubuntu 16.04.2 LTS
Release:        16.04
Codename:       xenial
anliven@Ubuntu1604:~$
Create the hadoop user
anliven@Ubuntu1604:~$ sudo useradd -m hadoop -s /bin/bash
anliven@Ubuntu1604:~$ sudo passwd hadoop
Enter new UNIX password:
Retype new UNIX password:
passwd: password updated successfully
anliven@Ubuntu1604:~$
anliven@Ubuntu1604:~$ sudo adduser hadoop sudo
Adding user `hadoop' to group `sudo' ...
Adding user hadoop to group sudo
Done.
anliven@Ubuntu1604:~$
Update apt and install vim
hadoop@Ubuntu1604:~$ sudo apt-get update
Hit:1 http://mirrors.aliyun.com/ubuntu xenial InRelease
Hit:2 http://mirrors.aliyun.com/ubuntu xenial-updates InRelease
Hit:3 http://mirrors.aliyun.com/ubuntu xenial-backports InRelease
Hit:4 http://mirrors.aliyun.com/ubuntu xenial-security InRelease
Reading package lists... Done
hadoop@Ubuntu1604:~$
hadoop@Ubuntu1604:~$ sudo apt-get install vim
Reading package lists... Done
Building dependency tree
Reading state information... Done
vim is already the newest version (2:7.4.1689-3ubuntu1.2).
0 upgraded, 0 newly installed, 0 to remove and 50 not upgraded.
hadoop@Ubuntu1604:~$
Configure passwordless SSH login
hadoop@Ubuntu1604:~$ sudo apt-get install openssh-server
Reading package lists... Done
Building dependency tree
Reading state information... Done
openssh-server is already the newest version (1:7.2p2-4ubuntu2.1).
0 upgraded, 0 newly installed, 0 to remove and 50 not upgraded.
hadoop@Ubuntu1604:~$
hadoop@Ubuntu1604:~$ cd ~
hadoop@Ubuntu1604:~$ mkdir .ssh
hadoop@Ubuntu1604:~$ cd .ssh
hadoop@Ubuntu1604:~/.ssh$ ssh-keygen -t rsa
Generating public/private rsa key pair.
Enter file in which to save the key (/home/hadoop/.ssh/id_rsa):
Enter passphrase (empty for no passphrase):
Enter same passphrase again:
Your identification has been saved in /home/hadoop/.ssh/id_rsa.
Your public key has been saved in /home/hadoop/.ssh/id_rsa.pub.
The key fingerprint is:
SHA256:DzjVWgTQB5I1JGRBmWi6gVHJ03V4WnJZEdojtbou0DM hadoop@Ubuntu1604
The key's randomart image is:
+---[RSA 2048]----+
| o.o =X@B=*o     |
|. + +.*+*B..     |
| o + *+.*        |
|. o .o = .       |
| o .o S          |
| . . E. +        |
| . o. .          |
|  ..             |
|  ..             |
+----[SHA256]-----+
hadoop@Ubuntu1604:~/.ssh$
hadoop@Ubuntu1604:~/.ssh$ cat id_rsa.pub >> authorized_keys
hadoop@Ubuntu1604:~/.ssh$ ls -l
total 12
-rw-rw-r-- 1 hadoop hadoop  399 Apr 27 07:33 authorized_keys
-rw------- 1 hadoop hadoop 1679 Apr 27 07:32 id_rsa
-rw-r--r-- 1 hadoop hadoop  399 Apr 27 07:32 id_rsa.pub
hadoop@Ubuntu1604:~/.ssh$
hadoop@Ubuntu1604:~/.ssh$ cd
hadoop@Ubuntu1604:~$
hadoop@Ubuntu1604:~$ ssh localhost
The authenticity of host 'localhost (127.0.0.1)' can't be established.
ECDSA key fingerprint is SHA256:fZ7fAvnnFk0/Imkn0YPdc2Gzxnfr0IJGSRb1swbm7oU.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added 'localhost' (ECDSA) to the list of known hosts.
Welcome to Ubuntu 16.04.2 LTS (GNU/Linux 4.8.0-36-generic x86_64)

 * Documentation:  https://help.ubuntu.com
 * Management:     https://landscape.canonical.com
 * Support:        https://ubuntu.com/advantage

44 packages can be updated.
0 updates are security updates.

*** System restart required ***
Last login: Thu Apr 27 07:25:26 2017 from 192.168.16.1
hadoop@Ubuntu1604:~$
hadoop@Ubuntu1604:~$ exit
logout
Connection to localhost closed.
hadoop@Ubuntu1604:~$
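If `ssh localhost` still prompts for a password after these steps, the usual cause is file permissions: with its default strict checks, sshd ignores an `authorized_keys` file whose directory or contents are group- or world-writable. A quick fix, not shown in the transcript above but standard OpenSSH behavior:

```shell
# Restrict ~/.ssh and authorized_keys to the owner so sshd
# accepts the key under its default strict permission checks.
chmod 700 ~/.ssh
chmod 600 ~/.ssh/authorized_keys
```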
Install Java
hadoop@Ubuntu1604:~$ dpkg -l |grep jdk
hadoop@Ubuntu1604:~$
hadoop@Ubuntu1604:~$ sudo apt-get install openjdk-8-jre openjdk-8-jdk
Reading package lists... Done
Building dependency tree
Reading state information... Done
The following additional packages will be installed:
......
......
......
done.
Processing triggers for libc-bin (2.23-0ubuntu7) ...
Processing triggers for ca-certificates (20160104ubuntu1) ...
Updating certificates in /etc/ssl/certs...
0 added, 0 removed; done.
Running hooks in /etc/ca-certificates/update.d...
done.
done.
hadoop@Ubuntu1604:~$
hadoop@Ubuntu1604:~$ dpkg -l |grep jdk
ii  openjdk-8-jdk:amd64           8u121-b13-0ubuntu1.16.04.2  amd64  OpenJDK Development Kit (JDK)
ii  openjdk-8-jdk-headless:amd64  8u121-b13-0ubuntu1.16.04.2  amd64  OpenJDK Development Kit (JDK) (headless)
ii  openjdk-8-jre:amd64           8u121-b13-0ubuntu1.16.04.2  amd64  OpenJDK Java runtime, using Hotspot JIT
ii  openjdk-8-jre-headless:amd64  8u121-b13-0ubuntu1.16.04.2  amd64  OpenJDK Java runtime, using Hotspot JIT (headless)
hadoop@Ubuntu1604:~$
hadoop@Ubuntu1604:~$ dpkg -L openjdk-8-jdk | grep '/bin$'
/usr/lib/jvm/java-8-openjdk-amd64/bin
hadoop@Ubuntu1604:~$
hadoop@Ubuntu1604:~$ vim ~/.bashrc
hadoop@Ubuntu1604:~$
hadoop@Ubuntu1604:~$ head ~/.bashrc |grep java
export JAVA_HOME="/usr/lib/jvm/java-8-openjdk-amd64"
hadoop@Ubuntu1604:~$
hadoop@Ubuntu1604:~$ source ~/.bashrc
hadoop@Ubuntu1604:~$
hadoop@Ubuntu1604:~$ echo $JAVA_HOME
/usr/lib/jvm/java-8-openjdk-amd64
hadoop@Ubuntu1604:~$
hadoop@Ubuntu1604:~$ java -version
openjdk version "1.8.0_121"
OpenJDK Runtime Environment (build 1.8.0_121-8u121-b13-0ubuntu1.16.04.2-b13)
OpenJDK 64-Bit Server VM (build 25.121-b13, mixed mode)
hadoop@Ubuntu1604:~$
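The transcript edits `~/.bashrc` in vim without showing the edit itself; judging from the `grep` check that follows, the line added is the `JAVA_HOME` export. A non-interactive equivalent (a sketch, using the same path the `dpkg -L` output revealed):

```shell
# Append the JAVA_HOME export to ~/.bashrc, reload it, and verify.
echo 'export JAVA_HOME="/usr/lib/jvm/java-8-openjdk-amd64"' >> ~/.bashrc
source ~/.bashrc
echo "$JAVA_HOME"  # in an interactive shell: /usr/lib/jvm/java-8-openjdk-amd64
```

Note that Ubuntu's stock `.bashrc` returns early for non-interactive shells, so the export only takes effect in interactive sessions (which is where the transcript checks it).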
Install Hadoop
hadoop@Ubuntu1604:~$ sudo tar -zxf ~/hadoop-2.8.0.tar.gz -C /usr/local
[sudo] password for hadoop:
hadoop@Ubuntu1604:~$ cd /usr/local
hadoop@Ubuntu1604:/usr/local$ sudo mv ./hadoop-2.8.0/ ./hadoop
hadoop@Ubuntu1604:/usr/local$ sudo chown -R hadoop ./hadoop
hadoop@Ubuntu1604:/usr/local$ ls -l |grep hadoop
drwxr-xr-x 9 hadoop dialout 4096 Mar 17 13:31 hadoop
hadoop@Ubuntu1604:/usr/local$ cd ./hadoop
hadoop@Ubuntu1604:/usr/local/hadoop$ ls -l
total 148
drwxr-xr-x 2 hadoop dialout  4096 Mar 17 13:31 bin
drwxr-xr-x 3 hadoop dialout  4096 Mar 17 13:31 etc
drwxr-xr-x 2 hadoop dialout  4096 Mar 17 13:31 include
drwxr-xr-x 3 hadoop dialout  4096 Mar 17 13:31 lib
drwxr-xr-x 2 hadoop dialout  4096 Mar 17 13:31 libexec
-rw-r--r-- 1 hadoop dialout 99253 Mar 17 13:31 LICENSE.txt
-rw-r--r-- 1 hadoop dialout 15915 Mar 17 13:31 NOTICE.txt
-rw-r--r-- 1 hadoop dialout  1366 Mar 17 13:31 README.txt
drwxr-xr-x 2 hadoop dialout  4096 Mar 17 13:31 sbin
drwxr-xr-x 4 hadoop dialout  4096 Mar 17 13:31 share
hadoop@Ubuntu1604:/usr/local/hadoop$ ./bin/hadoop version
Hadoop 2.8.0
Subversion https://git-wip-us.apache.org/repos/asf/hadoop.git -r 91f2b7a13d1e97be65db92ddabc627cc29ac0009
Compiled by jdu on 2017-03-17T04:12Z
Compiled with protoc 2.5.0
From source with checksum 60125541c2b3e266cbf3becc5bda666
This command was run using /usr/local/hadoop/share/hadoop/common/hadoop-common-2.8.0.jar
hadoop@Ubuntu1604:/usr/local/hadoop$
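The transcript always invokes Hadoop via the relative path `./bin/hadoop`. An optional convenience, not done in the original session, is to put the Hadoop scripts on `PATH` so `hadoop version` works from any directory:

```shell
# Optional: add the Hadoop bin and sbin directories to PATH.
echo 'export PATH="$PATH:/usr/local/hadoop/bin:/usr/local/hadoop/sbin"' >> ~/.bashrc
source ~/.bashrc
```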
Run the grep example in Hadoop's standalone configuration
Hadoop's default mode is non-distributed (local) mode, which runs with no further configuration. Non-distributed mode runs as a single Java process, which makes it convenient for debugging.

hadoop@Ubuntu1604:~$ cd /usr/local/hadoop/
hadoop@Ubuntu1604:/usr/local/hadoop$ mkdir ./input
hadoop@Ubuntu1604:/usr/local/hadoop$ cp ./etc/hadoop/*.xml ./input/
hadoop@Ubuntu1604:/usr/local/hadoop$ ls -l input/
total 56
drwxrwxr-x  2 hadoop hadoop  4096 Apr 27 22:23 ./
drwxr-xr-x 10 hadoop dialout 4096 Apr 27 22:23 ../
-rw-r--r--  1 hadoop hadoop  4942 Apr 27 22:23 capacity-scheduler.xml
-rw-r--r--  1 hadoop hadoop   774 Apr 27 22:23 core-site.xml
-rw-r--r--  1 hadoop hadoop  9683 Apr 27 22:23 hadoop-policy.xml
-rw-r--r--  1 hadoop hadoop   775 Apr 27 22:23 hdfs-site.xml
-rw-r--r--  1 hadoop hadoop   620 Apr 27 22:23 httpfs-site.xml
-rw-r--r--  1 hadoop hadoop  3518 Apr 27 22:23 kms-acls.xml
-rw-r--r--  1 hadoop hadoop  5546 Apr 27 22:23 kms-site.xml
-rw-r--r--  1 hadoop hadoop   690 Apr 27 22:23 yarn-site.xml
hadoop@Ubuntu1604:/usr/local/hadoop$ ./bin/hadoop jar ./share/hadoop/mapreduce/hadoop-mapreduce-examples-2.8.0.jar grep ./input ./output 'dfs[a-z.]+'
17/04/27 22:29:45 INFO Configuration.deprecation: session.id is deprecated. Instead, use dfs.metrics.session-id
17/04/27 22:29:45 INFO jvm.JvmMetrics: Initializing JVM Metrics with processName=JobTracker, sessionId=
17/04/27 22:29:45 INFO input.FileInputFormat: Total input files to process : 8
17/04/27 22:29:45 INFO mapreduce.JobSubmitter: number of splits:8
......
......
......
17/04/27 22:29:49 INFO mapreduce.Job: Counters: 30
        File System Counters
                FILE: Number of bytes read=1273712
                FILE: Number of bytes written=2504878
                FILE: Number of read operations=0
                FILE: Number of large read operations=0
                FILE: Number of write operations=0
        Map-Reduce Framework
                Map input records=1
                Map output records=1
                Map output bytes=17
                Map output materialized bytes=25
                Input split bytes=121
                Combine input records=0
                Combine output records=0
                Reduce input groups=1
                Reduce shuffle bytes=25
                Reduce input records=1
                Reduce output records=1
                Spilled Records=2
                Shuffled Maps =1
                Failed Shuffles=0
                Merged Map outputs=1
                GC time elapsed (ms)=0
                Total committed heap usage (bytes)=1054867456
        Shuffle Errors
                BAD_ID=0
                CONNECTION=0
                IO_ERROR=0
                WRONG_LENGTH=0
                WRONG_MAP=0
                WRONG_REDUCE=0
        File Input Format Counters
                Bytes Read=123
        File Output Format Counters
                Bytes Written=23
hadoop@Ubuntu1604:/usr/local/hadoop$
hadoop@Ubuntu1604:/usr/local/hadoop$ ls -l ./output/
total 4
-rw-r--r-- 1 hadoop hadoop 11 Apr 27 22:29 part-r-00000
-rw-r--r-- 1 hadoop hadoop  0 Apr 27 22:29 _SUCCESS
hadoop@Ubuntu1604:/usr/local/hadoop$
hadoop@Ubuntu1604:/usr/local/hadoop$ cat ./output/*
1       dfsadmin
hadoop@Ubuntu1604:/usr/local/hadoop$
Hadoop does not overwrite an existing result directory, so the output directory must be deleted before running the job again.
hadoop@Ubuntu1604:/usr/local/hadoop$ rm -rf ./output
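The pattern `'dfs[a-z.]+'` means the literal `dfs` followed by one or more lowercase letters or dots (inside a bracket expression, `.` is a literal dot), which is why the only match across the eight XML files was the single word `dfsadmin`. The same regex can be sanity-checked with ordinary `grep -E`:

```shell
# Check the example's regex with plain grep: -E for extended regex,
# -o to print only the matched part.
echo "use dfsadmin -report" | grep -Eo 'dfs[a-z.]+'
# -> dfsadmin
```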
Examples bundled with Hadoop
hadoop@Ubuntu1604:/usr/local/hadoop$ ./bin/hadoop jar ./share/hadoop/mapreduce/hadoop-mapreduce-examples-2.8.0.jar
An example program must be given as the first argument.
Valid program names are:
  aggregatewordcount: An Aggregate based map/reduce program that counts the words in the input files.
  aggregatewordhist: An Aggregate based map/reduce program that computes the histogram of the words in the input files.
  bbp: A map/reduce program that uses Bailey-Borwein-Plouffe to compute exact digits of Pi.
  dbcount: An example job that count the pageview counts from a database.
  distbbp: A map/reduce program that uses a BBP-type formula to compute exact bits of Pi.
  grep: A map/reduce program that counts the matches of a regex in the input.
  join: A job that effects a join over sorted, equally partitioned datasets
  multifilewc: A job that counts words from several files.
  pentomino: A map/reduce tile laying program to find solutions to pentomino problems.
  pi: A map/reduce program that estimates Pi using a quasi-Monte Carlo method.
  randomtextwriter: A map/reduce program that writes 10GB of random textual data per node.
  randomwriter: A map/reduce program that writes 10GB of random data per node.
  secondarysort: An example defining a secondary sort to the reduce.
  sort: A map/reduce program that sorts the data written by the random writer.
  sudoku: A sudoku solver.
  teragen: Generate data for the terasort
  terasort: Run the terasort
  teravalidate: Checking results of terasort
  wordcount: A map/reduce program that counts the words in the input files.
  wordmean: A map/reduce program that counts the average length of the words in the input files.
  wordmedian: A map/reduce program that counts the median length of the words in the input files.
  wordstandarddeviation: A map/reduce program that counts the standard deviation of the length of the words in the input files.
hadoop@Ubuntu1604:/usr/local/hadoop$
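Every bundled example follows the same invocation shape: the jar, the program name, then that program's own arguments. A sketch of running `wordcount` over the same `./input` directory (assumes the install layout above and that any previous `./output` is removed first; the output file name is the standard reducer naming, not taken from the original transcript):

```shell
# Sketch: run the bundled wordcount example instead of grep.
cd /usr/local/hadoop
rm -rf ./output   # Hadoop refuses to run if ./output already exists
./bin/hadoop jar ./share/hadoop/mapreduce/hadoop-mapreduce-examples-2.8.0.jar \
    wordcount ./input ./output
head ./output/part-r-00000
```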