Compiling the Hadoop source code
2015-08-19 14:52
Original text: from the Hadoop source code. Download the source package from GitHub or the Hadoop official site; the package contains BUILDING.txt.
Build instructions for Hadoop
----------------------------------------------------------------------------------
Requirements:
* Unix System
* JDK 1.7+
* Maven 3.0 or later
* Findbugs 1.3.9 (if running findbugs)
* ProtocolBuffer 2.5.0
* CMake 2.6 or newer (if compiling native code), must be 3.0 or newer on Mac
* Zlib devel (if compiling native code)
* openssl devel (if compiling native hadoop-pipes and to get the best HDFS encryption performance)
* Jansson C JSON parsing library (if compiling libwebhdfs)
* Linux FUSE (Filesystem in Userspace) version 2.6 or above (if compiling fuse_dfs)
* Internet connection for first build (to fetch all Maven and Hadoop dependencies)
* python (for releasedocs)
* bats (for shell code testing)
----------------------------------------------------------------------------------
The easiest way to get an environment with all the appropriate tools is by means
of the provided Docker config.
This requires a recent version of docker (1.4.1 and higher are known to work).
On Linux:
Install Docker : https://docs.docker.com/installation/ubuntulinux/
and run this command:
sudo service docker start
$ sudo ./start-build-env.sh // EJ: this script only exists in the Apache/Hadoop project on GitHub
On Mac:
First make sure Homebrew has been installed ( http://brew.sh/ )
$ brew install docker boot2docker
$ boot2docker init -m 4096
$ boot2docker start
$ $(boot2docker shellinit)
$ ./start-build-env.sh
The prompt which is then presented is located at a mounted version of the source tree
and all required tools for testing and building have been installed and configured.
Note that from within this docker environment you ONLY have access to the Hadoop source
tree from where you started. So if you need to run
dev-support/test-patch.sh /path/to/my.patch
then the patch must be placed inside the hadoop source tree.
Known issues:
- On Mac with Boot2Docker the performance on the mounted directory is currently extremely slow.
This is a known problem related to boot2docker on the Mac.
See:
https://github.com/boot2docker/boot2docker/issues/593
This issue has been resolved as a duplicate, and they point to a new feature for utilizing NFS mounts
as the proposed solution:
https://github.com/boot2docker/boot2docker/issues/64
An alternative solution to this problem is to install Linux native inside a virtual machine
and run your IDE and Docker etc inside that VM.
----------------------------------------------------------------------------------
// EJ: We mainly start from here; the document uses Ubuntu 14.04 as its example
Installing required packages for clean install of Ubuntu 14.04 LTS Desktop:
// EJ: Install Java; it ends up under /usr/lib/jvm/
* Oracle JDK 1.7 (preferred)
$ sudo apt-get purge openjdk*
$ sudo apt-get install software-properties-common
$ sudo add-apt-repository ppa:webupd8team/java
$ sudo apt-get update
$ sudo apt-get install oracle-java7-installer
* Maven
$ sudo apt-get -y install maven
* Native libraries
$ sudo apt-get -y install build-essential autoconf automake libtool cmake zlib1g-dev pkg-config libssl-dev
* ProtocolBuffer 2.5.0 (required)
$ sudo apt-get -y install protobuf-compiler
// EJ: These are optional; I did not install them on my first build
Optional packages:
* Snappy compression
$ sudo apt-get install snappy libsnappy-dev
* Bzip2
$ sudo apt-get install bzip2 libbz2-dev
* Jansson (C Library for JSON)
$ sudo apt-get install libjansson-dev
* Linux FUSE
$ sudo apt-get install fuse libfuse-dev
// EJ: The following section is interpolated from elsewhere. My first build failed, and on inspection it was missing the ant package, so I searched online and found this tutorial. I'm pasting it here, with this note so that I can make sense of it later.
Home > Tutorials > Create a Hadoop Build and Development Environment
Create a Hadoop Build and Development Environment
Author: Vic Hargrave Published: February 24, 2013
Updated: December 26, 2014
Category: Tutorials Tags: CentOS, Hadoop, Java, Linux
One of the first things I had to do when I started working with Hadoop was fix bugs within the Hadoop stack. Working on Hadoop internals requires numerous programming tools and libraries.
If you have a desire or need to work on Hadoop code, I’ve summarized the packages you need to install and configure to create a Hadoop build and development environment.
Base Operating System
By far the easiest operating system to set up for Hadoop development is a RedHat derived distro. I highly recommend CentOS 6.x – I use CentOS 6.3 64 bit. To limit the scope of this article I’m going to assume you have a 64 bit CentOS system to work with so
I won’t describe the installation procedure here.
Install Oracle JDK 1.6
CentOS normally comes with the OpenJDK Java environment. This is not the version of Java you want to use for Hadoop development. Instead you should install Oracle’s official Java 1.6 JDK and remove OpenJDK. Note you have to run yum as root to be able to install
packages on your system.
Remove OpenJDK.
yum -y remove *jdk*
yum -y remove *java*
Get Oracle’s Java 1.6 JDK. I suggest downloading the rpm.bin version.
Install JDK 1.6 by double-clicking on the rpm.bin package.
Install CentOS Packages
Install the following CentOS packages using the yum commands as shown. Note some of the packages may already be installed.
yum -y install gcc-c++.x86_64
yum -y install make.x86_64
yum -y install openssl.x86_64 openssl-devel.x86_64 openssh.x86_64
yum -y install libtool.x86_64
yum -y install autoconf.noarch automake.noarch
yum -y install cmake.x86_64
yum -y install xz.x86_64 xz-devel.x86_64
yum -y install zlib.x86_64 zlib-devel.x86_64
yum -y install git.x86_64
Install Snappy Libraries
You need to get the snappy libraries from the RPMforge repository. Here is what you do to get the RPMforge repo file and snappy library:
Get the RPMforge repo file from http://packages.sw.be/rpmforge-release/rpmforge-release-0.5.2-2.el6.rf.x86_64.rpm.
Install the repo file by typing:
rpm -Uvh rpmforge-release-0.5.2-2.el6.rf.x86_64.rpm
Use yum to get the snappy lib:
yum -y install snappy.x86_64 snappy-devel.x86_64
Install Protobuf
Protocol Buffers are used internally by Hadoop for RPC. Install this facility as follows:
Download protobuf-2.4.1.tar.gz.
Unpack and build:
tar zxvf protobuf-2.4.1.tar.gz
cd protobuf-2.4.1
./configure
make
sudo make install
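A quick check that the freshly built compiler is the one on PATH can save a confusing build failure later (note this tutorial builds 2.4.1, while Hadoop's own BUILDING.txt asks for 2.5.0, so read the version number carefully):

```shell
# Print the protoc version if one is on PATH; otherwise say so.
if command -v protoc >/dev/null 2>&1; then
  protoc --version
else
  echo "protoc not on PATH (is /usr/local/bin in PATH, and did ldconfig run?)"
fi
```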
Install Apache and Findbugs Tools
Last but by no means least you’ll need the Findbugs and the Apache development tools: maven, ant and ivy. The CentOS packages of the Apache tools are usually not what you want. That may change in the future but in the meantime follow the instructions here
to obtain and install the latest tools.
Download Apache Maven.
Download Apache Ant.
Download Apache Ivy.
Download Findbugs.
Install each package as follows:
tar zxvf <maven package>.tgz
tar zxvf <ant package>.tgz
tar zxvf <ivy package>.tgz
tar zxvf <findbugs package>.tgz
sudo cp -R <maven directory> /usr/local/apache_maven/
sudo cp -R <ant directory> /usr/local/apache_ant/
sudo cp -R <ivy directory> /usr/local/apache_ivy/
sudo cp -R <findbugs> /usr/local/findbugs/
Set your .bash_profile or .bashrc to include these environment variables:
export FB_HOME=/usr/local/findbugs
export ANT_HOME=/usr/local/apache_ant
export IVY_HOME=/usr/local/apache_ivy
export M2_HOME=/usr/local/apache_maven
export JAVA_HOME=/usr/java/default
PATH=$PATH:$M2_HOME/bin:$IVY_HOME/bin:$ANT_HOME/bin:$FB_HOME/bin:$IDEA_HOME/bin
export PATH
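After sourcing the profile, a quick sanity check like the following can confirm the tools resolve on PATH (a minimal sketch; the tool names assume the standard binaries from the installs above):

```shell
# Report where each build tool resolves on PATH; prints one line per tool.
missing=0
for tool in java mvn ant protoc findbugs; do
  if command -v "$tool" >/dev/null 2>&1; then
    echo "$tool: $(command -v "$tool")"
  else
    echo "$tool: MISSING"
    missing=$((missing + 1))
  fi
done
echo "tools missing: $missing"
```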
// End of interpolated tutorial
----------------------------------------------------------------------------------
Maven main modules:
hadoop (Main Hadoop project)
- hadoop-project (Parent POM for all Hadoop Maven modules. )
(All plugins & dependencies versions are defined here.)
- hadoop-project-dist (Parent POM for modules that generate distributions.)
- hadoop-annotations (Generates the Hadoop doclet used to generate the Javadocs)
- hadoop-assemblies (Maven assemblies used by the different modules)
- hadoop-common-project (Hadoop Common)
- hadoop-hdfs-project (Hadoop HDFS)
- hadoop-mapreduce-project (Hadoop MapReduce)
- hadoop-tools (Hadoop tools like Streaming, Distcp, etc.)
- hadoop-dist (Hadoop distribution assembler)
----------------------------------------------------------------------------------
Where to run Maven from?
It can be run from any module. The only catch is that if not run from trunk,
all modules that are not part of the build run must be installed in the local
Maven cache or available in a Maven repository.
----------------------------------------------------------------------------------
Maven build goals:
* Clean : mvn clean [-Preleasedocs]
* Compile : mvn compile [-Pnative]
* Run tests : mvn test [-Pnative] [-Pshelltest]
* Create JAR : mvn package
* Run findbugs : mvn compile findbugs:findbugs
* Run checkstyle : mvn compile checkstyle:checkstyle
* Install JAR in M2 cache : mvn install
* Deploy JAR to Maven repo : mvn deploy
* Run clover : mvn test -Pclover [-DcloverLicenseLocation=${user.name}/.clover.license]
* Run Rat : mvn apache-rat:check
* Build javadocs : mvn javadoc:javadoc
* Build distribution : mvn package [-Pdist][-Pdocs][-Psrc][-Pnative][-Dtar][-Preleasedocs]
* Change Hadoop version : mvn versions:set -DnewVersion=NEWVERSION
Build options:
* Use -Pnative to compile/bundle native code
* Use -Pdocs to generate & bundle the documentation in the distribution (using -Pdist)
* Use -Psrc to create a project source TAR.GZ
* Use -Dtar to create a TAR with the distribution (using -Pdist)
* Use -Preleasedocs to include the changelog and release docs (requires Internet connectivity)
Snappy build options:
Snappy is a compression library that can be utilized by the native code.
It is currently an optional component, meaning that Hadoop can be built with
or without this dependency.
* Use -Drequire.snappy to fail the build if libsnappy.so is not found.
If this option is not specified and the snappy library is missing,
we silently build a version of libhadoop.so that cannot make use of snappy.
This option is recommended if you plan on making use of snappy and want
to get more repeatable builds.
* Use -Dsnappy.prefix to specify a nonstandard location for the libsnappy
header files and library files. You do not need this option if you have
installed snappy using a package manager.
* Use -Dsnappy.lib to specify a nonstandard location for the libsnappy library
files. Similarly to snappy.prefix, you do not need this option if you have
installed snappy using a package manager.
* Use -Dbundle.snappy to copy the contents of the snappy.lib directory into
the final tar file. This option requires that -Dsnappy.lib is also given,
and it ignores the -Dsnappy.prefix option.
OpenSSL build options:
OpenSSL includes a crypto library that can be utilized by the native code.
It is currently an optional component, meaning that Hadoop can be built with
or without this dependency.
* Use -Drequire.openssl to fail the build if libcrypto.so is not found.
If this option is not specified and the openssl library is missing,
we silently build a version of libhadoop.so that cannot make use of
openssl. This option is recommended if you plan on making use of openssl
and want to get more repeatable builds.
* Use -Dopenssl.prefix to specify a nonstandard location for the libcrypto
header files and library files. You do not need this option if you have
installed openssl using a package manager.
* Use -Dopenssl.lib to specify a nonstandard location for the libcrypto library
files. Similarly to openssl.prefix, you do not need this option if you have
installed openssl using a package manager.
* Use -Dbundle.openssl to copy the contents of the openssl.lib directory into
the final tar file. This option requires that -Dopenssl.lib is also given,
and it ignores the -Dopenssl.prefix option.
Tests options:
* Use -DskipTests to skip tests when running the following Maven goals:
'package', 'install', 'deploy' or 'verify'
* -Dtest=<TESTCLASSNAME>,<TESTCLASSNAME#METHODNAME>,....
* -Dtest.exclude=<TESTCLASSNAME>
* -Dtest.exclude.pattern=**/<TESTCLASSNAME1>.java,**/<TESTCLASSNAME2>.java
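Some hedged examples of composing these flags (the class names below are placeholders, not real Hadoop tests); the commands are echoed rather than executed, so drop the leading echo to actually run them:

```shell
# Echo example mvn test invocations; remove 'echo' to run for real.
echo mvn test -Dtest=TestFoo                              # one class (placeholder name)
echo mvn test -Dtest='TestFoo#testBar'                    # a single test method
echo mvn test -Dtest.exclude.pattern='**/TestSlow*.java'  # skip matching classes
```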
----------------------------------------------------------------------------------
Building components separately
If you are building a submodule directory, all the Hadoop dependencies this
submodule has will be resolved like all other 3rd party dependencies: that is,
from the Maven cache or from a Maven repository (if not available in the cache,
or if the SNAPSHOT has 'timed out').
An alternative is to run 'mvn install -DskipTests' from the Hadoop source top
level once, and then work from the submodule. Keep in mind that SNAPSHOTs
time out after a while; using the Maven '-nsu' option will stop Maven from
trying to update SNAPSHOTs from external repos.
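The workflow above can be sketched as a dry run (paths assume the standard Hadoop source layout; the `run` helper only echoes each command, so drop the echo to execute for real):

```shell
# Dry-run sketch of the submodule workflow; 'run' just prints each command.
run() { echo "+ $*"; }

run mvn install -DskipTests              # once, from the source-tree top level
run cd hadoop-hdfs-project/hadoop-hdfs   # then iterate inside one submodule
run mvn compile -nsu                     # -nsu: no SNAPSHOT updates from remote repos
```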
----------------------------------------------------------------------------------
Protocol Buffer compiler
The version of Protocol Buffer compiler, protoc, must match the version of the
protobuf JAR.
If you have multiple versions of protoc in your system, you can set in your
build shell the HADOOP_PROTOC_PATH environment variable to point to the one you
want to use for the Hadoop build. If you don't define this environment variable,
protoc is looked up in the PATH.
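For example, in your build shell (the install prefix below is only an assumed location; point it at wherever your protobuf 2.5.0 build actually lives):

```shell
# Pin the protoc binary used by the Hadoop build (the path is an assumption).
export HADOOP_PROTOC_PATH="$HOME/protobuf-2.5.0/bin/protoc"

# Warn if the pinned binary is absent; the build would then use protoc from PATH.
if [ ! -x "$HADOOP_PROTOC_PATH" ]; then
  echo "warning: $HADOOP_PROTOC_PATH not found; protoc from PATH will be used"
fi
```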
----------------------------------------------------------------------------------
Importing projects to eclipse
When you import the project to eclipse, install hadoop-maven-plugins at first.
$ cd hadoop-maven-plugins
$ mvn install
Then, generate eclipse project files.
$ mvn eclipse:eclipse -DskipTests
At last, import to eclipse by specifying the root directory of the project via
[File] > [Import] > [Existing Projects into Workspace].
----------------------------------------------------------------------------------
Building distributions:
Create binary distribution without native code and without documentation:
$ mvn package -Pdist -DskipTests -Dtar
Create binary distribution with native code and with documentation:
$ mvn package -Pdist,native,docs -DskipTests -Dtar
Create source distribution:
$ mvn package -Psrc -DskipTests
Create source and binary distributions with native code and documentation:
$ mvn package -Pdist,native,docs,src -DskipTests -Dtar
Create a local staging version of the website (in /tmp/hadoop-site)
$ mvn clean site -Preleasedocs; mvn site:stage -DstagingDirectory=/tmp/hadoop-site
----------------------------------------------------------------------------------
Installing Hadoop
Look for these HTML files after you build the documentation with the above commands.
* Single Node Setup:
hadoop-project-dist/hadoop-common/SingleCluster.html
* Cluster Setup:
hadoop-project-dist/hadoop-common/ClusterSetup.html
----------------------------------------------------------------------------------
Handling out of memory errors in builds
----------------------------------------------------------------------------------
If the build process fails with an out of memory error, you should be able to fix
it by increasing the memory used by maven which can be done via the environment
variable MAVEN_OPTS.
Here is an example setting to allocate between 256 and 512 MB of heap space to
Maven
export MAVEN_OPTS="-Xms256m -Xmx512m"
----------------------------------------------------------------------------------
Building on Windows
----------------------------------------------------------------------------------
Requirements:
* Windows System
* JDK 1.7+
* Maven 3.0 or later
* Findbugs 1.3.9 (if running findbugs)
* ProtocolBuffer 2.5.0
* CMake 2.6 or newer
* Windows SDK 7.1 or Visual Studio 2010 Professional
* Windows SDK 8.1 (if building CPU rate control for the container executor)
* zlib headers (if building native code bindings for zlib)
* Internet connection for first build (to fetch all Maven and Hadoop dependencies)
* Unix command-line tools from GnuWin32: sh, mkdir, rm, cp, tar, gzip. These
tools must be present on your PATH.
* Python (for generating docs using 'mvn site')
Unix command-line tools are also included with the Windows Git package which
can be downloaded from http://git-scm.com/downloads
If using Visual Studio, it must be Visual Studio 2010 Professional (not 2012).
Do not use Visual Studio Express. It does not support compiling for 64-bit,
which is problematic if running a 64-bit system. The Windows SDK 7.1 is free to
download here:
http://www.microsoft.com/en-us/download/details.aspx?id=8279
The Windows SDK 8.1 is available to download at:
http://msdn.microsoft.com/en-us/windows/bg162891.aspx
Cygwin is neither required nor supported.
----------------------------------------------------------------------------------
Building:
Keep the source code tree in a short path to avoid running into problems related
to Windows maximum path length limitation (for example, C:\hdc).
Run builds from a Windows SDK Command Prompt. (Start, All Programs,
Microsoft Windows SDK v7.1, Windows SDK 7.1 Command Prompt).
JAVA_HOME must be set, and the path must not contain spaces. If the full path
would contain spaces, then use the Windows short path instead.
You must set the Platform environment variable to either x64 or Win32 depending
on whether you're running a 64-bit or 32-bit system. Note that this is
case-sensitive. It must be "Platform", not "PLATFORM" or "platform".
Environment variables on Windows are usually case-insensitive, but Maven treats
them as case-sensitive. Failure to set this environment variable correctly will
cause msbuild to fail while building the native code in hadoop-common.
set Platform=x64 (when building on a 64-bit system)
set Platform=Win32 (when building on a 32-bit system)
Several tests require that the user have the Create Symbolic Links
privilege.
All Maven goals are the same as described above with the exception that
native code is built by enabling the 'native-win' Maven profile. -Pnative-win
is enabled by default when building on Windows since the native components
are required (not optional) on Windows.
If native code bindings for zlib are required, then the zlib headers must be
deployed on the build machine. Set the ZLIB_HOME environment variable to the
directory containing the headers.
set ZLIB_HOME=C:\zlib-1.2.7
At runtime, zlib1.dll must be accessible on the PATH. Hadoop has been tested
with zlib 1.2.7, built using Visual Studio 2010 out of contrib\vstudio\vc10 in
the zlib 1.2.7 source tree.
http://www.zlib.net/
----------------------------------------------------------------------------------
Building distributions:
* Build distribution with native code : mvn package [-Pdist][-Pdocs][-Psrc][-Dtar]
----------------------------------------------------------------------------------
EJ: Even after successfully compiling the Hadoop source as above, importing it into Eclipse still shows many errors. The simple ones can be solved with this wiki: http://wiki.apache.org/hadoop/EclipseEnvironment
Below are some of the harder problems I hit. When your Eclipse shows 56-59 errors, it's time to look here.
Part of these problems concern the 'mvn install -DskipTests' build step: according to the tutorial, certain files should be generated automatically during it, but they did not appear when I ran it, which blocked me for a long time. If you are stuck the same way, read on.
EJ Hadoop_src Notes
Solving errors when importing Hadoop into Eclipse
Error 1: org.apache.hadoop.ipc.protobuf cannot be resolved
Solution:
$ cd hadoop-2.5.2-src/hadoop-common-project/hadoop-common/src/test/proto
$ protoc --java_out=../java *.proto
Error 2: AvroRecord cannot be resolved to a type (in TestAvroSerialization.java)
Solution:
1. Download avro-tools-1.7.7.jar and put it in hadoop-2.x-src/
$ cd hadoop-2.5.2-src/hadoop-common-project/hadoop-common/src/test/avro
$ java -jar ~/hadoop-2.5.2-src/avro-tools-1.7.7.jar compile schema avroRecord.avsc ../java
Error 3: Project ‘hadoop-streaming’ is missing required source … Build Path Problem
Solution:
Right-click hadoop-streaming -> Properties -> Java Build Path -> Source -> remove the error items
Build instructions for Hadoop
----------------------------------------------------------------------------------
Requirements:
* Unix System
* JDK 1.7+
* Maven 3.0 or later
* Findbugs 1.3.9 (if running findbugs)
* ProtocolBuffer 2.5.0
* CMake 2.6 or newer (if compiling native code), must be 3.0 or newer on Mac
* Zlib devel (if compiling native code)
* openssl devel (if compiling native hadoop-pipes and to get the best HDFS encryption performance)
* Jansson C XML parsing library (if compiling libwebhdfs)
* Linux FUSE (Filesystem in Userspace) version 2.6 or above (if compiling fuse_dfs)
* Internet connection for first build (to fetch all Maven and Hadoop dependencies)
* python (for releasedocs)
* bats (for shell code testing)
----------------------------------------------------------------------------------
The easiest way to get an environment with all the appropriate tools is by means
of the provided Docker config.
This requires a recent version of docker (1.4.1 and higher are known to work).
On Linux:
Install Docker : https://docs.docker.com/installation/ubuntulinux/
and run this command:
sudo service docker start
$ sudo ./start-build-env.sh //EJ: 这个只有github上 Apache/Hadoop 这个项目中才有这个脚本
On Mac:
First make sure Homebrew has been installed ( http://brew.sh/ )
$ brew install docker boot2docker
$ boot2docker init -m 4096
$ boot2docker start
$ $(boot2docker shellinit)
$ ./start-build-env.sh
The prompt which is then presented is located at a mounted version of the source tree
and all required tools for testing and building have been installed and configured.
Note that from within this docker environment you ONLY have access to the Hadoop source
tree from where you started. So if you need to run
dev-support/test-patch.sh /path/to/my.patch
then the patch must be placed inside the hadoop source tree.
Known issues:
- On Mac with Boot2Docker the performance on the mounted directory is currently extremely slow.
This is a known problem related to boot2docker on the Mac.
See:
https://github.com/boot2docker/boot2docker/issues/593
This issue has been resolved as a duplicate, and they point to a new feature for utilizing NFS mounts
as the proposed solution:
https://github.com/boot2docker/boot2docker/issues/64
An alternative solution to this problem is to install Linux native inside a virtual machine
and run your IDE and Docker etc inside that VM.
----------------------------------------------------------------------------------
// EJ 我们主要从这里开始,这里文档以Ubuntu 14.04作为例子
Installing required packages for clean install of Ubuntu 14.04 LTS Desktop:
// EJ:安装JAVA,安装在 /usr/lib/jvm/ 下
* Oracle JDK 1.7 (preferred)
$ sudo apt-get purge openjdk*
$ sudo apt-get install software-properties-common
$ sudo add-apt-repository ppa:webupd8team/java
$ sudo apt-get update
$ sudo apt-get install oracle-java7-installer
* Maven
$ sudo apt-get -y install maven
* Native libraries
$ sudo apt-get -y install build-essential autoconf automake libtool cmake zlib1g-dev pkg-config libssl-dev
* ProtocolBuffer 2.5.0 (required)
$ sudo apt-get -y install protobuf-compiler
//EJ: 可选择安装我在第一次安装时并没有安装这几项
Optional packages:
* Snappy compression
$ sudo apt-get install snappy libsnappy-dev
* Bzip2
$ sudo apt-get install bzip2 libbz2-dev
* Jansson (C Library for JSON)
$ sudo apt-get install libjansson-dev
* Linux FUSE
$ sudo apt-get install fuse libfuse-dev
//EJ:以下这段时乱入进来的,我第一次编译失败,查看结果是缺少ant包,于是去网上搜了这篇教程。乱入就放这了。希望以后自己看得懂这里做个注释。
Home > Tutorials > Create a Hadoop Build and Development Environment
Create a Hadoop Build and Development Environment
Author: Vic Hargrave Published: February 24, 2013
Updated: December 26, 2014
Category: Tutorials Tags: CentOS, Hadoop, Java, Linux
Hadoop DevelopmentOne of the first things I had to do when I started working with Hadoop was fix bugs within the Hadoop stack. To be able to work on Hadoop internals requires numerous programming tools and libraries.
If you have a desire or need to work on Hadoop code, I’ve summarized the packages you need to install and configure to create a Hadoop build and development environment.
Contents [show]
Base Operating System
By far the easiest operating system to set up for Hadoop development is a RedHat derived distro. I highly recommend CentOS 6.x – I use CentOS 6.3 64 bit. To limit the scope of this article I’m going to assume you have a 64 bit CentOS system to work with so
I won’t describe the installation procedure here.
Install Oracle JDK 1.6
CentOS normally comes with the OpenJDK Java environment. This is not the version of Java you want to use for Hadoop development. Instead you should install Oracle’s official Java 1.6 JDK and remove OpenJDK. Note you have to run yum as root to be able to install
packages on your system.
Remove OpenJDK.
yum -y remove *jdk*
yum -y remove *java*
Get Oracle’s Java 1.6 JDK. I suggest downloading the rpm.bin version.
Install JDK 1.6 by double clicking on on the rpm.bin package.
Install CentOS Packages
Install the following CentOS packages using the yum commands as shown. Note some of the packages may already be installed.
yum -y install gcc-c++.x86_64
yum -y install make.x86_64
yum -y install openssl.x86_64 openssl-devel.x86_64 openssh.x86_64
yum -y install libtool.x86_64
yum -y install autoconf.noarch automake.noarch
yum -y install cmake.x86_64
yum -y install xz.x86_64 xz-devel.x86_64
yum -y install zlib.x86_64 zlib-devel.x86_64
yum -y install git.x86_64
Install Snappy Libraries
You need to get the snappy libraries from the RPMforge repository. Here is what you do to get the RPMforge repo file and snappy library:
Click here to get the the RPMforge repo file – http://packages.sw.be/rpmforge-release/rpmforge-release-0.5.2-2.el6.rf.x86_64.rpm.
Install the repo file by typing:
rpm -Uvh rpmforge-release-0.5.2.2.el6.rf.x86_64.rpm
Use yum to get the snappy lib:
yum -y install snappy.x86_64 snappy-devel.x86_64
Install Protobuf
Protocol Buffers are used internally by Hadoop for RPC. Install this facility as follows:
Download protobuf-2.4.1.tar.gz.
Unpack and build:
tar zxvf protobuf-2.4.1.tar.gz
cd protobuf-2.4.1
./configure
make
sudo make install
Install Apache and Findbugs Tools
Last but by no means least you’ll need the Findbugs and the Apache development tools: maven, ant and ivy. The CentOS packages of the Apache tools are usually not what you want. That may change in the future but in the meantime follow the instructions here
to obtain and install the latest tools.
Download Apache Maven.
Download Apache Ant.
Download Apache Ivy.
Download Findbugs.
Install each package as follows:
tar zxvf <maven package>.tgz
tar zxvf <ant package>.tgz
tar zxvf <ivy package>.tgz
tar zxvf <findbugs package>.tgz
sudo cp -R <maven directory> /usr/local/apache_maven/
sudo cp -R <ant directory> /usr/local/apache_ant/
sudo cp -R <ivy directory> /usr/local/apache_ivy/
sudo cp -R <findbugs> /usr/local/findbugs/
Set your .bash_profile or .bashrc to include these environment variables:
export FB_HOME=/usr/local/findbugs
export ANT_HOME=/usr/local/apache_ant
export IVY_HOME=/usr/local/apache_ivy
export M2_HOME=/usr/local/apache_maven
export JAVA_HOME=/usr/java/default
PATH=$PATH:$M2_HOME/bin:$IVY_HOME/bin:$ANT_HOME/bin:$FB_HOME/bin::$IDEA_HOME/bin
export PATH
//乱入End
----------------------------------------------------------------------------------
Maven main modules:
hadoop (Main Hadoop project)
- hadoop-project (Parent POM for all Hadoop Maven modules. )
(All plugins & dependencies versions are defined here.)
- hadoop-project-dist (Parent POM for modules that generate distributions.)
- hadoop-annotations (Generates the Hadoop doclet used to generated the Javadocs)
- hadoop-assemblies (Maven assemblies used by the different modules)
- hadoop-common-project (Hadoop Common)
- hadoop-hdfs-project (Hadoop HDFS)
- hadoop-mapreduce-project (Hadoop MapReduce)
- hadoop-tools (Hadoop tools like Streaming, Distcp, etc.)
- hadoop-dist (Hadoop distribution assembler)
----------------------------------------------------------------------------------
Where to run Maven from?
It can be run from any module. The only catch is that if not run from utrunk
all modules that are not part of the build run must be installed in the local
Maven cache or available in a Maven repository.
----------------------------------------------------------------------------------
Maven build goals:
* Clean : mvn clean [-Preleasedocs]
* Compile : mvn compile [-Pnative]
* Run tests : mvn test [-Pnative] [-Pshelltest]
* Create JAR : mvn package
* Run findbugs : mvn compile findbugs:findbugs
* Run checkstyle : mvn compile checkstyle:checkstyle
* Install JAR in M2 cache : mvn install
* Deploy JAR to Maven repo : mvn deploy
* Run clover : mvn test -Pclover [-DcloverLicenseLocation=${user.name}/.clover.license]
* Run Rat : mvn apache-rat:check
* Build javadocs : mvn javadoc:javadoc
* Build distribution : mvn package [-Pdist][-Pdocs][-Psrc][-Pnative][-Dtar][-Preleasedocs]
* Change Hadoop version : mvn versions:set -DnewVersion=NEWVERSION
Build options:
* Use -Pnative to compile/bundle native code
* Use -Pdocs to generate & bundle the documentation in the distribution (using -Pdist)
* Use -Psrc to create a project source TAR.GZ
* Use -Dtar to create a TAR with the distribution (using -Pdist)
* Use -Preleasedocs to include the changelog and release docs (requires Internet connectivity)
Snappy build options:
Snappy is a compression library that can be utilized by the native code.
It is currently an optional component, meaning that Hadoop can be built with
or without this dependency.
* Use -Drequire.snappy to fail the build if libsnappy.so is not found.
If this option is not specified and the snappy library is missing,
we silently build a version of libhadoop.so that cannot make use of snappy.
This option is recommended if you plan on making use of snappy and want
to get more repeatable builds.
* Use -Dsnappy.prefix to specify a nonstandard location for the libsnappy
header files and library files. You do not need this option if you have
installed snappy using a package manager.
* Use -Dsnappy.lib to specify a nonstandard location for the libsnappy library
files. Similarly to snappy.prefix, you do not need this option if you have
installed snappy using a package manager.
* Use -Dbundle.snappy to copy the contents of the snappy.lib directory into
the final tar file. This option requires that -Dsnappy.lib is also given,
and it ignores the -Dsnappy.prefix option.
OpenSSL build options:
OpenSSL includes a crypto library that can be utilized by the native code.
It is currently an optional component, meaning that Hadoop can be built with
or without this dependency.
* Use -Drequire.openssl to fail the build if libcrypto.so is not found.
If this option is not specified and the openssl library is missing,
we silently build a version of libhadoop.so that cannot make use of
openssl. This option is recommended if you plan on making use of openssl
and want to get more repeatable builds.
* Use -Dopenssl.prefix to specify a nonstandard location for the libcrypto
header files and library files. You do not need this option if you have
installed openssl using a package manager.
* Use -Dopenssl.lib to specify a nonstandard location for the libcrypto library
files. Similarly to openssl.prefix, you do not need this option if you have
installed openssl using a package manager.
* Use -Dbundle.openssl to copy the contents of the openssl.lib directory into
the final tar file. This option requires that -Dopenssl.lib is also given,
and it ignores the -Dopenssl.prefix option.
Tests options:
* Use -DskipTests to skip tests when running the following Maven goals:
'package', 'install', 'deploy' or 'verify'
* -Dtest=<TESTCLASSNAME>,<TESTCLASSNAME#METHODNAME>,....
* -Dtest.exclude=<TESTCLASSNAME>
* -Dtest.exclude.pattern=**/<TESTCLASSNAME1>.java,**/<TESTCLASSNAME2>.java
----------------------------------------------------------------------------------
Building components separately
If you are building a submodule directory, all the hadoop dependencies this
submodule has will be resolved as all other 3rd party dependencies. This is,
from the Maven cache or from a Maven repository (if not available in the cache
or the SNAPSHOT 'timed out').
An alternative is to run 'mvn install -DskipTests' from Hadoop source top
level once; and then work from the submodule. Keep in mind that SNAPSHOTs
time out after a while, using the Maven '-nsu' will stop Maven from trying
to update SNAPSHOTs from external repos.
----------------------------------------------------------------------------------
Protocol Buffer compiler
The version of Protocol Buffer compiler, protoc, must match the version of the
protobuf JAR.
If you have multiple versions of protoc on your system, you can set the
HADOOP_PROTOC_PATH environment variable in your build shell to point to the one
you want to use for the Hadoop build. If you don't define this environment
variable, protoc is looked up in the PATH.
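For example, assuming a separate protobuf 2.5.0 install (the path is hypothetical; adjust it to your system):

```shell
# Point the Hadoop build at a specific protoc binary:
export HADOOP_PROTOC_PATH=/usr/local/protobuf-2.5.0/bin/protoc
# Check that its version matches the protobuf JAR (2.5.0 here):
"$HADOOP_PROTOC_PATH" --version
```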
----------------------------------------------------------------------------------
Importing projects to eclipse
Before importing the project into Eclipse, first install hadoop-maven-plugins.
$ cd hadoop-maven-plugins
$ mvn install
Then, generate the Eclipse project files.
$ mvn eclipse:eclipse -DskipTests
Finally, import into Eclipse by specifying the root directory of the project via
[File] > [Import] > [Existing Projects into Workspace].
----------------------------------------------------------------------------------
Building distributions:
Create binary distribution without native code and without documentation:
$ mvn package -Pdist -DskipTests -Dtar
Create binary distribution with native code and with documentation:
$ mvn package -Pdist,native,docs -DskipTests -Dtar
Create source distribution:
$ mvn package -Psrc -DskipTests
Create source and binary distributions with native code and documentation:
$ mvn package -Pdist,native,docs,src -DskipTests -Dtar
Create a local staging version of the website (in /tmp/hadoop-site):
$ mvn clean site -Preleasedocs; mvn site:stage -DstagingDirectory=/tmp/hadoop-site
----------------------------------------------------------------------------------
Installing Hadoop
Look for these HTML files after building the documentation with the above commands.
* Single Node Setup:
hadoop-project-dist/hadoop-common/SingleCluster.html
* Cluster Setup:
hadoop-project-dist/hadoop-common/ClusterSetup.html
----------------------------------------------------------------------------------
Handling out of memory errors in builds
----------------------------------------------------------------------------------
If the build process fails with an out-of-memory error, you should be able to fix
it by increasing the memory used by Maven, which can be done via the environment
variable MAVEN_OPTS.
Here is an example setting to allocate between 256 and 512 MB of heap space to
Maven:
export MAVEN_OPTS="-Xms256m -Xmx512m"
----------------------------------------------------------------------------------
Building on Windows
----------------------------------------------------------------------------------
Requirements:
* Windows System
* JDK 1.7+
* Maven 3.0 or later
* Findbugs 1.3.9 (if running findbugs)
* ProtocolBuffer 2.5.0
* CMake 2.6 or newer
* Windows SDK 7.1 or Visual Studio 2010 Professional
* Windows SDK 8.1 (if building CPU rate control for the container executor)
* zlib headers (if building native code bindings for zlib)
* Internet connection for first build (to fetch all Maven and Hadoop dependencies)
* Unix command-line tools from GnuWin32: sh, mkdir, rm, cp, tar, gzip. These
tools must be present on your PATH.
* Python (for generation of docs using 'mvn site')
Unix command-line tools are also included with the Windows Git package which
can be downloaded from http://git-scm.com/downloads
If using Visual Studio, it must be Visual Studio 2010 Professional (not 2012).
Do not use Visual Studio Express. It does not support compiling for 64-bit,
which is problematic if running a 64-bit system. The Windows SDK 7.1 is free to
download here:
http://www.microsoft.com/en-us/download/details.aspx?id=8279
The Windows SDK 8.1 is available to download at:
http://msdn.microsoft.com/en-us/windows/bg162891.aspx
Cygwin is neither required nor supported.
----------------------------------------------------------------------------------
Building:
Keep the source code tree in a short path to avoid running into problems related
to Windows maximum path length limitation (for example, C:\hdc).
Run builds from a Windows SDK Command Prompt. (Start, All Programs,
Microsoft Windows SDK v7.1, Windows SDK 7.1 Command Prompt).
JAVA_HOME must be set, and the path must not contain spaces. If the full path
would contain spaces, then use the Windows short path instead.
You must set the Platform environment variable to either x64 or Win32 depending
on whether you're running a 64-bit or 32-bit system. Note that this is
case-sensitive. It must be "Platform", not "PLATFORM" or "platform".
Environment variables on Windows are usually case-insensitive, but Maven treats
them as case-sensitive. Failure to set this environment variable correctly will
cause msbuild to fail while building the native code in hadoop-common.
set Platform=x64 (when building on a 64-bit system)
set Platform=Win32 (when building on a 32-bit system)
Several tests require that the user have the Create Symbolic Links
privilege.
All Maven goals are the same as described above with the exception that
native code is built by enabling the 'native-win' Maven profile. -Pnative-win
is enabled by default when building on Windows since the native components
are required (not optional) on Windows.
If native code bindings for zlib are required, then the zlib headers must be
deployed on the build machine. Set the ZLIB_HOME environment variable to the
directory containing the headers.
set ZLIB_HOME=C:\zlib-1.2.7
At runtime, zlib1.dll must be accessible on the PATH. Hadoop has been tested
with zlib 1.2.7, built using Visual Studio 2010 out of contrib\vstudio\vc10 in
the zlib 1.2.7 source tree.
http://www.zlib.net/
----------------------------------------------------------------------------------
Building distributions:
* Build distribution with native code : mvn package [-Pdist][-Pdocs][-Psrc][-Dtar]
----------------------------------------------------------------------------------
EJ: Even if you compile the Hadoop source successfully as described above, importing it into Eclipse will still show many errors. The simpler ones can be solved with the wiki at http://wiki.apache.org/hadoop/EclipseEnvironment ;
below are some of the harder problems I ran into. When your Eclipse shows roughly 56-59 errors, it is time to look here.
Some of these problems concern the 'mvn install -DskipTests' build step: according to the tutorial, it should generate certain files automatically, but they did not appear when I ran it, which blocked me for a long time. If you are stuck the same way, read on.
EJ Hadoop_src Notes
Solve Error of importing hadoop to eclipse
Error 1: org.apache.hadoop.ipc.protobuf cannot be resolved
Solution:
$ cd hadoop-2.5.2-src/hadoop-common-project/hadoop-common/src/test/proto
$ protoc --java_out=../java *.proto
Error 2: AvroRecord cannot be resolved to a type (in TestAvroSerialization.java)
Solution:
1. Download avro-tools-1.7.7.jar and put it in hadoop-2.x-src/
$ cd hadoop-2.5.2-src/hadoop-common-project/hadoop-common/src/test/avro
$ java -jar ~/hadoop-2.5.2-src/avro-tools-1.7.7.jar compile schema avroRecord.avsc ../java
Error 3: Project ‘hadoop-streaming’ is missing required source … (Build Path problem)
Solution:
Right-click hadoop-streaming -> Properties -> Java Build Path -> Source -> remove the error items