
编译Hadoop source code

2015-08-19 14:52
Source: the Hadoop source code. Download the source package from github or the hadoop website; it contains BUILDING.txt.

Build instructions for Hadoop

----------------------------------------------------------------------------------

Requirements:

* Unix System

* JDK 1.7+

* Maven 3.0 or later

* Findbugs 1.3.9 (if running findbugs)

* ProtocolBuffer 2.5.0

* CMake 2.6 or newer (if compiling native code), must be 3.0 or newer on Mac

* Zlib devel (if compiling native code)

* openssl devel (if compiling native hadoop-pipes and to get the best HDFS encryption performance)

* Jansson C XML parsing library (if compiling libwebhdfs)

* Linux FUSE (Filesystem in Userspace) version 2.6 or above (if compiling fuse_dfs)

* Internet connection for first build (to fetch all Maven and Hadoop dependencies)

* python (for releasedocs)

* bats (for shell code testing)

----------------------------------------------------------------------------------

The easiest way to get an environment with all the appropriate tools is by means

of the provided Docker config.

This requires a recent version of docker (1.4.1 and higher are known to work).

On Linux:

    Install Docker : https://docs.docker.com/installation/ubuntulinux/
    and run these commands:

        $ sudo service docker start

        $ sudo ./start-build-env.sh        # EJ: this script only exists in the Apache/Hadoop project on github

On Mac:

    First make sure Homebrew has been installed ( http://brew.sh/ )

    $ brew install docker boot2docker

    $ boot2docker init -m 4096

    $ boot2docker start

    $ $(boot2docker shellinit)

    $ ./start-build-env.sh

The prompt which is then presented is located at a mounted version of the source tree

and all required tools for testing and building have been installed and configured.

Note that from within this docker environment you ONLY have access to the Hadoop source

tree from where you started. So if you need to run

    dev-support/test-patch.sh /path/to/my.patch

then the patch must be placed inside the hadoop source tree.
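For example, a hypothetical workflow (the ~/hadoop checkout path and my.patch name are illustrative):

```shell
# Copy the patch into the source tree before entering the container,
# since the container can only see the mounted tree.
cp ~/my.patch ~/hadoop/
cd ~/hadoop
sudo ./start-build-env.sh
# then, inside the container:
#   dev-support/test-patch.sh my.patch
```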

Known issues:

- On Mac with Boot2Docker the performance on the mounted directory is currently extremely slow.

  This is a known problem related to boot2docker on the Mac.

  See:

    https://github.com/boot2docker/boot2docker/issues/593
  This issue has been resolved as a duplicate, and they point to a new feature for utilizing NFS mounts

  as the proposed solution:

    https://github.com/boot2docker/boot2docker/issues/64
  An alternative solution to this problem is to install Linux natively inside a virtual machine

  and run your IDE and Docker etc inside that VM.

----------------------------------------------------------------------------------
// EJ: we mainly start from here; the document uses Ubuntu 14.04 as its example

Installing required packages for clean install of Ubuntu 14.04 LTS Desktop:

// EJ: install Java; it goes under /usr/lib/jvm/

* Oracle JDK 1.7 (preferred)

  $ sudo apt-get purge openjdk*

  $ sudo apt-get install software-properties-common

  $ sudo add-apt-repository ppa:webupd8team/java

  $ sudo apt-get update

  $ sudo apt-get install oracle-java7-installer

* Maven

  $ sudo apt-get -y install maven

* Native libraries

  $ sudo apt-get -y install build-essential autoconf automake libtool cmake zlib1g-dev pkg-config libssl-dev

* ProtocolBuffer 2.5.0 (required)

  $ sudo apt-get -y install protobuf-compiler

//EJ: these are optional; I did not install them on my first build

Optional packages:

* Snappy compression

  $ sudo apt-get install snappy libsnappy-dev

* Bzip2

  $ sudo apt-get install bzip2 libbz2-dev

* Jansson (C Library for JSON)

  $ sudo apt-get install libjansson-dev

* Linux FUSE

  $ sudo apt-get install fuse libfuse-dev

//EJ: the following section is pasted in from elsewhere. My first build failed, and the logs showed the ant package was missing, so I found this tutorial online and dropped it here. This note is so I can still make sense of it later.


Create a Hadoop Build and Development Environment

    Author: Vic Hargrave Published: February 24, 2013

    Updated: December 26, 2014

    Category: Tutorials Tags: CentOS, Hadoop, Java, Linux 

One of the first things I had to do when I started working with Hadoop was fix bugs within the Hadoop stack. To be able to work on Hadoop internals requires numerous programming tools and libraries.

If you have a desire or need to work on Hadoop code, I’ve summarized the packages you need to install and configure to create a Hadoop build and development environment.


Base Operating System

By far the easiest operating system to set up for Hadoop development is a RedHat derived distro. I highly recommend CentOS 6.x – I use CentOS 6.3 64 bit. To limit the scope of this article I’m going to assume you have a 64 bit CentOS system to work with so
I won’t describe the installation procedure here.

Install Oracle JDK 1.6

CentOS normally comes with the OpenJDK Java environment. This is not the version of Java you want to use for Hadoop development. Instead you should install Oracle’s official Java 1.6 JDK and remove OpenJDK. Note you have to run yum as root to be able to install
packages on your system.

    Remove OpenJDK.

    yum -y remove *jdk*

    yum -y remove *java*

    Get Oracle’s Java 1.6 JDK. I suggest downloading the rpm.bin version.

    Install JDK 1.6 by double clicking on the rpm.bin package.

Install CentOS Packages

Install the following CentOS packages using the yum commands as shown. Note some of the packages may already be installed.

yum -y install gcc-c++.x86_64

yum -y install make.x86_64

yum -y install openssl.x86_64 openssl-devel.x86_64 openssh.x86_64

yum -y install libtool.x86_64

yum -y install autoconf.noarch automake.noarch

yum -y install cmake.x86_64

yum -y install xz.x86_64 xz-devel.x86_64 

yum -y install zlib.x86_64 zlib-devel.x86_64

yum -y install git.x86_64

Install Snappy Libraries

You need to get the snappy libraries from the RPMforge repository. Here is what you do to get the RPMforge repo file and snappy library:

    Get the RPMforge repo file from http://packages.sw.be/rpmforge-release/rpmforge-release-0.5.2-2.el6.rf.x86_64.rpm.
    Install the repo file by typing:

    rpm -Uvh rpmforge-release-0.5.2-2.el6.rf.x86_64.rpm

    Use yum to get the snappy lib:

    yum -y install snappy.x86_64 snappy-devel.x86_64

Install Protobuf

Protocol Buffers are used internally by Hadoop for RPC.  Install this facility as follows:

    Download protobuf-2.4.1.tar.gz.

    Unpack and build:

    tar zxvf protobuf-2.4.1.tar.gz

    cd protobuf-2.4.1

    ./configure

    make

    sudo make install

Install Apache and Findbugs Tools

Last but by no means least you'll need Findbugs and the Apache development tools: maven, ant and ivy. The CentOS packages of the Apache tools are usually not what you want. That may change in the future, but in the meantime follow the instructions here
to obtain and install the latest tools.

    Download Apache Maven.

    Download Apache Ant.

    Download Apache Ivy.
    Download Findbugs.

    Install each package as follows:

    tar zxvf <maven package>.tgz

    tar zxvf <ant package>.tgz

    tar zxvf <ivy package>.tgz

    tar zxvf <findbugs package>.tgz

    sudo cp -R <maven directory> /usr/local/apache_maven/

    sudo cp -R <ant directory> /usr/local/apache_ant/

    sudo cp -R <ivy directory> /usr/local/apache_ivy/

    sudo cp -R <findbugs directory> /usr/local/findbugs/

    Set your .bash_profile or .bashrc to include these environment variables:

    export FB_HOME=/usr/local/findbugs

    export ANT_HOME=/usr/local/apache_ant

    export IVY_HOME=/usr/local/apache_ivy

    export M2_HOME=/usr/local/apache_maven

     

    export JAVA_HOME=/usr/java/default

     

    PATH=$PATH:$M2_HOME/bin:$IVY_HOME/bin:$ANT_HOME/bin:$FB_HOME/bin:$IDEA_HOME/bin

     

    export PATH

//EJ: end of pasted-in tutorial

----------------------------------------------------------------------------------

Maven main modules:

  hadoop                            (Main Hadoop project)

         - hadoop-project           (Parent POM for all Hadoop Maven modules.             )

                                    (All plugins & dependencies versions are defined here.)

         - hadoop-project-dist      (Parent POM for modules that generate distributions.)

         - hadoop-annotations       (Generates the Hadoop doclet used to generate the Javadocs)

         - hadoop-assemblies        (Maven assemblies used by the different modules)

         - hadoop-common-project    (Hadoop Common)

         - hadoop-hdfs-project      (Hadoop HDFS)

         - hadoop-mapreduce-project (Hadoop MapReduce)

         - hadoop-tools             (Hadoop tools like Streaming, Distcp, etc.)

         - hadoop-dist              (Hadoop distribution assembler)

----------------------------------------------------------------------------------

Where to run Maven from?

  It can be run from any module. The only catch is that if not run from trunk

  all modules that are not part of the build run must be installed in the local

  Maven cache or available in a Maven repository.

----------------------------------------------------------------------------------

Maven build goals:

 * Clean                     : mvn clean [-Preleasedocs]

 * Compile                   : mvn compile [-Pnative]

 * Run tests                 : mvn test [-Pnative] [-Pshelltest]

 * Create JAR                : mvn package

 * Run findbugs              : mvn compile findbugs:findbugs

 * Run checkstyle            : mvn compile checkstyle:checkstyle

 * Install JAR in M2 cache   : mvn install

 * Deploy JAR to Maven repo  : mvn deploy

 * Run clover                : mvn test -Pclover [-DcloverLicenseLocation=${user.name}/.clover.license]

 * Run Rat                   : mvn apache-rat:check

 * Build javadocs            : mvn javadoc:javadoc

 * Build distribution        : mvn package [-Pdist][-Pdocs][-Psrc][-Pnative][-Dtar][-Preleasedocs]

 * Change Hadoop version     : mvn versions:set -DnewVersion=NEWVERSION

 Build options:

  * Use -Pnative to compile/bundle native code

  * Use -Pdocs to generate & bundle the documentation in the distribution (using -Pdist)

  * Use -Psrc to create a project source TAR.GZ

  * Use -Dtar to create a TAR with the distribution (using -Pdist)

  * Use -Preleasedocs to include the changelog and release docs (requires Internet connectivity)

 Snappy build options:

   Snappy is a compression library that can be utilized by the native code.

   It is currently an optional component, meaning that Hadoop can be built with

   or without this dependency.

  * Use -Drequire.snappy to fail the build if libsnappy.so is not found.

    If this option is not specified and the snappy library is missing,

    we silently build a version of libhadoop.so that cannot make use of snappy.

    This option is recommended if you plan on making use of snappy and want

    to get more repeatable builds.

  * Use -Dsnappy.prefix to specify a nonstandard location for the libsnappy

    header files and library files. You do not need this option if you have

    installed snappy using a package manager.

  * Use -Dsnappy.lib to specify a nonstandard location for the libsnappy library

    files.  Similarly to snappy.prefix, you do not need this option if you have

    installed snappy using a package manager.

  * Use -Dbundle.snappy to copy the contents of the snappy.lib directory into

    the final tar file. This option requires that -Dsnappy.lib is also given,

    and it ignores the -Dsnappy.prefix option.
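Putting these together, a sketch of a native build against a snappy installed under a nonstandard prefix (/opt/snappy is an assumed path; the openssl options below combine the same way):

```shell
# Fail fast if libsnappy.so cannot be found, and look for it under a
# custom prefix instead of the system default.
SNAPPY_PREFIX=/opt/snappy        # illustrative install location
mvn package -Pdist,native -DskipTests -Dtar \
    -Drequire.snappy -Dsnappy.prefix="$SNAPPY_PREFIX"
```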

 OpenSSL build options:

   OpenSSL includes a crypto library that can be utilized by the native code.

   It is currently an optional component, meaning that Hadoop can be built with

   or without this dependency.

  * Use -Drequire.openssl to fail the build if libcrypto.so is not found.

    If this option is not specified and the openssl library is missing,

    we silently build a version of libhadoop.so that cannot make use of

    openssl. This option is recommended if you plan on making use of openssl

    and want to get more repeatable builds.

  * Use -Dopenssl.prefix to specify a nonstandard location for the libcrypto

    header files and library files. You do not need this option if you have

    installed openssl using a package manager.

  * Use -Dopenssl.lib to specify a nonstandard location for the libcrypto library

    files. Similarly to openssl.prefix, you do not need this option if you have

    installed openssl using a package manager.

  * Use -Dbundle.openssl to copy the contents of the openssl.lib directory into

    the final tar file. This option requires that -Dopenssl.lib is also given,

    and it ignores the -Dopenssl.prefix option.

   Tests options:

  * Use -DskipTests to skip tests when running the following Maven goals:

    'package',  'install', 'deploy' or 'verify'

  * -Dtest=<TESTCLASSNAME>,<TESTCLASSNAME#METHODNAME>,....

  * -Dtest.exclude=<TESTCLASSNAME>

  * -Dtest.exclude.pattern=**/<TESTCLASSNAME1>.java,**/<TESTCLASSNAME2>.java
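For instance, to run only a couple of tests during development (the class and method names are made up for illustration):

```shell
# Run one whole test class plus a single method of another class:
mvn test -Dtest=TestFoo,TestBar#testBaz
# Run everything except one class:
mvn test -Dtest.exclude=TestSlowThing
```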

----------------------------------------------------------------------------------

Building components separately

If you are building a submodule directory, all the hadoop dependencies this

submodule has will be resolved like all other 3rd party dependencies. That is,

from the Maven cache or from a Maven repository (if not available in the cache

or the SNAPSHOT 'timed out').

An alternative is to run 'mvn install -DskipTests' from Hadoop source top

level once; and then work from the submodule. Keep in mind that SNAPSHOTs

time out after a while, using the Maven '-nsu' will stop Maven from trying

to update SNAPSHOTs from external repos.
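A sketch of that workflow (the checkout directory name is illustrative; the submodule paths follow the module layout above):

```shell
# One-time: install every module into the local Maven cache.
cd hadoop-src                    # illustrative checkout directory
mvn install -DskipTests
# Afterwards, iterate inside a single submodule; -nsu (no snapshot
# updates) stops Maven from re-fetching SNAPSHOTs from external repos.
cd hadoop-hdfs-project/hadoop-hdfs
mvn compile -nsu
```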

----------------------------------------------------------------------------------

Protocol Buffer compiler

The version of Protocol Buffer compiler, protoc, must match the version of the

protobuf JAR.

If you have multiple versions of protoc in your system, you can set in your

build shell the HADOOP_PROTOC_PATH environment variable to point to the one you

want to use for the Hadoop build. If you don't define this environment variable,

protoc is looked up in the PATH.
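A small sanity check before starting a long build can save time; this sketch assumes protoc prints its version in the usual 'libprotoc X.Y.Z' form:

```shell
# Use HADOOP_PROTOC_PATH if set, otherwise whatever protoc is on PATH,
# and warn when its version differs from the expected 2.5.0.
PROTOC="${HADOOP_PROTOC_PATH:-protoc}"
expected="2.5.0"
actual="$("$PROTOC" --version 2>/dev/null | awk '{print $2}')"
if [ "$actual" != "$expected" ]; then
    echo "warning: protoc reports '$actual', Hadoop expects $expected" >&2
fi
```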

----------------------------------------------------------------------------------

Importing projects to eclipse

Before importing the project into eclipse, first install hadoop-maven-plugins.

  $ cd hadoop-maven-plugins

  $ mvn install

Then, generate eclipse project files.

  $ mvn eclipse:eclipse -DskipTests

Finally, import into eclipse by specifying the root directory of the project via

[File] > [Import] > [Existing Projects into Workspace].

----------------------------------------------------------------------------------

Building distributions:

Create binary distribution without native code and without documentation:

  $ mvn package -Pdist -DskipTests -Dtar

Create binary distribution with native code and with documentation:

  $ mvn package -Pdist,native,docs -DskipTests -Dtar

Create source distribution:

  $ mvn package -Psrc -DskipTests

Create source and binary distributions with native code and documentation:

  $ mvn package -Pdist,native,docs,src -DskipTests -Dtar

Create a local staging version of the website (in /tmp/hadoop-site)

  $ mvn clean site -Preleasedocs; mvn site:stage -DstagingDirectory=/tmp/hadoop-site

----------------------------------------------------------------------------------

Installing Hadoop

Look for these HTML files after building the documentation with the above commands.

  * Single Node Setup:

    hadoop-project-dist/hadoop-common/SingleCluster.html

  * Cluster Setup:

    hadoop-project-dist/hadoop-common/ClusterSetup.html

----------------------------------------------------------------------------------

Handling out of memory errors in builds

----------------------------------------------------------------------------------

If the build process fails with an out of memory error, you should be able to fix

it by increasing the memory used by maven which can be done via the environment

variable MAVEN_OPTS.

Here is an example setting to allocate between 256 and 512 MB of heap space to

Maven

export MAVEN_OPTS="-Xms256m -Xmx512m"

----------------------------------------------------------------------------------

Building on Windows

----------------------------------------------------------------------------------

Requirements:

* Windows System

* JDK 1.7+

* Maven 3.0 or later

* Findbugs 1.3.9 (if running findbugs)

* ProtocolBuffer 2.5.0

* CMake 2.6 or newer

* Windows SDK 7.1 or Visual Studio 2010 Professional

* Windows SDK 8.1 (if building CPU rate control for the container executor)

* zlib headers (if building native code bindings for zlib)

* Internet connection for first build (to fetch all Maven and Hadoop dependencies)

* Unix command-line tools from GnuWin32: sh, mkdir, rm, cp, tar, gzip. These

  tools must be present on your PATH.

* Python ( for generation of docs using 'mvn site')

Unix command-line tools are also included with the Windows Git package which

can be downloaded from http://git-scm.com/downloads
If using Visual Studio, it must be Visual Studio 2010 Professional (not 2012).

Do not use Visual Studio Express.  It does not support compiling for 64-bit,

which is problematic if running a 64-bit system.  The Windows SDK 7.1 is free to

download here:
http://www.microsoft.com/en-us/download/details.aspx?id=8279
The Windows SDK 8.1 is available to download at:
http://msdn.microsoft.com/en-us/windows/bg162891.aspx
Cygwin is neither required nor supported.

----------------------------------------------------------------------------------

Building:

Keep the source code tree in a short path to avoid running into problems related

to Windows maximum path length limitation (for example, C:\hdc).

Run builds from a Windows SDK Command Prompt. (Start, All Programs,

Microsoft Windows SDK v7.1, Windows SDK 7.1 Command Prompt).

JAVA_HOME must be set, and the path must not contain spaces. If the full path

would contain spaces, then use the Windows short path instead.

You must set the Platform environment variable to either x64 or Win32 depending

on whether you're running a 64-bit or 32-bit system. Note that this is

case-sensitive. It must be "Platform", not "PLATFORM" or "platform".

Environment variables on Windows are usually case-insensitive, but Maven treats

them as case-sensitive. Failure to set this environment variable correctly will

cause msbuild to fail while building the native code in hadoop-common.

set Platform=x64 (when building on a 64-bit system)

set Platform=Win32 (when building on a 32-bit system)

Several tests require that the user must have the Create Symbolic Links

privilege.

All Maven goals are the same as described above with the exception that

native code is built by enabling the 'native-win' Maven profile. -Pnative-win

is enabled by default when building on Windows since the native components

are required (not optional) on Windows.

If native code bindings for zlib are required, then the zlib headers must be

deployed on the build machine. Set the ZLIB_HOME environment variable to the

directory containing the headers.

set ZLIB_HOME=C:\zlib-1.2.7

At runtime, zlib1.dll must be accessible on the PATH. Hadoop has been tested

with zlib 1.2.7, built using Visual Studio 2010 out of contrib\vstudio\vc10 in

the zlib 1.2.7 source tree.
http://www.zlib.net/
----------------------------------------------------------------------------------

Building distributions:

 * Build distribution with native code    : mvn package [-Pdist][-Pdocs][-Psrc][-Dtar]

----------------------------------------------------------------------------------

EJ: even after the source build above succeeds, importing into eclipse still shows many errors. The simple ones can be fixed by following the wiki at http://wiki.apache.org/hadoop/EclipseEnvironment

Below are the harder problems I ran into. When your eclipse shows 56~59 errors, it is time to look here.

Some of these problems concern the 'mvn install -DskipTests' build step: according to the tutorial, certain files are generated automatically during it, but they did not appear when I ran it, which stuck me for a long time. If you are stuck the same way, read on.

EJ Hadoop_src Notes

Solve Error of importing hadoop to eclipse

Error1: org.apache.hadoop.ipc.protobuf cannot be resolved

Solution:
$ cd hadoop-2.5.2-src/hadoop-common-project/hadoop-common/src/test/proto
$ protoc --java_out=../java *.proto

Error2: AvroRecord cannot be resolved to a type (TestAvroSerialization.java)

Solution:
1. download avro-tools-1.7.7.jar and put it in hadoop-2.x-src/

$ cd hadoop-2.5.2-src/hadoop-common-project/hadoop-common/src/test/avro
$ java -jar ~/hadoop-2.5.2-src/avro-tools-1.7.7.jar compile schema avroRecord.avsc ../java

Error3: Project 'hadoop-streaming' is missing required source ... Build Path Problem

Solution:
Right-click hadoop-streaming -> Properties -> Java Build Path -> Source -> remove the error items