
A Detailed Walkthrough of Compiling the Spark Source Code (All Versions) (Recommended)

2017-05-12 17:28
A few words up front

   Be prepared to retry several times. During compilation, downloading a package may hang for a very long time; this usually means the connection to the remote site has stalled. Press Ctrl+C and re-run the build command.

If the build complains about a missing file, clean the project first (run `mvn clean`) and then recompile.
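The retry advice above can be wrapped in a small helper. This is a sketch of our own (the `retry` function and its name are not from the original post): it re-runs a flaky command a fixed number of times before giving up.

```shell
# Hypothetical helper: re-run a command up to MAX times until it succeeds.
# Useful when a Maven download stalls and you would otherwise press Ctrl+C
# and re-type the build command by hand.
retry() {
  max=$1; shift
  n=1
  until "$@"; do
    if [ "$n" -ge "$max" ]; then
      echo "giving up after $max attempts" >&2
      return 1
    fi
    echo "attempt $n failed, retrying..." >&2
    n=$((n + 1))
  done
}

# Usage with the build command that appears later in this post:
#   retry 3 mvn -DskipTests clean package
retry 3 true && echo "retry helper works"
```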

The 3 ways to build the Spark source

  1. Maven build

  2. SBT build (not covered here yet)

  3. Packaged build with make-distribution.sh

Preface

   Spark can be built with either SBT or Maven, and a deployable package can then be produced with the make-distribution.sh script.

   The SBT build requires git to be installed, while the Maven build requires the maven tool; both require network access.

    Although Maven is the build method recommended on the Spark website, SBT compiles faster, so for Spark developers SBT may be the better choice. Since the SBT build is also driven by the Maven POM files, the SBT build parameters are the same as the Maven ones.

 

Reflections

   If you have the time, build the source yourself at least once. If you want to become an expert and a heavyweight in the big data field, this early hardship is unavoidable.

   Whether you are compiling the Spark source or the Hadoop source, a first-time build will hit many problems and may take a day or even several; that is normal. Keep the right mindset: hitting errors is actually good, because working through them is the best way to sharpen your skills.

       Don't underestimate these problems either. Running into them is luck; understanding them thoroughly, down to the why, is a blessing.

Overview of the major distributions

  1. Apache release ------ you can build it yourself, or use a pre-built binary

  2. CDH release --------- no need to build it yourself


    See: Installing a 3-node cluster with Cloudera Manager using parcels (latest stable or a specified version, including adding services)

   3. HDP release ---------- no need to build it yourself


    See: Deploying an HDP cluster with Ambari (illustrated in five major steps) (strongly recommended)

 These 3 are the mainstream distributions; in fact there are nine in all. CDH's Cloudera Manager costs money, though its pre-built packages are free.

 

Where to download the hadoop/spark source:

  1. The official website

       2. GitHub (source code only)


CDH downloads

 http://archive-primary.cloudera.com/cdh5/cdh/5/


HDP downloads

http://zh.hortonworks.com/products/

OK then, I will use GitHub as the example here.

         Prepare a Linux environment (e.g. CentOS 6.5)

********************************************************************************

*  Workflow:

*      Step 1: install git online

*      Step 2: create a directory in which to clone the spark source (mkdir -p /root/projects/opensource)

*      Step 3: switch to the release tag

*      Step 4: install JDK 1.7+

*      Step 5: install Maven

*      Step 6: follow along with the official docs

*      Step 7: download the required packages via mvn

 ********************************************************************************

Of course, you can also refer to the documentation on the official site:

http://spark.apache.org/docs/1.6.1/building-spark.html

Step 1: install git online (as root)

  yum install git        (as root)
  or
  sudo yum install git   (as a regular user)

 


[root@Compiler ~]# yum install git

.......

Total download size: 4.7 M

Installed size: 15 M

Is this ok [y/N]: y

Downloading Packages:

(1/3): git-1.7.1-4.el6_7.1.x86_64.rpm                                                                                                                                   | 4.6 MB     00:01

.........

Complete!

[root@Compiler ~]#


Step 2: create a directory and clone the spark source


mkdir -p /root/projects/opensource

cd /root/projects/opensource

git clone https://github.com/apache/spark.git


[root@Compiler ~]# pwd

/root

[root@Compiler ~]# mkdir -p /root/projects/opensource

[root@Compiler ~]# cd projects/opensource/

[root@Compiler opensource]# pwd

/root/projects/opensource

[root@Compiler opensource]# ls

[root@Compiler opensource]#




[root@Compiler opensource]# pwd

/root/projects/opensource

[root@Compiler opensource]# git clone https://github.com/apache/spark.git 
Initialized empty Git repository in /root/projects/opensource/spark/.git/

remote: Counting objects: 403059, done.

remote: Compressing objects: 100% (13/13), done.

remote: Total 403059 (delta 4), reused 1 (delta 1), pack-reused 403045

Receiving objects: 100% (403059/403059), 182.79 MiB | 896 KiB/s, done.

Resolving deltas: 100% (157557/157557), done.

[root@Compiler opensource]# ls

spark

[root@Compiler opensource]# cd spark/

[root@Compiler spark]#



This is simply what you see on the repository page on the GitHub website (screenshot omitted).

[root@Compiler spark]# pwd

/root/projects/opensource/spark

[root@Compiler spark]# ll

total 280

-rw-r--r--.  1 root root  1804 Sep  2 03:53 appveyor.yml

drwxr-xr-x.  3 root root  4096 Sep  2 03:53 assembly

drwxr-xr-x.  2 root root  4096 Sep  2 03:53 bin

drwxr-xr-x.  2 root root  4096 Sep  2 03:53 build

drwxr-xr-x.  8 root root  4096 Sep  2 03:53 common

drwxr-xr-x.  2 root root  4096 Sep  2 03:53 conf

-rw-r--r--.  1 root root   988 Sep  2 03:53 CONTRIBUTING.md

drwxr-xr-x.  3 root root  4096 Sep  2 03:53 core

drwxr-xr-x.  5 root root  4096 Sep  2 03:53 data

drwxr-xr-x.  6 root root  4096 Sep  2 03:53 dev

drwxr-xr-x.  9 root root  4096 Sep  2 03:53 docs

drwxr-xr-x.  3 root root  4096 Sep  2 03:53 examples

drwxr-xr-x. 15 root root  4096 Sep  2 03:53 external

drwxr-xr-x.  3 root root  4096 Sep  2 03:53 graphx

drwxr-xr-x.  3 root root  4096 Sep  2 03:53 launcher

-rw-r--r--.  1 root root 17811 Sep  2 03:53 LICENSE

drwxr-xr-x.  2 root root  4096 Sep  2 03:53 licenses

drwxr-xr-x.  3 root root  4096 Sep  2 03:53 mesos

drwxr-xr-x.  3 root root  4096 Sep  2 03:53 mllib

drwxr-xr-x.  3 root root  4096 Sep  2 03:53 mllib-local

-rw-r--r--.  1 root root 24749 Sep  2 03:53 NOTICE

-rw-r--r--.  1 root root 97324 Sep  2 03:53 pom.xml

drwxr-xr-x.  2 root root  4096 Sep  2 03:53 project

drwxr-xr-x.  6 root root  4096 Sep  2 03:53 python

drwxr-xr-x.  3 root root  4096 Sep  2 03:53 R

-rw-r--r--.  1 root root  3828 Sep  2 03:53 README.md

drwxr-xr-x.  5 root root  4096 Sep  2 03:53 repl

drwxr-xr-x.  2 root root  4096 Sep  2 03:53 sbin

-rw-r--r--.  1 root root 16952 Sep  2 03:53 scalastyle-config.xml

drwxr-xr-x.  6 root root  4096 Sep  2 03:53 sql

drwxr-xr-x.  3 root root  4096 Sep  2 03:53 streaming

drwxr-xr-x.  3 root root  4096 Sep  2 03:53 tools

drwxr-xr-x.  3 root root  4096 Sep  2 03:53 yarn

[root@Compiler spark]#


Step 3: switch to the release tag

git checkout v1.6.1   # run inside the spark directory


 




[root@Compiler spark]# pwd

/root/projects/opensource/spark

[root@Compiler spark]# git branch -a

* master

remotes/origin/HEAD -> origin/master

remotes/origin/branch-0.5

...

remotes/origin/branch-1.6

remotes/origin/branch-2.0

remotes/origin/master

[root@Compiler spark]# git checkout v1.6.1

Note: checking out 'v1.6.1'.

You are in 'detached HEAD' state. You can look around, make experimental

...

HEAD is now at 15de51c... Preparing Spark release v1.6.1-rc1

[root@Compiler spark]#
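The detached-HEAD state in the transcript above is simply what `git checkout <tag>` always produces. As a sketch, demonstrated on a throwaway repository rather than the spark clone, you can list the available release tags first and confirm you end up detached:

```shell
# Build a tiny throwaway repo purely to demonstrate; in the real clone you
# would run `git tag -l 'v1.6*'` and `git checkout v1.6.1` directly.
tmp=$(mktemp -d)
cd "$tmp"
git init -q .
git -c user.email=demo@example.com -c user.name=demo \
    commit -q --allow-empty -m "init"
git tag v1.6.1                   # stand-in for the real release tag

git tag -l 'v1.6*'               # list matching tags -> v1.6.1
git checkout -q v1.6.1           # same detached-HEAD state as in the post
git rev-parse --abbrev-ref HEAD  # prints HEAD when detached
```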


And with that checkout done, we now have make-distribution.sh.



[root@Compiler spark]# pwd

/root/projects/opensource/spark

[root@Compiler spark]# ll

total 1636

drwxr-xr-x.  3 root root    4096 Sep  2 03:57 assembly

drwxr-xr-x.  3 root root    4096 Sep  2 03:57 bagel

drwxr-xr-x.  2 root root    4096 Sep  2 03:57 bin

drwxr-xr-x.  2 root root    4096 Sep  2 03:57 build

-rw-r--r--.  1 root root 1343562 Sep  2 03:57 CHANGES.txt

drwxr-xr-x.  2 root root    4096 Sep  2 03:57 conf

-rw-r--r--.  1 root root     988 Sep  2 03:53 CONTRIBUTING.md

drwxr-xr-x.  3 root root    4096 Sep  2 03:57 core

drwxr-xr-x.  3 root root    4096 Sep  2 03:57 data

drwxr-xr-x.  7 root root    4096 Sep  2 03:57 dev

drwxr-xr-x.  4 root root    4096 Sep  2 03:57 docker

drwxr-xr-x.  3 root root    4096 Sep  2 03:57 docker-integration-tests

drwxr-xr-x.  9 root root    4096 Sep  2 03:57 docs

drwxr-xr-x.  3 root root    4096 Sep  2 03:57 ec2

drwxr-xr-x.  3 root root    4096 Sep  2 03:57 examples

drwxr-xr-x. 11 root root    4096 Sep  2 03:57 external

drwxr-xr-x.  6 root root    4096 Sep  2 03:57 extras

drwxr-xr-x.  4 root root    4096 Sep  2 03:57 graphx

drwxr-xr-x.  3 root root    4096 Sep  2 03:57 launcher

-rw-r--r--.  1 root root   17352 Sep  2 03:57 LICENSE

drwxr-xr-x.  2 root root    4096 Sep  2 03:57 licenses

-rwxr-xr-x.  1 root root    8557 Sep  2 03:57 make-distribution.sh

drwxr-xr-x.  3 root root    4096 Sep  2 03:57 mllib

drwxr-xr-x.  5 root root    4096 Sep  2 03:57 network

-rw-r--r--.  1 root root   23529 Sep  2 03:57 NOTICE

-rw-r--r--.  1 root root   91106 Sep  2 03:57 pom.xml

drwxr-xr-x.  3 root root    4096 Sep  2 03:57 project

-rw-r--r--.  1 root root   13991 Sep  2 03:57 pylintrc

drwxr-xr-x.  6 root root    4096 Sep  2 03:57 python

drwxr-xr-x.  3 root root    4096 Sep  2 03:57 R

-rw-r--r--.  1 root root    3359 Sep  2 03:57 README.md

drwxr-xr-x.  5 root root    4096 Sep  2 03:57 repl

drwxr-xr-x.  2 root root    4096 Sep  2 03:57 sbin

drwxr-xr-x.  2 root root    4096 Sep  2 03:57 sbt

-rw-r--r--.  1 root root   13191 Sep  2 03:57 scalastyle-config.xml

drwxr-xr-x.  6 root root    4096 Sep  2 03:57 sql

drwxr-xr-x.  3 root root    4096 Sep  2 03:57 streaming

drwxr-xr-x.  3 root root    4096 Sep  2 03:57 tags

drwxr-xr-x.  3 root root    4096 Sep  2 03:57 tools

-rw-r--r--.  1 root root     848 Sep  2 03:57 tox.ini

drwxr-xr-x.  3 root root    4096 Sep  2 03:57 unsafe

drwxr-xr-x.  3 root root    4096 Sep  2 03:57 yarn

[root@Compiler spark]#


Again, this corresponds to the v1.6.1 tree shown on the GitHub page (screenshot omitted).

 Modify the make-distribution.sh file





[root@Compiler spark]# pwd

/root/projects/opensource/spark

[root@Compiler spark]# vim make-distribution.sh


 


My own Maven installation lives at MAVEN_HOME=/usr/local/apache-maven-3.3.3.


Change the MVN line to either

MVN="/usr/local/apache-maven-3.3.3/bin/mvn"   or

MVN="$MAVEN_HOME/bin/mvn"




MAKE_TGZ=false

NAME=none

#MVN="$SPARK_HOME/build/mvn"

MVN="$MAVEN_HOME/bin/mvn"
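Instead of editing in vim, the same change can be made non-interactively with sed. A sketch, demonstrated on a mock file so it is self-contained; against the real tree you would run the `sed` line on make-distribution.sh inside the spark directory:

```shell
# Create a mock make-distribution.sh fragment just for demonstration.
demo=$(mktemp)
printf 'MAKE_TGZ=false\nNAME=none\nMVN="$SPARK_HOME/build/mvn"\n' > "$demo"

# Replace the MVN= line with the locally installed Maven, exactly as the
# manual vim edit above does. (Single quotes keep $MAVEN_HOME literal so it
# is expanded when the script runs, not now.)
sed -i 's|^MVN=.*|MVN="$MAVEN_HOME/bin/mvn"|' "$demo"

cat "$demo"
```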


Step 4: install JDK 7+

 First, check whether the JDK bundled with CentOS 6.5 is already installed.
<1> Check the existing OpenJDK version:
# java -version

<2> Inspect the installed JDK packages further:
# rpm -qa | grep java

You will typically see something like:

tzdata-java-2013g-1.el6.noarch

java-1.7.0-openjdk-1.7.0.45-2.4.3.3.el6.x86_64

java-1.6.0-openjdk-1.6.0.0-1.66.1.13.0.el6.x86_64

<3> Uninstall the bundled OpenJDK:

rpm -e --nodeps tzdata-java-2013g-1.el6.noarch

rpm -e --nodeps java-1.7.0-openjdk-1.7.0.45-2.4.3.3.el6.x86_64

rpm -e --nodeps java-1.6.0-openjdk-1.6.0.0-1.66.1.13.0.el6.x86_64

The bundled JDK is now gone.
 
As root, install jdk-7u79-linux-x64.tar.gz.
Upload the archive to /usr/local.

Extract it: tar -zxvf jdk-7u79-linux-x64.tar.gz

Delete the archive: rm -rf jdk-7u79-linux-x64.tar.gz
Configure the environment variables: vim /etc/profile

#java

export JAVA_HOME=/usr/local/jdk1.7.0_79

export CLASSPATH=.:$JAVA_HOME/jre/lib/rt.jar:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar

export PATH=$PATH:$JAVA_HOME/bin

Apply the changes: source /etc/profile

Verify the installation: java -version

 

 
Step 5: install Maven
Download apache-maven-3.3.3-bin.tar.gz
Upload it to /usr/local/

Extract it: tar -zxvf apache-maven-3.3.3-bin.tar.gz

Delete the archive: rm -rf apache-maven-3.3.3-bin.tar.gz
Configure the environment variables for Maven: vim /etc/profile

#maven

export MAVEN_HOME=/usr/local/apache-maven-3.3.3

export PATH=$PATH:$MAVEN_HOME/bin

Apply the changes: source /etc/profile

Verify the installation: mvn -v
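Before moving on, it is worth confirming that everything the build needs is now on the PATH. A small sketch of our own (the `check_tools` helper and its name are not from this post):

```shell
# Hypothetical helper: report whether each required tool is on PATH,
# returning non-zero if anything is missing.
check_tools() {
  missing=0
  for tool in "$@"; do
    if command -v "$tool" >/dev/null 2>&1; then
      echo "$tool: found"
    else
      echo "$tool: MISSING"
      missing=1
    fi
  done
  return "$missing"
}

# Before building Spark you would run: check_tools git java mvn
check_tools sh
```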
 
Step 6: follow along with the official docs for an initial overview
http://spark.apache.org/docs/1.6.1/building-spark.html







[root@Compiler spark]# vim pom.xml
Let's first take an initial look at this pom.xml file.

-P stands for "profile", and several profiles can be activated at the same time.
I won't go into the rest here; this is just a first look.
Now that we have a basic understanding of pom.xml, what comes next? From experience, $MAVEN_HOME/conf/settings.xml is usually modified as well; this is hard-won wisdom from veterans of production environments!
Here I'd also like to recommend a very practical tool (for transferring files).



After extracting the tool (screenshots omitted), note one pitfall: the transfer fails if the "Local site" panel on the left points at "Computer" rather than a specific drive.


The following is the default settings.xml:



<?xml version="1.0" encoding="UTF-8"?>

<!--

Licensed to the Apache Software Foundation (ASF) under one

or more contributor license agreements.  See the NOTICE file

distributed with this work for additional information

regarding copyright ownership.  The ASF licenses this file

to you under the Apache License, Version 2.0 (the

"License"); you may not use this file except in compliance

with the License.  You may obtain a copy of the License at
 http://www.apache.org/licenses/LICENSE-2.0 
Unless required by applicable law or agreed to in writing,

software distributed under the License is distributed on an

"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY

KIND, either express or implied.  See the License for the

specific language governing permissions and limitations

under the License.

-->

<!--

| This is the configuration file for Maven. It can be specified at two levels:

|

|  1. User Level. This settings.xml file provides configuration for a single user,

|                 and is normally provided in ${user.home}/.m2/settings.xml.

|

|                 NOTE: This location can be overridden with the CLI option:

|

|                 -s /path/to/user/settings.xml

|

|  2. Global Level. This settings.xml file provides configuration for all Maven

|                 users on a machine (assuming they're all using the same Maven

|                 installation). It's normally provided in

|                 ${maven.home}/conf/settings.xml.

|

|                 NOTE: This location can be overridden with the CLI option:

|

|                 -gs /path/to/global/settings.xml

|

| The sections in this sample file are intended to give you a running start at

| getting the most out of your Maven installation. Where appropriate, the default

| values (values used when the setting is not specified) are provided.

|

|-->

<settings xmlns="http://maven.apache.org/SETTINGS/1.0.0"

xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"

xsi:schemaLocation="http://maven.apache.org/SETTINGS/1.0.0 http://maven.apache.org/xsd/settings-1.0.0.xsd"> 
<!-- localRepository

| The path to the local repository maven will use to store artifacts.

|

| Default: ${user.home}/.m2/repository

<localRepository>/path/to/local/repo</localRepository>

-->

<!-- interactiveMode

| This will determine whether maven prompts you when it needs input. If set to false,

| maven will use a sensible default value, perhaps based on some other setting, for

| the parameter in question.

|

| Default: true

<interactiveMode>true</interactiveMode>

-->

<!-- offline

| Determines whether maven should attempt to connect to the network when executing a build.

| This will have an effect on artifact downloads, artifact deployment, and others.

|

| Default: false

<offline>false</offline>

-->

<!-- pluginGroups

| This is a list of additional group identifiers that will be searched when resolving plugins by their prefix, i.e.

| when invoking a command line like "mvn prefix:goal". Maven will automatically add the group identifiers

| "org.apache.maven.plugins" and "org.codehaus.mojo" if these are not already contained in the list.

|-->

<pluginGroups>

<!-- pluginGroup

| Specifies a further group identifier to use for plugin lookup.

<pluginGroup>com.your.plugins</pluginGroup>

-->

</pluginGroups>

<!-- proxies

| This is a list of proxies which can be used on this machine to connect to the network.

| Unless otherwise specified (by system property or command-line switch), the first proxy

| specification in this list marked as active will be used.

|-->

<proxies>

<!-- proxy

| Specification for one proxy, to be used in connecting to the network.

|

<proxy>

<id>optional</id>

<active>true</active>

<protocol>http</protocol>

<username>proxyuser</username>

<password>proxypass</password>

<host>proxy.host.net</host>

<port>80</port>

<nonProxyHosts>local.net|some.host.com</nonProxyHosts>

</proxy>

-->

</proxies>

<!-- servers

| This is a list of authentication profiles, keyed by the server-id used within the system.

| Authentication profiles can be used whenever maven must make a connection to a remote server.

|-->

<servers>

<!-- server

| Specifies the authentication information to use when connecting to a particular server, identified by

| a unique name within the system (referred to by the 'id' attribute below).

|

| NOTE: You should either specify username/password OR privateKey/passphrase, since these pairings are

|       used together.

|

<server>

<id>deploymentRepo</id>

<username>repouser</username>

<password>repopwd</password>

</server>

-->

<!-- Another sample, using keys to authenticate.

<server>

<id>siteServer</id>

<privateKey>/path/to/private/key</privateKey>

<passphrase>optional; leave empty if not used.</passphrase>

</server>

-->

</servers>

<!-- mirrors

| This is a list of mirrors to be used in downloading artifacts from remote repositories.

|

| It works like this: a POM may declare a repository to use in resolving certain artifacts.

| However, this repository may have problems with heavy traffic at times, so people have mirrored

| it to several places.

|

| That repository definition will have a unique id, so we can create a mirror reference for that

| repository, to be used as an alternate download site. The mirror site will be the preferred

| server for that repository.

|-->

<mirrors>

<!-- mirror

| Specifies a repository mirror site to use instead of a given repository. The repository that

| this mirror serves has an ID that matches the mirrorOf element of this mirror. IDs are used

| for inheritance and direct lookup purposes, and must be unique across the set of mirrors.

|

<mirror>

<id>mirrorId</id>

<mirrorOf>repositoryId</mirrorOf>

<name>Human Readable Name for this Mirror.</name>

<url>http://my.repository.com/repo/path</url>

</mirror>

-->

</mirrors>

<!-- profiles

| This is a list of profiles which can be activated in a variety of ways, and which can modify

| the build process. Profiles provided in the settings.xml are intended to provide local machine-

| specific paths and repository locations which allow the build to work in the local environment.

|

| For example, if you have an integration testing plugin - like cactus - that needs to know where

| your Tomcat instance is installed, you can provide a variable here such that the variable is

| dereferenced during the build process to configure the cactus plugin.

|

| As noted above, profiles can be activated in a variety of ways. One way - the activeProfiles

| section of this document (settings.xml) - will be discussed later. Another way essentially

| relies on the detection of a system property, either matching a particular value for the property,

| or merely testing its existence. Profiles can also be activated by JDK version prefix, where a

| value of '1.4' might activate a profile when the build is executed on a JDK version of '1.4.2_07'.

| Finally, the list of active profiles can be specified directly from the command line.

|

| NOTE: For profiles defined in the settings.xml, you are restricted to specifying only artifact

|       repositories, plugin repositories, and free-form properties to be used as configuration

|       variables for plugins in the POM.

|

|-->

<profiles>

<!-- profile

| Specifies a set of introductions to the build process, to be activated using one or more of the

| mechanisms described above. For inheritance purposes, and to activate profiles via <activatedProfiles/>

| or the command line, profiles have to have an ID that is unique.

|

| An encouraged best practice for profile identification is to use a consistent naming convention

| for profiles, such as 'env-dev', 'env-test', 'env-production', 'user-jdcasey', 'user-brett', etc.

| This will make it more intuitive to understand what the set of introduced profiles is attempting

| to accomplish, particularly when you only have a list of profile id's for debug.

|

| This profile example uses the JDK version to trigger activation, and provides a JDK-specific repo.

<profile>

<id>jdk-1.4</id>

<activation>

<jdk>1.4</jdk>

</activation>

<repositories>

<repository>

<id>jdk14</id>

<name>Repository for JDK 1.4 builds</name>

<url>http://www.myhost.com/maven/jdk14</url>

<layout>default</layout>

<snapshotPolicy>always</snapshotPolicy>

</repository>

</repositories>

</profile>

-->

<!--

| Here is another profile, activated by the system property 'target-env' with a value of 'dev',

| which provides a specific path to the Tomcat instance. To use this, your plugin configuration

| might hypothetically look like:

|

| ...

| <plugin>

|   <groupId>org.myco.myplugins</groupId>

|   <artifactId>myplugin</artifactId>

|

|   <configuration>

|     <tomcatLocation>${tomcatPath}</tomcatLocation>

|   </configuration>

| </plugin>

| ...

|

| NOTE: If you just wanted to inject this configuration whenever someone set 'target-env' to

|       anything, you could just leave off the <value/> inside the activation-property.

|

<profile>

<id>env-dev</id>

<activation>

<property>

<name>target-env</name>

<value>dev</value>

</property>

</activation>

<properties>

<tomcatPath>/path/to/tomcat/instance</tomcatPath>

</properties>

</profile>

-->

</profiles>

<!-- activeProfiles

| List of profiles that are active for all builds.

|

<activeProfiles>

<activeProfile>alwaysActiveProfile</activeProfile>

<activeProfile>anotherAlwaysActiveProfile</activeProfile>

</activeProfiles>

-->

</settings>


 
Change it to:



 

<?xml version="1.0" encoding="UTF-8"?>

<settings xmlns="http://maven.apache.org/SETTINGS/1.0.0"

xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"

xsi:schemaLocation="http://maven.apache.org/SETTINGS/1.0.0 http://maven.apache.org/xsd/settings-1.0.0.xsd"> 
<pluginGroups>

</pluginGroups>

<proxies>

</proxies>

<servers>

</servers>

<mirrors>

<mirror>

<id>nexus-osc</id>

<mirrorOf>*</mirrorOf>

<name>Nexus osc</name>

<url>http://nexus.rc.dataengine.com/nexus/content/groups/public</url>

</mirror>

<mirror>

<id>nexus-osc-central</id> <!-- mirror ids must be unique -->

<mirrorOf>central</mirrorOf>

<name>Nexus osc</name>

<url>http://maven.oschina.net/content/groups/public</url>

</mirror>

<mirror>

<id>nexus-osc-thirdparty</id>

<mirrorOf>thirdparty</mirrorOf>

<name>Nexus osc thirdparty</name>

<url>http://maven.oschina.net/content/repositories/thirdparty</url>

</mirror>

<mirror>

<id>central</id>

<mirrorOf>central</mirrorOf>

<name>central</name>

<url>http://central.maven.org/maven2</url>

</mirror>

<mirror>

<id>repo1</id>

<mirrorOf>central</mirrorOf>

<name>repo1</name>

<url>http://repo1.maven.org/maven2</url>

</mirror>

</mirrors>

<profiles>

<profile>

<id>jdk-1.4</id>

<activation>

<jdk>1.4</jdk>

</activation>

<repositories>

<repository>

<id>rc</id>

<name>rc nexus</name>

<url>http://nexus.rc.dataengine.com/nexus/content/groups/public</url>

<releases>

<enabled>true</enabled>

</releases>

<snapshots>

<enabled>false</enabled>

</snapshots>

</repository>

<repository>

<id>nexus</id>

<name>local private nexus</name>

<url>http://maven.oschina.net/content/groups/public</url>

<releases>

<enabled>true</enabled>

</releases>

<snapshots>

<enabled>false</enabled>

</snapshots>

</repository>

<repository>

<id>central</id>

<name>central</name>

<url>http://central.maven.org/maven2/</url>

<releases>

<enabled>true</enabled>

</releases>

<snapshots>

<enabled>false</enabled>

</snapshots>

</repository>

<repository>

<id>repo1</id>

<name>repo1</name>

<url>http://repo1.maven.org/maven2/</url>

<releases>

<enabled>true</enabled>

</releases>

<snapshots>

<enabled>false</enabled>

</snapshots>

</repository>

</repositories>

<pluginRepositories>

<pluginRepository>

<id>rc</id>

<name>rc nexus</name>

<url>http://nexus.rc.dataengine.com/nexus/content/groups/public</url>

<releases>

<enabled>true</enabled>

</releases>

<snapshots>

<enabled>false</enabled>

</snapshots>

</pluginRepository>

<pluginRepository>

<id>nexus</id>

<name>local private nexus</name>

<url>http://maven.oschina.net/content/groups/public</url>

<releases>

<enabled>true</enabled>

</releases>

<snapshots>

<enabled>false</enabled>

</snapshots>

</pluginRepository>

<pluginRepository>

<id>central</id>

<name>central</name>

<url>http://central.maven.org/maven2/</url>

<releases>

<enabled>true</enabled>

</releases>

<snapshots>

<enabled>false</enabled>

</snapshots>

</pluginRepository>

<pluginRepository>

<id>repo1</id>

<name>repo1</name>

<url>http://repo1.maven.org/maven2/</url>

<releases>

<enabled>true</enabled>

</releases>

<snapshots>

<enabled>false</enabled>

</snapshots>

</pluginRepository>

</pluginRepositories>

</profile>

</profiles>

<activeProfiles>

<activeProfile>jdk-1.4</activeProfile>

</activeProfiles>

</settings>
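A note on the mirrors above: maven.oschina.net appears to have been retired since this post was written, so downloads through it may simply fail. If so, a minimal replacement entry can be dropped into the `<mirrors>` section; the Aliyun public mirror below is one commonly used alternative (the URL is our suggestion, not from the original post; substitute any mirror you trust):

```xml
<mirror>
  <id>aliyun-public</id>
  <mirrorOf>central</mirrorOf>
  <name>Aliyun public mirror</name>
  <url>https://maven.aliyun.com/repository/public</url>
</mirror>
```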


 
 








 
 






OK, that completes the initial walkthrough!

Let's continue and read through the spark root directory.

With that, we have a reasonably thorough picture of the directory structure:
https://github.com/apache/spark/tree/v1.6.1

Good. That concludes my walkthrough of https://github.com/apache/spark/tree/v1.6.1. The rest can be explored in more depth later.
 
 
Step 7: first download the required jar packages via mvn

 mvn -Pyarn -Phadoop-2.6 -Dhadoop.version=2.7.1 -Phive -Phive-thriftserver -Psparkr -DskipTests clean package    # run in the root of the spark source tree

[root@Compiler spark]# pwd

/root/projects/opensource/spark

[root@Compiler spark]# mvn -Pyarn -Phadoop-2.6 -Dhadoop.version=2.7.1 -Phive -Phive-thriftserver -Psparkr -DskipTests clean package

 If it fails partway through, you may need to run the same command again:

[root@Compiler spark]# mvn -Pyarn -Phadoop-2.6 -Dhadoop.version=2.7.1 -Phive -Phive-thriftserver -Psparkr -DskipTests clean package

 

 Step 8: build the spark distribution

./make-distribution.sh --name custom-spark --tgz -Psparkr -Phadoop-2.6 -Dhadoop.version=2.7.1 -Phive -Phive-thriftserver -Pyarn     # run in the root of the spark source tree
[root@Compiler spark]# ./make-distribution.sh --name custom-spark --tgz -Psparkr -Phadoop-2.6 -Dhadoop.version=2.7.1 -Phive -Phive-thriftserver -Pyarn
During the build you may run into missing packages; this can usually be resolved by repeating steps 7 and 8. If that fails, use the error message to find the missing artifact on the Maven repository site, download it, and install it manually. During my own build I installed the following file through Maven:
mvn install:install-file -DgroupId=org.scalatest -DartifactId=scalatest-maven-plugin -Dversion=1.0 -Dpackaging=jar -Dfile=/home/neoway/scalatest-maven-plugin-1.0.jar

Reference: http://www.cnblogs.com/zlslch/p/5865707.html

Official documentation: http://spark.apache.org/docs/latest/building-spark.html