
Building Spark 1.2 with Maven on hadoop-2.6.0

2015-03-09 08:58

1. Install Maven

(1) Set MAVEN_HOME.

(2) Add $MAVEN_HOME/bin to the PATH variable.

(3) Set the memory parameters in MAVEN_OPTS:

export MAVEN_OPTS="-Xmx2g -XX:MaxPermSize=512M -XX:ReservedCodeCacheSize=512m"

If you skip this step, the build is bound to fail with errors like the following, because compiling Spark needs a lot of memory:

[INFO] Compiling 203 Scala sources and 9 Java sources to /Users/me/Development/spark/core/target/scala-2.10/classes...

[ERROR] PermGen space -> [Help 1]

[INFO] Compiling 203 Scala sources and 9 Java sources to /Users/me/Development/spark/core/target/scala-2.10/classes...

[ERROR] Java heap space -> [Help 1]
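
The three setup steps above can be sketched as a few lines for a shell profile. This is only an illustration: the install path /opt/apache-maven is an assumption and should be replaced with wherever Maven is actually unpacked. The MAVEN_OPTS value is the one given in this article.

```shell
# Hypothetical install location (an assumption, not from the article);
# an already exported MAVEN_HOME takes precedence.
export MAVEN_HOME="${MAVEN_HOME:-/opt/apache-maven}"

# Prepend Maven's bin directory to PATH unless it is already present.
case ":$PATH:" in
  *":$MAVEN_HOME/bin:"*) ;;
  *) export PATH="$MAVEN_HOME/bin:$PATH" ;;
esac

# Memory settings from the article: heap, PermGen and code cache sizes.
export MAVEN_OPTS="-Xmx2g -XX:MaxPermSize=512M -XX:ReservedCodeCacheSize=512m"

# Afterwards, running "mvn -version" in the same shell should find Maven.
```

Putting these lines in a profile file rather than typing them once keeps the settings in place for every later build.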

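2. Build Spark

(1) Download Spark
http://spark.apache.org/downloads.html
(2) Extract the downloaded archive.

(3) Change into the root directory of the source tree.

Edit the source file mllib\src\main\scala\org\apache\spark\mllib\optimization\Gradient.scala. With the file unmodified, the build stops with this error: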

[ERROR] Failed to execute goal org.scalastyle:scalastyle-maven-plugin:0.4.0:check (default) on project spark-mllib_2.10: Failed during scalastyle execution : You have 1 Scalastyle violation(s). -> [Help 1]

[ERROR]

[ERROR] To see the full stack trace of the errors, re-run Maven with the -e switch.

[ERROR] Re-run Maven using the -X switch to enable full debug logging.

[ERROR]

[ERROR] For more information about the errors and possible solutions, please read the following articles:

[ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/MojoExecutionException
[ERROR]

[ERROR] After correcting the problems, you can resume the build with the command

[ERROR] mvn <goals> -rf :spark-mllib_2.10

Delete the two lines containing "Our loss function" from that file; otherwise the build fails as shown.
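
One way to make that edit from the command line is sketched below. The helper name strip_loss_comment is hypothetical, and the path is the one from this article written with forward slashes for a Unix shell. The function removes every line that contains the phrase, so it is worth checking the printed matches before relying on the result. The original file is kept with a .bak suffix.

```shell
# Path from the article, with forward slashes for a Unix shell.
GRADIENT_FILE="mllib/src/main/scala/org/apache/spark/mllib/optimization/Gradient.scala"

# strip_loss_comment FILE  (hypothetical helper)
# Deletes every line containing "Our loss function"; keeps the original as FILE.bak.
strip_loss_comment() {
  file="$1"
  if [ ! -f "$file" ]; then
    echo "not found: $file" >&2
    return 1
  fi
  # Print the matching lines first; grep exits non-zero when nothing matches,
  # so its status is deliberately ignored.
  grep -n "Our loss function" "$file" || true
  # -i.bak edits in place and leaves a backup next to the file.
  sed -i.bak '/Our loss function/d' "$file"
}

# Usage, from the root of the Spark source tree:
#   strip_loss_comment "$GRADIENT_FILE"
```

If the result looks wrong, copying the .bak file back restores the original source.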

(4) Run the following command in the root directory to build:

mvn -Pyarn -Phadoop-2.6 -Dhadoop.version=2.6.0 -DskipTests clean package

If the YARN and Hadoop versions differ, specify each version number separately:

mvn -Pyarn-alpha -Phadoop-2.6 -Dhadoop.version=2.6.0 -Dyarn.version=2.6.0 -DskipTests clean package

The build takes a long time, so be patient.

(5) Alternatively, skip (4) and build and package in one step:

./make-distribution.sh --name hadoop2.6 --tgz -Pyarn -Phive -Phive-thriftserver -Phadoop-2.6 -Dhadoop.version=2.6.0 -DskipTests
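
Because the Maven commands in step (4) differ only in their profile and version values, they can be assembled from variables. The sketch below uses a hypothetical helper, spark_mvn_cmd, which prints the command instead of running it so it can be checked first. It assumes the -Pyarn profile in every case (the article's second command uses -Pyarn-alpha) and adds -Dyarn.version only when a third argument is given.

```shell
# spark_mvn_cmd HADOOP_PROFILE HADOOP_VERSION [YARN_VERSION]  (hypothetical helper)
# Prints, but does not run, the Maven build command for the given versions.
spark_mvn_cmd() {
  profile="$1"
  hadoop_version="$2"
  yarn_version="${3:-}"
  cmd="mvn -Pyarn -P$profile -Dhadoop.version=$hadoop_version"
  # Add -Dyarn.version only when a separate YARN version was passed.
  if [ -n "$yarn_version" ]; then
    cmd="$cmd -Dyarn.version=$yarn_version"
  fi
  echo "$cmd -DskipTests clean package"
}

# Prints the first command from step (4).
spark_mvn_cmd hadoop-2.6 2.6.0
```

Once the printed command looks right, it can be pasted into the shell in the Spark root directory.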