Spark on YARN
2015-09-22 20:58
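Overview
This post records the Hadoop configuration needed to submit Spark jobs and run them on YARN.
The main file to configure is yarn-site.xml. We currently use only mapreduce_shuffle, while some companies also add spark_shuffle.
Using only mapreduce_shuffle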
<property>
  <name>yarn.nodemanager.aux-services</name>
  <value>mapreduce_shuffle</value>
</property>
<property>
  <name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
  <value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
<property>
  <name>yarn.nodemanager.aux-services.spark_shuffle.class</name>
  <value>org.apache.spark.network.yarn.YarnShuffleService</value>
</property>
Using mapreduce_shuffle and spark_shuffle
<property>
  <name>yarn.nodemanager.aux-services</name>
  <value>mapreduce_shuffle,spark_shuffle</value>
</property>
<property>
  <name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
  <value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
<property>
  <name>yarn.nodemanager.aux-services.spark_shuffle.class</name>
  <value>org.apache.spark.network.yarn.YarnShuffleService</value>
</property>
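When a Hadoop MR job is submitted, mapreduce_shuffle is used; when a Spark job is submitted, spark_shuffle is used. In my experience, though, spark_shuffle is only moderately efficient, and shuffle remains a major bottleneck. Also, if you use spark_shuffle, you need to copy spark-yarn_2.10-1.4.1.jar into HADOOP_HOME/share/hadoop/lib; otherwise Hadoop fails at runtime with a ClassNotFoundException.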
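Before restarting the NodeManagers, it can help to confirm that yarn-site.xml really lists the shuffle service you expect. The following is a minimal sketch, not part of Hadoop or Spark: `has_aux_service` is a hypothetical helper name, and it assumes each `<name>` element is directly followed by its `<value>`, as in the snippets above.

```shell
#!/usr/bin/env bash
# has_aux_service FILE SERVICE
# Hypothetical helper: succeeds (exit 0) if SERVICE appears in the
# comma-separated value of yarn.nodemanager.aux-services in FILE.
# Assumes <name> is directly followed by <value>, as in the config above.
has_aux_service() {
  local file="$1" service="$2"
  # Flatten the file so <name> and <value> may sit on separate lines,
  # pull out the value, split it on commas, trim, and look for an exact match.
  tr '\n' ' ' < "$file" \
    | grep -oE '<name>yarn\.nodemanager\.aux-services</name>[[:space:]]*<value>[^<]*</value>' \
    | sed -E 's|.*<value>([^<]*)</value>.*|\1|' \
    | tr ',' '\n' \
    | sed -E 's/^[[:space:]]+//; s/[[:space:]]+$//' \
    | grep -qxF -- "$service"
}

# Example (path taken from the spark-env.sh settings in this post):
#   has_aux_service /home/cluster/apps/hadoop/etc/hadoop/yarn-site.xml spark_shuffle \
#     && echo "spark_shuffle is configured"
```

This only inspects the configuration file; it does not check that the jar is on the NodeManager classpath.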
Spark configuration
$SPARK_HOME/conf/spark-env.sh
export YARN_CONF_DIR=/home/cluster/apps/hadoop/etc/hadoop
export JAVA_HOME=/home/cluster/share/java1.7
export SCALA_HOME=/home/cluster/share/scala-2.10.5
export HADOOP_HOME=/home/cluster/apps/hadoop
export HADOOP_CONF_DIR=/home/cluster/apps/hadoop/etc/hadoop
export SPARK_MASTER_IP=master
export SPARK_LIBRARY_PATH=$SPARK_LIBRARY_PATH:/home/cluster/apps/hadoop/lib/native
export SPARK_CLASSPATH=$SPARK_CLASSPATH:/home/cluster/apps/hadoop/share/hadoop/yarn/*:/home/cluster/apps/hadoop/share/hadoop/yarn/lib/*:/home/cluster/apps/hadoop/share/hadoop/common/*:/home/cluster/apps/hadoop/share/hadoop/common/lib/*:/home/cluster/apps/hadoop/share/hadoop/hdfs/*:/home/cluster/apps/hadoop/share/hadoop/hdfs/lib/*:/home/cluster/apps/hadoop/share/hadoop/mapreduce/*:/home/cluster/apps/hadoop/share/hadoop/mapreduce/lib/*:/home/cluster/apps/hadoop/share/hadoop/tools/lib/*:/home/cluster/apps/spark/spark-1.4.1/lib/*
SPARK_HISTORY_OPTS="-Dspark.history.ui.port=18080 -Dspark.history.retainedApplications=3 -Dspark.history.fs.logDirectory=hdfs://master:8020/var/log/spark"
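The long SPARK_CLASSPATH value is easy to mistype. One possible way to build the Hadoop part of it from a list of subdirectories is sketched below; `hadoop_classpath` is a hypothetical helper name introduced here, and the subdirectory list simply mirrors the entries in the file above.

```shell
#!/usr/bin/env bash
# hadoop_classpath HADOOP_HOME
# Hypothetical helper: prints the same Hadoop jar globs used in
# spark-env.sh above, joined with ':'.
hadoop_classpath() {
  local hadoop_home="$1" cp="" sub
  for sub in yarn yarn/lib common common/lib hdfs hdfs/lib \
             mapreduce mapreduce/lib tools/lib; do
    # The '*' stays literal here: the JVM expands the wildcard, not the shell.
    cp="${cp:+$cp:}${hadoop_home}/share/hadoop/${sub}/*"
  done
  printf '%s\n' "$cp"
}

# Possible use inside spark-env.sh:
#   export SPARK_CLASSPATH="$SPARK_CLASSPATH:$(hadoop_classpath "$HADOOP_HOME"):/home/cluster/apps/spark/spark-1.4.1/lib/*"
```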
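Parameter explanations:
YARN_CONF_DIR: the path to the directory holding the YARN configuration. If you leave this line out of spark-env.sh, run the following before submitting a job:
export YARN_CONF_DIR=/home/cluster/apps/hadoop/etc/hadoop
HADOOP_HOME: the Hadoop root directory.
HADOOP_CONF_DIR: the Hadoop configuration directory, which Spark reads when it needs Hadoop settings, for example when accessing HDFS.
SPARK_LIBRARY_PATH: tells Spark where to find native .so libraries.
SPARK_CLASSPATH: the jars Spark needs to load.
SPARK_HISTORY_OPTS: options for starting the Spark history server.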
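A quick pre-flight check can catch a missing or wrong configuration directory before a job is submitted. The sketch below is an illustration only: `yarn_conf_dir` is a hypothetical helper, and checking YARN_CONF_DIR before HADOOP_CONF_DIR is a choice made for this example rather than a statement about Spark's own lookup order.

```shell
#!/usr/bin/env bash
# yarn_conf_dir
# Hypothetical helper: prints the first of YARN_CONF_DIR / HADOOP_CONF_DIR
# that contains a yarn-site.xml. Returns non-zero if neither does.
yarn_conf_dir() {
  local dir
  for dir in "${YARN_CONF_DIR:-}" "${HADOOP_CONF_DIR:-}"; do
    if [ -n "$dir" ] && [ -f "$dir/yarn-site.xml" ]; then
      printf '%s\n' "$dir"
      return 0
    fi
  done
  echo "neither YARN_CONF_DIR nor HADOOP_CONF_DIR contains yarn-site.xml" >&2
  return 1
}

# Example:
#   yarn_conf_dir >/dev/null || export YARN_CONF_DIR=/home/cluster/apps/hadoop/etc/hadoop
```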
Prerequisites
If your jobs access HDFS, the NameNode and DataNode must be running, along with the YARN daemons: ResourceManager and NodeManager.
/home/cluster/apps$ jps
29368 MainGenericRunner
29510 Jps
22885 Main
29210 NodeManager
28952 NameNode
29158 ResourceManager
29023 DataNode
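The same check can be scripted instead of read by eye. Below is a minimal sketch that reads `jps` output and reports which of the four daemons are absent; `missing_daemons` is a hypothetical helper name, and it assumes the default `jps` output of a PID followed by the main class name.

```shell
#!/usr/bin/env bash
# missing_daemons
# Hypothetical helper: reads `jps` output on stdin and prints each required
# daemon that is not listed. Prints nothing if all four are running.
missing_daemons() {
  local listing name
  listing="$(cat)"
  for name in NameNode DataNode ResourceManager NodeManager; do
    # jps prints "<pid> <MainClass>"; compare column 2 exactly so that
    # e.g. SecondaryNameNode does not count as NameNode.
    if ! printf '%s\n' "$listing" \
        | awk -v n="$name" '$2 == n { found = 1 } END { exit !found }'; then
      echo "$name"
    fi
  done
}

# Example:
#   jps | missing_daemons
```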
Submitting jobs
Pi:
yarn-cluster mode:
/home/cluster/apps/spark/spark-1.4.1/bin/spark-submit --master yarn-cluster --executor-memory 3g --driver-memory 1g --class org.apache.spark.examples.SparkPi /home/cluster/apps/spark/spark-1.4.1/examples/target/scala-2.10/spark-examples-1.4.1-hadoop2.3.0-cdh5.1.0.jar 10
yarn-client mode:
/home/cluster/apps/spark/spark-1.4.1/bin/spark-submit --master yarn-client --executor-memory 3g --driver-memory 1g --class org.apache.spark.examples.SparkPi /home/cluster/apps/spark/spark-1.4.1/examples/target/scala-2.10/spark-examples-1.4.1-hadoop2.3.0-cdh5.1.0.jar 10
wordcount:
yarn-cluster mode:
/home/cluster/apps/spark/spark-1.4.1/bin/spark-submit --master yarn-cluster --executor-memory 3g --driver-memory 1g --class org.apache.spark.examples.JavaWordCount /home/cluster/apps/spark/spark-1.4.1/examples/target/scala-2.10/spark-examples-1.4.1-hadoop2.3.0-cdh5.1.0.jar /data/hadoop/wordcount/
yarn-client mode:
/home/cluster/apps/spark/spark-1.4.1/bin/spark-submit --master yarn-client --executor-memory 3g --driver-memory 1g --class org.apache.spark.examples.JavaWordCount /home/cluster/apps/spark/spark-1.4.1/examples/target/scala-2.10/spark-examples-1.4.1-hadoop2.3.0-cdh5.1.0.jar /data/hadoop/wordcount/
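The four commands above differ only in the `--master` value, the class, and the trailing arguments. One way to avoid repeating the rest is a small wrapper; this is a sketch under assumptions, where `submit_cmd`, `SPARK_DIR`, and `EXAMPLES_JAR` are names introduced here for illustration and the paths and memory settings are copied from the commands above. It prints the command rather than running it, so it can be inspected first.

```shell
#!/usr/bin/env bash
# Paths copied from the commands above; adjust for your installation.
SPARK_DIR=/home/cluster/apps/spark/spark-1.4.1
EXAMPLES_JAR="$SPARK_DIR/examples/target/scala-2.10/spark-examples-1.4.1-hadoop2.3.0-cdh5.1.0.jar"

# submit_cmd MODE CLASS [ARGS...]
# Hypothetical helper: prints the spark-submit command line for the given
# mode (yarn-cluster or yarn-client) without executing it.
submit_cmd() {
  local mode="$1" class="$2"
  shift 2
  case "$mode" in
    yarn-cluster|yarn-client) ;;
    *) echo "unknown mode: $mode" >&2; return 1 ;;
  esac
  echo "$SPARK_DIR/bin/spark-submit" --master "$mode" \
    --executor-memory 3g --driver-memory 1g \
    --class "$class" "$EXAMPLES_JAR" "$@"
}

# Examples:
#   submit_cmd yarn-cluster org.apache.spark.examples.SparkPi 10
#   submit_cmd yarn-client org.apache.spark.examples.JavaWordCount /data/hadoop/wordcount/
```

Removing the leading `echo` inside the function would run the command instead of printing it.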
Result screenshot
Reading the four records from bottom to top, they are: Pi in yarn-cluster mode, Pi in yarn-client mode, wordcount in yarn-cluster mode, and wordcount in yarn-client mode.
Please respect original work; do not repost.
http://blog.csdn.net/stark_summer/article/details/48661317