Spark on YARN: Automatically Scaling the Number of Executors - Dynamic Resource Allocation
2016-04-13 14:31
Spark 1.5.2 supports automatically adjusting the number of Executors for a Spark Application running in Spark on YARN mode, based on the pending Task load. To enable this feature, perform the following steps:
Step 1: On every NodeManager, edit yarn-site.xml: add the spark_shuffle value to yarn.nodemanager.aux-services, and set yarn.nodemanager.aux-services.spark_shuffle.class to org.apache.spark.network.yarn.YarnShuffleService, as follows:
Modify:
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle,spark_shuffle</value>
</property>
Add:
<property>
<name>yarn.nodemanager.aux-services.spark_shuffle.class</name>
<value>org.apache.spark.network.yarn.YarnShuffleService</value>
</property>
<property>
<name>spark.shuffle.service.port</name>
<value>7337</value>
</property>
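Before restarting the NodeManagers, it is worth sanity-checking that both shuffle services actually ended up in the aux-services value. A minimal sketch, using a stand-in path (/tmp/yarn-site.xml) for your real Hadoop conf directory:

```shell
# Stand-in for /etc/hadoop/conf/yarn-site.xml (the path is an assumption;
# use your cluster's actual Hadoop conf directory).
conf=/tmp/yarn-site.xml
cat > "$conf" <<'EOF'
<property>
  <name>yarn.nodemanager.aux-services</name>
  <value>mapreduce_shuffle,spark_shuffle</value>
</property>
EOF
# Both shuffle services should appear in the aux-services value.
grep -q 'spark_shuffle' "$conf" && grep -q 'mapreduce_shuffle' "$conf" \
  && echo "aux-services OK"
```

Remember to restart every NodeManager afterwards so the new aux-service is loaded.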
Step 2: Copy the $SPARK_HOME/lib/spark-1.5.2-yarn-shuffle.jar file into Hadoop's lib directory; on my cluster that is /usr/lib/hadoop/lib/. Some articles suggest copying it to /usr/lib/hadoop-yarn/lib/ (the YARN lib directory) instead, or handling it with a symlink.
Note: perform this on every NodeManager.
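Copying the jar by hand to each node is error-prone; a loop over the NodeManager hosts helps. The sketch below simulates the copy with local directories so it runs anywhere; on a real cluster you would replace cp with scp/rsync and node1..node3 with your actual hostnames (both are assumptions here):

```shell
# Stand-in for $SPARK_HOME/lib/spark-1.5.2-yarn-shuffle.jar
jar=/tmp/spark-1.5.2-yarn-shuffle.jar
touch "$jar"
# Hypothetical NodeManager hosts; the /tmp/$node prefix simulates each
# node's filesystem so the sketch runs locally.
for node in node1 node2 node3; do
  dest="/tmp/$node/usr/lib/hadoop/lib"
  mkdir -p "$dest"
  # On a real cluster: scp "$jar" "$node":/usr/lib/hadoop/lib/
  cp "$jar" "$dest/"
done
```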
Step 3: Add the following to $SPARK_HOME/conf/spark-defaults.conf (comments must go on their own lines; a trailing # comment would be read as part of the value):
# minimum number of Executors
spark.dynamicAllocation.minExecutors 1
# maximum number of Executors
spark.dynamicAllocation.maxExecutors 100
# enable dynamic allocation
spark.dynamicAllocation.enabled true
spark.shuffle.service.enabled true
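Beyond the min/max bounds, dynamic allocation exposes a few timing knobs that control how quickly Executors are requested and released. A sketch of additional spark-defaults.conf entries; the values shown are the Spark 1.5.2 defaults, not tuning recommendations:

```
# start with this many Executors (defaults to minExecutors)
spark.dynamicAllocation.initialExecutors 1
# release an Executor after it has been idle this long
spark.dynamicAllocation.executorIdleTimeout 60s
# request more Executors after tasks have been backlogged this long
spark.dynamicAllocation.schedulerBacklogTimeout 1s
```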
Step 4: Turn on automatic Executor scaling at submit time. Using spark-sql in yarn-client mode as an example:
spark-sql --master yarn-client --conf spark.shuffle.service.enabled=true --conf spark.dynamicAllocation.enabled=true -e "SELECT COUNT(*) FROM xx"
The --conf spark.shuffle.service.enabled=true --conf spark.dynamicAllocation.enabled=true flags can be omitted if these options are already enabled in spark-defaults.conf.
The same applies when using spark-submit:
spark-submit \
--class SySpark.SqlOnSpark \
--master yarn-client \
--conf spark.shuffle.service.enabled=true \
--conf spark.dynamicAllocation.enabled=true \
/data/jars/SqlOnSpark.jar \
"SELECT COUNT(*) FROM xx"
References:
http://blog.chinaunix.net/uid-22570852-id-5182664.html
http://lxw1234.com/archives/2015/12/593.htm
https://spark.apache.org/docs/1.5.2/job-scheduling.html#default-behavior-of-pools