【Spark Series 6】Submitting jobs with spark-submit
According to the Spark documentation, you can add dependency jars at submit time with `--jars`, separated by commas. The drawback is that you have to list the jars on every submission; that is fine for one or two jars, but it gets tedious once there are many.
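One common way around retyping the list is to record the jars once in spark.jars in spark-defaults.conf. A minimal sketch, assuming the jars sit at fixed (hypothetical) paths on the submitting machine:

```bash
# Append a spark.jars entry (the /opt/libs paths are hypothetical examples).
# spark.jars is a comma-separated list shipped with every submission.
cat >> "$SPARK_HOME/conf/spark-defaults.conf" <<'EOF'
spark.jars /opt/libs/dep1.jar,/opt/libs/dep2.jar
EOF

# Later submissions no longer need an explicit --jars flag.
spark-submit --master yarn-client mysparksubmit.jar
```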
About the value of master
(1) For standalone mode, it takes the form spark://ip:port.
(2) For YARN, there are two values: yarn-client and yarn-cluster.
(3) For Mesos, currently only the client option is available.
(4) Beyond these, there is local, an option used for local debugging.
A sketch of what each kind of master looks like on the command line follows this list; the full table of master URLs appears further below.
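A minimal sketch, reusing the jar name from the example later in this post; the standalone host name is a placeholder:

```bash
# Local debugging: one worker thread per logical core
# (quoted so the shell does not try to glob the brackets).
spark-submit --master 'local[*]' mysparksubmit.jar

# Standalone cluster: 7077 is the default master port.
spark-submit --master spark://master-host:7077 mysparksubmit.jar

# YARN in client mode vs. cluster mode:
spark-submit --master yarn-client mysparksubmit.jar
spark-submit --master yarn-cluster mysparksubmit.jar
```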
About client and cluster mode
A common deployment strategy is to submit your application from a gateway machine that is physically co-located with your worker machines (e.g. the master node in a standalone EC2 cluster). In this setup, client mode is appropriate. In client mode, the driver is launched directly within the spark-submit process, which acts as a client to the cluster. The input and output of the application is attached to the console. Thus, this mode is especially suitable for applications that involve the REPL (e.g. Spark shell).
Alternatively, if your application is submitted from a machine far from the worker machines (e.g. locally on your laptop), it is common to use cluster mode to minimize network latency between the drivers and the executors. Note that cluster mode is currently not supported for Mesos clusters. Currently only YARN supports cluster mode for Python applications.
Source: http://www.cnblogs.com/lujinhong2/p/4666748.html
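The same choice can be expressed with explicit flags: in Spark 2.x and later the mode is usually passed via --deploy-mode rather than baked into the master string. A sketch, reusing the placeholder jar name:

```bash
# Driver runs inside the spark-submit process on the gateway machine.
spark-submit --master yarn --deploy-mode client mysparksubmit.jar

# Driver is launched on the cluster; better when submitting from far away.
spark-submit --master yarn --deploy-mode cluster mysparksubmit.jar
```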
A complete submission example:
spark-submit --master yarn-client --executor-memory 3g --executor-cores 2 --num-executors 2 --jars ***.jar,***.jar mysparksubmit.jar
(replace ***.jar,***.jar with your own jars, separated by commas)
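If the application's main class is not recorded in the jar's manifest, spark-submit also needs --class; the class name below is a hypothetical placeholder:

```bash
spark-submit --master yarn-client --executor-memory 3g --executor-cores 2 \
  --num-executors 2 --class com.example.MySparkSubmit mysparksubmit.jar
```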
The full set of master URLs, from the Spark documentation:

| Master URL | Meaning |
|---|---|
| local | Run Spark locally with one worker thread (i.e. no parallelism at all). |
| local[K] | Run Spark locally with K worker threads (ideally, set this to the number of cores on your machine). |
| local[*] | Run Spark locally with as many worker threads as logical cores on your machine. |
| spark://HOST:PORT | Connect to the given Spark standalone cluster master. The port must be whichever one your master is configured to use, which is 7077 by default. |
| mesos://HOST:PORT | Connect to the given Mesos cluster. The port must be whichever one your master is configured to use, which is 5050 by default. Or, for a Mesos cluster using ZooKeeper, use mesos://zk://.... |
| yarn-client | Connect to a YARN cluster in client mode. The cluster location will be found based on the HADOOP_CONF_DIR or YARN_CONF_DIR variable. |
| yarn-cluster | Connect to a YARN cluster in cluster mode. The cluster location will be found based on the HADOOP_CONF_DIR or YARN_CONF_DIR variable. |
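For the two YARN entries, spark-submit finds the cluster through the Hadoop client configuration rather than a host:port. A sketch, assuming the common (but not universal) config path /etc/hadoop/conf:

```bash
# Tell spark-submit where the YARN/HDFS client configs live (path is an assumption).
export HADOOP_CONF_DIR=/etc/hadoop/conf
spark-submit --master yarn-cluster mysparksubmit.jar
```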