[Spark Series 6] Submitting Jobs with spark-submit

2017-08-29 19:05

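According to the official Spark documentation, extra dependency jars can be passed at submit time with --jars, as a comma-separated list. The drawback is that the jars must be listed on every submission. That is fine when there are only a few, but it becomes tedious when there are many.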

spark-submit --master yarn-client --executor-memory 3g --executor-cores 2 --num-executors 2 --jars ***.jar,***.jar mysparksubmit.jar

(Replace ***.jar,***.jar with your own jars, separated by commas.)
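One way to avoid typing a long --jars list by hand is to generate it from a directory. The following is only a minimal sketch in Python: the helper names jars_argument and submit_command and the idea of keeping all dependency jars in a single directory are assumptions for illustration, not something the Spark documentation prescribes. The fixed options simply repeat the command shown above.

```python
from pathlib import Path
from typing import List


def jars_argument(lib_dir: str) -> str:
    """Return a comma-separated list of every .jar file directly under lib_dir."""
    # Sorting keeps the generated command stable between runs.
    jars = sorted(str(p) for p in Path(lib_dir).glob("*.jar"))
    return ",".join(jars)


def submit_command(app_jar: str, lib_dir: str) -> List[str]:
    """Assemble a spark-submit argument list with the same options as the example above."""
    cmd = [
        "spark-submit",
        "--master", "yarn-client",
        "--executor-memory", "3g",
        "--executor-cores", "2",
        "--num-executors", "2",
    ]
    jars = jars_argument(lib_dir)
    if jars:  # leave out --jars entirely when the directory contains no jars
        cmd += ["--jars", jars]
    cmd.append(app_jar)
    return cmd
```

The returned list can be passed to subprocess.run, or joined with spaces and printed for inspection, for example " ".join(submit_command("mysparksubmit.jar", "lib")), where "lib" is a hypothetical directory name.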

About the master value

(1) For standalone mode, it takes the form spark://ip:port/

(2) For YARN, there are two values: yarn-client and yarn-cluster

(3) For Mesos, only the client option is currently available

(4) In addition, there is local, an option intended for local debugging

Master URL	Meaning
local	Run Spark locally with one worker thread (i.e. no parallelism at all).
local[K]	Run Spark locally with K worker threads (ideally, set this to the number of cores on your machine).
local[*]	Run Spark locally with as many worker threads as logical cores on your machine.
spark://HOST:PORT	Connect to the given Spark standalone cluster master. The port must be whichever one your master is configured to use, which is 7077 by default.
mesos://HOST:PORT	Connect to the given Mesos cluster. The port must be whichever one your master is configured to use, which is 5050 by default. Or, for a Mesos cluster using ZooKeeper, use mesos://zk://... .
yarn-client	Connect to a YARN cluster in client mode. The cluster location will be found based on the HADOOP_CONF_DIR or YARN_CONF_DIR variable.
yarn-cluster	Connect to a YARN cluster in cluster mode. The cluster location will be found based on the HADOOP_CONF_DIR or YARN_CONF_DIR variable.
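To make the table concrete, here is a small Python sketch that maps a master URL to the kind of cluster manager it selects. It covers only the forms listed in the table above. The function name cluster_manager is hypothetical, and this is not how Spark itself parses the value, which may accept forms the table does not list.

```python
import re


def cluster_manager(master: str) -> str:
    """Classify a master URL using only the forms listed in the table above."""
    # local, local[K] and local[*] all run Spark on the local machine.
    if master == "local" or re.fullmatch(r"local\[(\d+|\*)\]", master):
        return "local"
    if master.startswith("spark://"):
        return "standalone"
    # Also matches the ZooKeeper form mesos://zk://...
    if master.startswith("mesos://"):
        return "mesos"
    if master in ("yarn-client", "yarn-cluster"):
        return "yarn"
    raise ValueError("master URL not covered by the table: " + repr(master))
```

For example, cluster_manager("local[4]") returns "local" and cluster_manager("spark://HOST:7077") returns "standalone", while an unlisted value raises ValueError instead of guessing.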

About client and cluster modes

A common deployment strategy is to submit your application from a gateway machine that is physically co-located with your worker machines (e.g. Master node in a standalone EC2 cluster). In this setup, client mode is appropriate. In client mode, the driver is launched directly within the spark-submit process which acts as a client to the cluster. The input and output of the application is attached to the console. Thus, this mode is especially suitable for applications that involve the REPL (e.g. Spark shell).

Alternatively, if your application is submitted from a machine far from the worker machines (e.g. locally on your laptop), it is common to use cluster mode to minimize network latency between the drivers and the executors. Note that cluster mode is currently not supported for Mesos clusters. Currently only YARN supports cluster mode for Python applications.
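The guidance in the two paragraphs above can be written down as a small decision helper. This Python sketch only encodes the rules exactly as the quoted text states them (no cluster mode on Mesos, cluster mode for Python applications only on YARN), so it reflects the Spark release the quote was taken from. The function names and the near_workers flag are hypothetical.

```python
def supports_cluster_mode(manager: str, is_python: bool) -> bool:
    """Apply the restrictions stated in the quoted text above."""
    if manager == "mesos":
        return False  # "cluster mode is currently not supported for Mesos clusters"
    if is_python:
        return manager == "yarn"  # "only YARN supports cluster mode for Python applications"
    return manager in ("standalone", "yarn")


def choose_deploy_mode(near_workers: bool, manager: str, is_python: bool = False) -> str:
    """Pick client mode on a co-located gateway, otherwise cluster mode when allowed."""
    if near_workers:
        return "client"
    if supports_cluster_mode(manager, is_python):
        return "cluster"
    # Assumption: fall back to client mode when cluster mode is unavailable.
    return "client"
```

With these rules, a Scala or Java job submitted to YARN from a laptop gets "cluster", while a Python job submitted to a standalone cluster from the same laptop falls back to "client".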
http://www.cnblogs.com/lujinhong2/p/4666748.html
