
[Spark Series 6] Submitting jobs with spark-submit

2017-08-29 19:05
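According to the official Spark documentation, you can pass --jars when submitting a job, with the jar paths separated by commas. The drawback is that the jars have to be listed on every submission. That is acceptable for a handful of jars, but it becomes tedious when there are many.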

spark-submit --master yarn-client --executor-memory 3g --executor-cores 2 --num-executors 2 --jars ***.jar,***.jar mysparksubmit.jar

(Replace ***.jar,***.jar with your own jars, separated by commas.)
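One way to avoid typing the jar list by hand is to generate it from a directory. The following is only a minimal sketch, not something from the Spark documentation: the helper name build_submit_command and the lib_dir directory are hypothetical, and the resource flags simply mirror the command above.

```python
from pathlib import Path


def build_submit_command(app_jar, lib_dir, master="yarn-client"):
    """Return a spark-submit argument list with --jars filled in from lib_dir.

    build_submit_command and lib_dir are illustrative names (assumptions),
    not part of Spark itself.
    """
    # Collect every .jar file in lib_dir, sorted so the order is stable.
    jars = sorted(str(p) for p in Path(lib_dir).glob("*.jar"))
    cmd = [
        "spark-submit",
        "--master", master,
        "--executor-memory", "3g",
        "--executor-cores", "2",
        "--num-executors", "2",
    ]
    if jars:
        # --jars takes a single comma-separated value with no spaces.
        cmd += ["--jars", ",".join(jars)]
    # The application jar comes last, after all options.
    cmd.append(app_jar)
    return cmd
```

The returned list can be handed to subprocess.run(cmd, check=True) on a machine where spark-submit is on the PATH. Building a list rather than a single string avoids shell quoting problems with unusual jar paths.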


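About the value of master

(1) For standalone mode, it takes the form spark://ip:port/

(2) For YARN, there are two values: yarn-client and yarn-cluster

(3) For Mesos, only the client option is currently available

(4) In addition, there is local, an option used for local debugging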

Master URL          Meaning
local               Run Spark locally with one worker thread (i.e. no parallelism at all).
local[K]            Run Spark locally with K worker threads (ideally, set this to the number of cores on your machine).
local[*]            Run Spark locally with as many worker threads as logical cores on your machine.
spark://HOST:PORT   Connect to the given Spark standalone cluster master. The port must be whichever one your master is configured to use, which is 7077 by default.
mesos://HOST:PORT   Connect to the given Mesos cluster. The port must be whichever one your Mesos master is configured to use, which is 5050 by default. Or, for a Mesos cluster using ZooKeeper, use mesos://zk://...
yarn-client         Connect to a YARN cluster in client mode. The cluster location will be found based on the HADOOP_CONF_DIR or YARN_CONF_DIR variable.
yarn-cluster        Connect to a YARN cluster in cluster mode. The cluster location will be found based on the HADOOP_CONF_DIR or YARN_CONF_DIR variable.
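
To make the master URL forms listed above concrete, here is a small sketch that classifies a --master value. The function name describe_master and the wording of the returned descriptions are assumptions made for illustration; Spark does its own parsing internally, and this does not reproduce it.

```python
import re


def describe_master(master):
    """Classify a --master value using the forms listed in the table.

    describe_master is a hypothetical helper, not a Spark API.
    """
    if master == "local":
        return "local, 1 worker thread"
    # local[K] or local[*]
    m = re.fullmatch(r"local\[(\*|\d+)\]", master)
    if m:
        n = m.group(1)
        if n == "*":
            return "local, one worker thread per logical core"
        return f"local, {n} worker threads"
    if master.startswith("spark://"):
        return "standalone cluster"
    if master.startswith("mesos://"):
        return "Mesos cluster"
    if master in ("yarn-client", "yarn-cluster"):
        mode = master.split("-")[1]
        return f"YARN, {mode} mode"
    raise ValueError(f"unrecognized master value: {master!r}")
```

For example, describe_master("local[4]") returns "local, 4 worker threads", and an unknown value such as "foo" raises ValueError.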

About client and cluster modes

A common deployment strategy is to submit your application from a gateway machine that is physically co-located with your worker machines (e.g. Master node in a standalone EC2 cluster). In this setup, client mode is appropriate. In client mode, the driver is launched directly within the spark-submit process which acts as a client to the cluster. The input and output of the application is attached to the console. Thus, this mode is especially suitable for applications that involve the REPL (e.g. Spark shell).

Alternatively, if your application is submitted from a machine far from the worker machines (e.g. locally on your laptop), it is common to use cluster mode to minimize network latency between the drivers and the executors. Note that cluster mode is currently not supported for Mesos clusters. Currently only YARN supports cluster mode for Python applications.

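
The two restrictions quoted above can be written down as a simple pre-submission check. This is a sketch under stated assumptions: check_deploy_mode is a hypothetical helper, and the rules it encodes are only the ones in the quoted passage, which describe the Spark version the article refers to and may not apply to other releases.

```python
def check_deploy_mode(master, deploy_mode, is_python_app=False):
    """Return a list of problems for a master / deploy-mode combination.

    Encodes only the restrictions stated in the quoted passage;
    check_deploy_mode is an illustrative name, not a Spark API.
    """
    if deploy_mode not in ("client", "cluster"):
        return [f"unknown deploy mode: {deploy_mode!r}"]
    problems = []
    if deploy_mode == "cluster":
        # "cluster mode is currently not supported for Mesos clusters"
        if master.startswith("mesos://"):
            problems.append("cluster mode is not supported on Mesos")
        # "only YARN supports cluster mode for Python applications"
        if is_python_app and not master.startswith("yarn"):
            problems.append("only YARN supports cluster mode for Python applications")
    return problems
```

An empty list means the combination does not violate either quoted restriction. For instance, check_deploy_mode("spark://host:7077", "cluster", is_python_app=True) reports the Python restriction, while the same call with "client" reports nothing.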
http://www.cnblogs.com/lujinhong2/p/4666748.html