Spark 2.1 on YARN -- Container Shell Analysis
2017-05-24 14:20
I set the following content in spark-defaults.conf:
spark.serializer org.apache.spark.serializer.KryoSerializer
spark.master yarn
spark.executor.instances 2
spark.executor.cores 1
spark.executor.memory 512m
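For reference, the same settings can be passed on the command line instead of editing spark-defaults.conf; a minimal equivalent launch, assuming the standard Spark 2.1 spark-shell options:

# Hypothetical equivalent launch: the same settings passed as
# command-line options instead of spark-defaults.conf entries.
$ spark-shell --master yarn \
    --num-executors 2 \
    --executor-cores 1 \
    --executor-memory 512m \
    --conf spark.serializer=org.apache.spark.serializer.KryoSerializer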
When spark-shell is executed, it creates two executor processes.
$ jps
32412 CoarseGrainedExecutorBackend
32444 CoarseGrainedExecutorBackend
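To confirm that both executors belong to the same YARN application, the yarn CLI can list it; a quick check (output omitted):

# List running YARN applications; the Spark shell should appear as
# application_1495532285542_0005 in state RUNNING.
$ yarn application -list -appStates RUNNING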
Look at the command line of one executor.
$ ps aux | grep 32412
houzhiz+   374  0.0  0.0  112668    976 pts/1 R+ 14:08 0:00 grep --color=auto 32412
houzhiz+ 32412 15.1  4.3 2371448 342156 ?     Sl 14:03 0:46 /usr/local/java/bin/java -server -Xmx512m -Djava.io.tmpdir=/data/hadoop/data11/tmp/nm-local-dir/usercache/houzhizhen/appcache/application_1495532285542_0005/container_1495532285542_0005_01_000002/tmp -Dspark.driver.port=35736 -Dspark.yarn.app.container.log.dir=/home/houzhizhen/usr/local/hadoop/hadoop-2.7.2/logs/userlogs/application_1495532285542_0005/container_1495532285542_0005_01_000002 -XX:OnOutOfMemoryError=kill %p org.apache.spark.executor.CoarseGrainedExecutorBackend --driver-url spark://CoarseGrainedScheduler@192.168.122.1:35736 --executor-id 1 --hostname localhost --cores 1 --app-id application_1495532285542_0005 --user-class-path file:/data/hadoop/data11/tmp/nm-local-dir/usercache/houzhizhen/appcache/application_1495532285542_0005/container_1495532285542_0005_01_000002/__app__.jar
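The ps line is hard to read; one way to print the executor's JVM arguments one per line is via /proc (Linux only, using the PID from above):

# Print the executor's JVM command line one argument per line
# (arguments in /proc/<pid>/cmdline are NUL-separated).
$ tr '\0' '\n' < /proc/32412/cmdline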
Look at the container directory.
$ cd /data/hadoop/data11/tmp/nm-local-dir/usercache/houzhizhen/appcache/application_1495532285542_0005/container_1495532285542_0005_01_000002
[houzhizhen@localhost container_1495532285542_0005_01_000002]$ ll
total 20
-rw-rw-r--. 1 houzhizhen houzhizhen   86 May 24 14:03 container_tokens
-rwx------. 1 houzhizhen houzhizhen  703 May 24 14:03 default_container_executor_session.sh
-rwx------. 1 houzhizhen houzhizhen  757 May 24 14:03 default_container_executor.sh
-rwx------. 1 houzhizhen houzhizhen 3590 May 24 14:03 launch_container.sh
lrwxrwxrwx. 1 houzhizhen houzhizhen   89 May 24 14:03 __spark_conf__ -> /data/hadoop/data11/tmp/nm-local-dir/usercache/houzhizhen/filecache/17/__spark_conf__.zip
lrwxrwxrwx. 1 houzhizhen houzhizhen  108 May 24 14:03 __spark_libs__ -> /data/hadoop/data11/tmp/nm-local-dir/usercache/houzhizhen/filecache/16/__spark_libs__7172508084572895679.zip
drwx--x---. 2 houzhizhen houzhizhen    6 May 24 14:03 tmp
Open the Spark configuration and you can see spark.executor.id=driver. The symlink __spark_conf__ -> /data/hadoop/data11/tmp/nm-local-dir/usercache/houzhizhen/filecache/17/__spark_conf__.zip points into the node-level filecache rather than into the container directory, so we can safely conclude that the configuration file is shared across executors of the same Spark application.
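A quick way to verify this, assuming both executor containers run on this node under the same appcache directory:

# Every container's __spark_conf__ link should resolve to the same
# filecache entry, confirming the configuration is shared.
$ cd /data/hadoop/data11/tmp/nm-local-dir/usercache/houzhizhen/appcache/application_1495532285542_0005
$ ls -l container_*/__spark_conf__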
$ cat __spark_conf__/__spark_conf__.properties
#Spark configuration.
#Wed May 24 14:03:27 CST 2017
spark.yarn.cache.visibilities=PRIVATE
spark.yarn.cache.timestamps=1495605805866
spark.executor.memory=512m
spark.executor.id=driver
spark.driver.host=192.168.122.1
spark.yarn.cache.confArchive=hdfs\://localhost\:8020/user/houzhizhen/.sparkStaging/application_1495532285542_0005/__spark_conf__.zip
spark.files.ignoreCorruptFiles=true
spark.yarn.cache.sizes=200756074
spark.jars=
spark.sql.catalogImplementation=hive
spark.home=/usr/local/spark
spark.submit.deployMode=client
spark.executor.heartbeatInterval=2
spark.master=yarn
spark.yarn.cache.filenames=hdfs\://localhost\:8020/user/houzhizhen/.sparkStaging/application_1495532285542_0005/__spark_libs__7172508084572895679.zip\#__spark_libs__
spark.executor.cores=1
spark.yarn.cache.types=ARCHIVE
spark.driver.appUIAddress=http\://192.168.122.1\:4040
spark.serializer=org.apache.spark.serializer.KryoSerializer
spark.repl.class.outputDir=/tmp/spark-caaf86f0-267d-4b39-9bfe-833d97db838e/repl-e03f92dd-176d-42b5-9ebd-a1e3d66c7e1c
spark.executor.instances=2
spark.app.name=Spark shell
spark.repl.class.uri=spark\://192.168.122.1\:35736/classes
spark.driver.port=35736
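This file is a snapshot of the driver-side configuration, which is presumably why it still carries spark.executor.id=driver. To pull out just the executor-related keys (expected output derived from the dump above):

# Filter the executor-related keys out of the shared properties file.
$ grep '^spark\.executor' __spark_conf__/__spark_conf__.properties
spark.executor.memory=512m
spark.executor.id=driver
spark.executor.heartbeatInterval=2
spark.executor.cores=1
spark.executor.instances=2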
Open launch_container.sh and you can see that $PWD/__spark_conf__:$PWD/__spark_libs__/* is included in the CLASSPATH. From the exec command at the end, you can also see that the shared spark.executor.id=driver is overridden per container by the --executor-id 1 argument.
launch_container.sh
$ cat launch_container.sh
#!/bin/bash

export SPARK_YARN_STAGING_DIR="hdfs://localhost:8020/user/houzhizhen/.sparkStaging/application_1495532285542_0005"
export HADOOP_CONF_DIR="/usr/local/hadoop/etc/hadoop"
export JAVA_HOME="/usr/local/java"
export SPARK_LOG_URL_STDOUT="http://localhost:8042/node/containerlogs/container_1495532285542_0005_01_000002/houzhizhen/stdout?start=-4096"
export NM_HOST="localhost"
export SPARK_HOME="/usr/local/spark"
export HADOOP_HDFS_HOME="/home/houzhizhen/usr/local/hadoop/hadoop-2.7.2"
export LOGNAME="houzhizhen"
export JVM_PID="$$"
export PWD="/data/hadoop/data11/tmp/nm-local-dir/usercache/houzhizhen/appcache/application_1495532285542_0005/container_1495532285542_0005_01_000002"
export HADOOP_COMMON_HOME="/home/houzhizhen/usr/local/hadoop/hadoop-2.7.2"
export LOCAL_DIRS="/data/hadoop/data11/tmp/nm-local-dir/usercache/houzhizhen/appcache/application_1495532285542_0005"
export NM_HTTP_PORT="8042"
export LOG_DIRS="/home/houzhizhen/usr/local/hadoop/hadoop-2.7.2/logs/userlogs/application_1495532285542_0005/container_1495532285542_0005_01_000002"
export NM_AUX_SERVICE_mapreduce_shuffle="AAA0+gAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA= "
export NM_PORT="33996"
export USER="houzhizhen"
export HADOOP_YARN_HOME="/home/houzhizhen/usr/local/hadoop/hadoop-2.7.2"
export CLASSPATH="$PWD:$PWD/__spark_conf__:$PWD/__spark_libs__/*:$HADOOP_CONF_DIR:$HADOOP_COMMON_HOME/share/hadoop/common/*:$HADOOP_COMMON_HOME/share/hadoop/common/lib/*:$HADOOP_HDFS_HOME/share/hadoop/hdfs/*:$HADOOP_HDFS_HOME/share/hadoop/hdfs/lib/*:$HADOOP_YARN_HOME/share/hadoop/yarn/*:$HADOOP_YARN_HOME/share/hadoop/yarn/lib/*:$HADOOP_MAPRED_HOME/share/hadoop/mapreduce/*:$HADOOP_MAPRED_HOME/share/hadoop/mapreduce/lib/*"
export SPARK_YARN_MODE="true"
export HADOOP_TOKEN_FILE_LOCATION="/data/hadoop/data11/tmp/nm-local-dir/usercache/houzhizhen/appcache/application_1495532285542_0005/container_1495532285542_0005_01_000002/container_tokens"
export SPARK_USER="houzhizhen"
export SPARK_LOG_URL_STDERR="http://localhost:8042/node/containerlogs/container_1495532285542_0005_01_000002/houzhizhen/stderr?start=-4096"
export HOME="/home/"
export CONTAINER_ID="container_1495532285542_0005_01_000002"
export MALLOC_ARENA_MAX="4"

ln -sf "/data/hadoop/data11/tmp/nm-local-dir/usercache/houzhizhen/filecache/17/__spark_conf__.zip" "__spark_conf__"
hadoop_shell_errorcode=$?
if [ $hadoop_shell_errorcode -ne 0 ]
then
  exit $hadoop_shell_errorcode
fi
ln -sf "/data/hadoop/data11/tmp/nm-local-dir/usercache/houzhizhen/filecache/16/__spark_libs__7172508084572895679.zip" "__spark_libs__"
hadoop_shell_errorcode=$?
if [ $hadoop_shell_errorcode -ne 0 ]
then
  exit $hadoop_shell_errorcode
fi
exec /bin/bash -c "$JAVA_HOME/bin/java -server -Xmx512m -Djava.io.tmpdir=$PWD/tmp '-Dspark.driver.port=35736' -Dspark.yarn.app.container.log.dir=/home/houzhizhen/usr/local/hadoop/hadoop-2.7.2/logs/userlogs/application_1495532285542_0005/container_1495532285542_0005_01_000002 -XX:OnOutOfMemoryError='kill %p' org.apache.spark.executor.CoarseGrainedExecutorBackend --driver-url spark://CoarseGrainedScheduler@192.168.122.1:35736 --executor-id 1 --hostname localhost --cores 1 --app-id application_1495532285542_0005 --user-class-path file:$PWD/__app__.jar 1>/home/houzhizhen/usr/local/hadoop/hadoop-2.7.2/logs/userlogs/application_1495532285542_0005/container_1495532285542_0005_01_000002/stdout 2>/home/houzhizhen/usr/local/hadoop/hadoop-2.7.2/logs/userlogs/application_1495532285542_0005/container_1495532285542_0005_01_000002/stderr"
hadoop_shell_errorcode=$?
if [ $hadoop_shell_errorcode -ne 0 ]
then
  exit $hadoop_shell_errorcode
fi
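Because the exec line redirects stdout and stderr into LOG_DIRS, the executor's output can be followed on disk while the application runs; a sketch using the paths from the script above:

# Follow the executor's log as it runs, using the stderr path from
# the exec line above (or 'yarn logs -applicationId <appId>' after
# the application finishes, if log aggregation is enabled).
$ tail -f /home/houzhizhen/usr/local/hadoop/hadoop-2.7.2/logs/userlogs/application_1495532285542_0005/container_1495532285542_0005_01_000002/stderr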