Spark Configuration Files Explained
2016-03-19 13:36
kwu
1. Main environment configuration file: spark-env.sh
export JAVA_HOME=/usr/java/jdk1.7.0_67-cloudera
export SPARK_MASTER_IP=10.130.2.20 #1
export SPARK_MASTER_PORT=7077
export SPARK_WORKER_CORES=24
export SPARK_EXECUTOR_INSTANCES=1
export SPARK_WORKER_MEMORY=48g
export HADOOP_HOME=/opt/cloudera/parcels/CDH/lib/hadoop
export HADOOP_CONF_DIR=/etc/hadoop/conf/
export SPARK_DAEMON_MEMORY=8G
#export SPARK_DAEMON_JAVA_OPTS="-Dspark.deploy.recoveryMode=ZOOKEEPER -Dspark.deploy.zookeeper.url=bdc40.hexun.com:2181,bdc41.hexun.com:2181,bdc46.hexun.com:2181,bdc53.hexun.com:2181,bdc54.hexun.com:2181 -Dspark.deploy.zookeeper.dir=/spark" #2
#export SPARK_DAEMON_JAVA_OPTS="-Dspark.deploy.recoveryMode=FILESYSTEM -Dspark.deploy.recoveryDirectory=/opt/modules/spark/recovery" #3
export JAVA_LIBRARY_PATH=$JAVA_LIBRARY_PATH:$HADOOP_HOME/lib/native
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$HADOOP_HOME/lib/native
export SPARK_LIBRARY_PATH=$SPARK_LIBRARY_PATH:$HADOOP_HOME/lib/native
export SPARK_CLASSPATH=$SPARK_CLASSPATH:$HADOOP_HOME/lib/snappy-java-1.0.4.1.jar
Notes (keyed to the #1/#2/#3 markers above):
1) #1: single-master standalone mode with no fault tolerance; if the master dies, the cluster is down until it is restarted.
2) #2: ZooKeeper-based master HA; a standby master takes over automatically on failure.
3) #3: filesystem-based recovery; after a failure the master must be restarted manually, then restores state from the recovery directory.
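The two commented-out lines differ only in the SPARK_DAEMON_JAVA_OPTS value. As a minimal sketch of how the ZooKeeper HA variant is assembled (the zk1/zk2/zk3 hostnames below are placeholders, not the cluster from this article):

```shell
#!/bin/sh
# Sketch: build the ZooKeeper HA options for spark-env.sh.
# zk1/zk2/zk3 are placeholder hostnames -- substitute your own quorum.
ZK_QUORUM="zk1:2181,zk2:2181,zk3:2181"

SPARK_DAEMON_JAVA_OPTS="-Dspark.deploy.recoveryMode=ZOOKEEPER \
-Dspark.deploy.zookeeper.url=${ZK_QUORUM} \
-Dspark.deploy.zookeeper.dir=/spark"
export SPARK_DAEMON_JAVA_OPTS

echo "$SPARK_DAEMON_JAVA_OPTS"
```

With these options set on every master node, you start one active and one or more standby masters; ZooKeeper then handles leader election and failover.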
2. Basic configuration file: spark-defaults.conf
spark.local.dir /diskb/sparktmp,/diskc/sparktmp,/diskd/sparktmp,/diske/sparktmp,/diskf/sparktmp,/diskg/sparktmp
spark.eventLog.enabled true
spark.eventLog.dir hdfs://nameservice1/spark-log
Notes:
1) spark.local.dir sets the local scratch directories (shuffle and spill data, not log files); listing one directory per physical disk spreads the I/O load.
2) spark.eventLog.enabled and spark.eventLog.dir store application event logs on HDFS, so the history server can replay finished applications.
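To actually browse those HDFS event logs, the history server must be pointed at the same directory. A minimal sketch (the HDFS path is the one configured above; the port and the SPARK_HOME reference are assumptions):

```shell
#!/bin/sh
# Sketch: point the history server at the event-log directory from
# spark-defaults.conf. The path matches the config above; 18080 is the
# usual default UI port, adjust as needed.
LOG_DIR="hdfs://nameservice1/spark-log"

SPARK_HISTORY_OPTS="-Dspark.history.fs.logDirectory=${LOG_DIR} \
-Dspark.history.ui.port=18080"
export SPARK_HISTORY_OPTS

echo "$SPARK_HISTORY_OPTS"
# On a real cluster you would then run:
#   $SPARK_HOME/sbin/start-history-server.sh
```

The key point is that spark.eventLog.dir (written by applications) and spark.history.fs.logDirectory (read by the history server) must name the same location.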
3. Worker node list: slaves
spark1
spark2
spark3
Note: list the hostname of each slave (worker) node, one per line.
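Conceptually, sbin/start-slaves.sh reads this file and launches one Worker per listed host over ssh. A rough sketch of that loop (echo stands in for the actual ssh call; the master URL reuses the address from spark-env.sh above):

```shell
#!/bin/sh
# Sketch of what sbin/start-slaves.sh does: for each host in conf/slaves,
# ssh in and start a Worker against the master. echo stands in for ssh.
MASTER_URL="spark://10.130.2.20:7077"

STARTED=""
while read -r host; do
  [ -z "$host" ] && continue              # skip blank lines
  echo "ssh $host \$SPARK_HOME/sbin/start-slave.sh $MASTER_URL"
  STARTED="$STARTED $host"
done <<'EOF'
spark1
spark2
spark3
EOF
```

This is why passwordless ssh from the master to every worker host is a prerequisite for the standalone launch scripts.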