spark-0.8.0 Source Code Analysis: Storage
2013-12-07 22:52
Block management and communication are handled by BlockManagerMasterActor and BlockManagerSlaveActor.
1. DiskStore
This implements the spark.local.dir mechanism: each block is stored as a single file, and the block id is hashed to decide which local directory and subdirectory the file lands in.
private def getFile(blockId: String): File = {
  logDebug("Getting file for block " + blockId)
  // Figure out which local directory it hashes to, and which subdirectory in that
  val hash = Utils.nonNegativeHash(blockId)
  val dirId = hash % localDirs.length
  val subDirId = (hash / localDirs.length) % subDirsPerLocalDir
  // Look up or lazily create the subdirectory (creation logic elided here),
  // then resolve the block's file inside it
  val subDir = subDirs(dirId)(subDirId)
  new File(subDir, blockId)
}
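The two-level hashing above can be illustrated with a small standalone sketch. The directory counts and the `HashDirSketch`/`locate` names here are hypothetical, chosen only to mirror the arithmetic in `getFile`; `nonNegativeHash` is a plausible stand-in for `Utils.nonNegativeHash`.

```scala
object HashDirSketch {
  // Hypothetical settings: 2 local dirs (spark.local.dir), 64 subdirs each
  val numLocalDirs = 2
  val subDirsPerLocalDir = 64

  // Stand-in for Utils.nonNegativeHash: a non-negative hash of the block id
  def nonNegativeHash(blockId: String): Int = {
    val h = blockId.hashCode
    if (h == Int.MinValue) 0 else math.abs(h)
  }

  // Same arithmetic as DiskStore.getFile: first pick the local dir,
  // then the subdirectory inside it
  def locate(blockId: String): (Int, Int) = {
    val hash = nonNegativeHash(blockId)
    val dirId = hash % numLocalDirs
    val subDirId = (hash / numLocalDirs) % subDirsPerLocalDir
    (dirId, subDirId)
  }

  def main(args: Array[String]): Unit = {
    val (d, s) = locate("rdd_0_1")
    println(s"dirId=$d subDirId=$s")
  }
}
```

Because the hash is non-negative, `dirId` always falls in `[0, numLocalDirs)` and `subDirId` in `[0, subDirsPerLocalDir)`, so blocks spread evenly across the configured directories.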
2. StorageLevel describes how the data is stored and which options are combined
class StorageLevel private(
    private var useDisk_ : Boolean,
    private var useMemory_ : Boolean,
    private var deserialized_ : Boolean,
    private var replication_ : Int = 1)
  extends Externalizable {
  // TODO: Also add fields for caching priority, dataset ID, and flushing.
  private def this(flags: Int, replication: Int) {
    this((flags & 4) != 0, (flags & 2) != 0, (flags & 1) != 0, replication)
  }
  def this() = this(false, true, false) // For deserialization
  def useDisk = useDisk_
  def useMemory = useMemory_
  def deserialized = deserialized_ // whether the data is kept in deserialized form
  def replication = replication_   // number of replicas
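The private `(flags, replication)` constructor packs the three booleans into bits: bit 2 for disk, bit 1 for memory, bit 0 for deserialized. A minimal sketch of that decoding (the `StorageLevelBits` object and `decode` name are mine, but the bit tests mirror the constructor above):

```scala
object StorageLevelBits {
  // Decode a flags Int the same way StorageLevel's private constructor does:
  // (useDisk, useMemory, deserialized)
  def decode(flags: Int): (Boolean, Boolean, Boolean) =
    ((flags & 4) != 0, (flags & 2) != 0, (flags & 1) != 0)

  def main(args: Array[String]): Unit = {
    // flags = 4 | 2 | 1 = 7: disk + memory + deserialized,
    // i.e. the combination behind MEMORY_AND_DISK
    println(decode(7)) // (true,true,true)
    // flags = 2: memory only, serialized
    println(decode(2)) // (false,true,false)
  }
}
```

Packing the level into a small Int is what lets `Externalizable` serialization ship a StorageLevel across the wire as just a flags byte plus a replication count.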