Spark Streaming Fault Tolerance: Preventing Data Loss
2015-05-14 16:11
My understanding: if a worker or the driver crashes, data already received by a receiver may be lost. The official remedy is to write received data into a write-ahead log under the checkpoint directory, enabled by setting spark.streaming.receiver.writeAheadLog.enable=true. With the WAL turned on, Spark Streaming no longer needs multiple in-memory replicas of the received data, so the docs recommend using StorageLevel.MEMORY_AND_DISK_SER for the input stream.
Here is what the official documentation says:
[Since Spark 1.2] Configuring write ahead logs - Since Spark 1.2, we have introduced write ahead logs for achieving strong fault-tolerance guarantees. If enabled, all the data received from a receiver gets written into a write ahead log in the configuration checkpoint directory. This prevents data loss on driver recovery, thus ensuring zero data loss (discussed in detail in the Fault-tolerance Semantics section). This can be enabled by setting the configuration parameter spark.streaming.receiver.writeAheadLog.enable to true. However, these stronger semantics may come at the cost of the receiving throughput of individual receivers. This can be corrected by running more receivers in parallel to increase aggregate throughput. Additionally, it is recommended that the replication of the received data within Spark be disabled when the write ahead log is enabled as the log is already stored in a replicated storage system. This can be done by setting the storage level for the input stream to StorageLevel.MEMORY_AND_DISK_SER.
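Putting the two recommendations together, here is a minimal Scala sketch of a driver that enables the write-ahead log and drops the extra replica. The app name, checkpoint path, host, and port are placeholders; this assumes Spark Streaming 1.2+.

```scala
import org.apache.spark.SparkConf
import org.apache.spark.storage.StorageLevel
import org.apache.spark.streaming.{Seconds, StreamingContext}

object WalDemo {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setAppName("wal-demo") // placeholder app name
      // Enable the write-ahead log so received data is persisted
      // to the checkpoint directory before processing.
      .set("spark.streaming.receiver.writeAheadLog.enable", "true")

    val ssc = new StreamingContext(conf, Seconds(10))
    // The WAL lives under the checkpoint directory; in production this
    // should be a fault-tolerant filesystem such as HDFS (path is a placeholder).
    ssc.checkpoint("hdfs:///user/spark/checkpoint")

    // With the WAL on replicated storage, a single replica is enough:
    // pass MEMORY_AND_DISK_SER explicitly instead of the default
    // MEMORY_AND_DISK_SER_2 (two replicas).
    val lines = ssc.socketTextStream("localhost", 9999,
      StorageLevel.MEMORY_AND_DISK_SER)

    lines.count().print()
    ssc.start()
    ssc.awaitTermination()
  }
}
```

Note the trade-off the docs mention: the WAL adds a disk write per received batch, which can lower the throughput of each individual receiver; if that matters, run several receivers in parallel and union their streams to recover aggregate throughput.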