Useful Storm resources collected online (part 2)
2012-09-24 14:52
We are looking for a tool that can replace our traditional log processing model of saving activity streams into log files. Seems like the new & popular way is to use Publish/Subscribe model. Could Storm be ideal for this? Has anyone used it for this purpose?
We are thinking of evaluating the following 4 tools:
Kafka (LinkedIn): Looks promising, but it's written in Scala. We have heard mixed reports about Scala, so we're a bit concerned about its future.
Flume: Will be replaced by Flume NG, which is NOT production ready. Not clear when it will be.
Scribe (FB): Not under active development. Will be replaced by Calligraphus - no idea when.
Storm (Twitter): Looks promising, but not clear whether it was designed with log processing in mind, although I can't see why it couldn't be used for that purpose.
Storm + Kafka is a very effective log processing solution. A number of users of Storm use this combination, including us at Twitter in a few instances. Kafka gives you a high throughput, reliable way to persist/replay log
messages, and Storm gives you the ability to process those messages in arbitrarily complex ways.
We've been developing logging and reporting solutions on top of Storm which archives and streams logging information. Further, the ability of Storm to add another stream for the exceptional case has been key to making our logging infrastructure useful. I'd
highly recommend it, whether you use Kafka or you use AMQP or even direct syslog traffic at a spout. A custom Log4j appender is easy to write.
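A custom appender of the kind mentioned above is indeed a small amount of code. As a self-contained, stdlib-only illustration (using `java.util.logging` rather than Log4j, since the shape is the same: a Log4j appender extends `AppenderSkeleton` and overrides `append()`), here is a hypothetical handler that captures formatted log lines for shipment to a downstream broker; the class name and buffering strategy are assumptions for the sketch, and in a real appender `publish()` would hand the line to the Kafka/AMQP producer:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.logging.Handler;
import java.util.logging.Level;
import java.util.logging.LogRecord;
import java.util.logging.Logger;

// Hypothetical sketch: a handler that buffers formatted log lines for
// shipment to a message broker (Kafka, AMQP, ...). A real Log4j appender
// has the same structure: extend AppenderSkeleton and override append().
public class ForwardingHandler extends Handler {
    private final List<String> buffer = new ArrayList<>();

    @Override
    public void publish(LogRecord record) {
        if (!isLoggable(record)) return;          // honors the handler's level/filter
        // In a real appender this line would be sent to the broker instead.
        buffer.add(record.getLevel() + " " + record.getMessage());
    }

    @Override public void flush() {}
    @Override public void close() {}

    // Returns and clears everything captured so far.
    public List<String> drain() {
        List<String> out = new ArrayList<>(buffer);
        buffer.clear();
        return out;
    }

    public static void main(String[] args) {
        Logger log = Logger.getLogger("demo");
        log.setUseParentHandlers(false);          // don't echo to the console
        ForwardingHandler h = new ForwardingHandler();
        h.setLevel(Level.INFO);
        log.addHandler(h);
        log.info("user login ok");
        log.fine("debug noise");                  // below INFO, never reaches publish()
        System.out.println(h.drain());
    }
}
```

The same pattern works whether the sink is Kafka, an AMQP exchange, or a syslog socket feeding a spout; only the body of `publish()` changes.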
Kafka being written in Scala is not cause for concern; it is well-engineered for its use case and functions reliably. That being said, I wouldn't personally choose to write a project in Scala myself, but using software written in Scala is another matter.
It sounds like Storm by itself is not enough to do the log processing. A tool such as Kafka is needed for persistence. I guess then Storm can be used as a 'Consumer'?
Pardon my naive question but what functionality does Storm provide that's not built into Kafka? Sounds to me like we will have to maintain a cluster of machines for Kafka + a cluster of machines for Storm (plus our existing Hadoop cluster). Trying to figure
out if so many layers are indeed needed.
Storm is then used as it's marketed: a distributed stream processor. It will do whatever you need to do to actually process the logs (conditionally filter, extract text, etc, etc) in a distributed manner. Log processing is a really good use case for Storm,
since typically there are a LOT of logs - it is truly a real time big data problem. So, instead of centralizing the logs and churning over the data using MapReduce, you're doing that work as streams within a Storm cluster...and your output is what you would
normally output from your M/R algorithms.
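The per-message filter/extract work described above (what would live in a Storm bolt's `execute()` method) can be sketched in plain Java without the Storm dependency. The log format and class name here are assumptions for illustration only, assuming lines shaped like `<LEVEL> <service> <message>`:

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of per-tuple bolt logic: conditionally filter log
// lines, extract a field, and aggregate counts - the kind of output you
// would otherwise produce from an M/R job over centralized logs.
public class LogBoltLogic {
    private final Map<String, Integer> errorsByService = new HashMap<>();

    // In a Storm topology this body would sit inside execute(Tuple).
    public void process(String line) {
        String[] parts = line.split(" ", 3);
        if (parts.length < 3 || !parts[0].equals("ERROR")) return; // filter
        errorsByService.merge(parts[1], 1, Integer::sum);          // extract + count
    }

    public Map<String, Integer> counts() { return errorsByService; }

    public static void main(String[] args) {
        LogBoltLogic bolt = new LogBoltLogic();
        for (String line : List.of(
                "INFO web request served",
                "ERROR web timeout upstream",
                "ERROR db connection refused",
                "ERROR web timeout upstream")) {
            bolt.process(line);
        }
        System.out.println(bolt.counts()); // e.g. {web=2, db=1}
    }
}
```

In a real deployment the lines would arrive as tuples from a Kafka spout rather than a local list, and the counts would be emitted downstream or written to a store, but the processing logic itself is this simple.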