
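[Interactive Q&A] Session 6 of the Spark Asia-Pacific Research Institute Public Lecture Series, "Winning in the Era of Cloud Computing and Big Data"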

2014-08-04 15:42

"Winning in the Era of Cloud Computing and Big Data"

Spark Asia-Pacific Research Institute, 100-session public lecture series [Session 6 interactive Q&A]

Q1: Can Spark Streaming join different data streams?

Yes. Spark Streaming can join different data streams. The documentation says:

Spark Streaming is an extension of the core Spark API that enables high-throughput, fault-tolerant stream processing of live data streams. Data can be ingested from many sources like Kafka, Flume, Twitter, ZeroMQ or plain old TCP sockets and be processed using complex algorithms expressed with high-level functions like map, reduce, join and window.

join(otherStream, [numTasks]): When called on two DStreams of (K, V) and (K, W) pairs, return a new DStream of (K, (V, W)) pairs with all pairs of elements for each key.
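
The phrase "all pairs of elements for each key" can be illustrated with a small sketch. This is plain Python that models the join on one batch from each stream; it does not use Spark. The function name join_batches and the sample data are hypothetical and chosen only for illustration. In Spark Streaming the equivalent join is applied to the RDDs of each batch interval.

```python
from collections import defaultdict
from itertools import product


def join_batches(left, right):
    """Inner-join two batches of (key, value) pairs, one batch per stream.

    Models the result shape described above: (K, V) joined with (K, W)
    yields (K, (V, W)) for every combination of values sharing a key.
    """
    left_by_key = defaultdict(list)
    for key, value in left:
        left_by_key[key].append(value)

    right_by_key = defaultdict(list)
    for key, value in right:
        right_by_key[key].append(value)

    joined = []
    for key, left_values in left_by_key.items():
        # A key that is absent from the right batch contributes nothing,
        # which is what makes this an inner join.
        for v, w in product(left_values, right_by_key.get(key, [])):
            joined.append((key, (v, w)))
    return joined


# Hypothetical sample batches: page views and purchases keyed by user.
clicks = [("user1", "home"), ("user1", "cart"), ("user2", "home")]
purchases = [("user1", 9.99), ("user3", 4.50)]
result = join_batches(clicks, purchases)
print(result)
```

Only user1 appears in both batches, so the result holds one pair per combination of that user's click and purchase values; user2 and user3 are dropped.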

Q2: Are Flume and Spark Streaming suited to cluster mode?


Flume and Spark Streaming were both built for clusters. The documentation says:

For input streams that receive data over the network (such as Kafka, Flume, sockets, etc.), the default persistence level is set to replicate the data to two nodes for fault-tolerance.

Using any input source that receives data through a network: for network-based data sources like Kafka and Flume, the received input data is replicated in memory between nodes of the cluster (default replication factor is 2).

Q3: Does Spark have any drawbacks?

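Spark's main drawback is its relatively high memory usage.

In earlier versions, Spark processed data mostly at a coarse granularity, which made fine-grained control difficult.

After Fair mode was added, fine-grained processing became possible.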

Q4: Is Spark Streaming used in production today?

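Spark Streaming is very easy to use in a production environment.

It needs no separate deployment: once Spark is installed, Spark Streaming is installed as well.

In China, sites such as 皮皮网 are already using Spark Streaming.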