The difference between map and flatMap
2016-06-11 00:00
map vs flatMap in Spark
September 24, 2014 | Big Data | example, spark
Earlier posts in this series of Spark examples used RDD.flatMap(). This post looks at the differences between RDD.map() and RDD.flatMap().
map and flatMap are similar in that each takes an element (here, a line) from the input RDD and applies a function to it. They differ in what that function returns: the function passed to map returns exactly one element per input element, while the function passed to flatMap returns a collection (a list or an iterator) of zero or more elements.
The output of flatMap is also flattened. Although the function passed to flatMap returns a list of elements, the resulting RDD contains those elements individually, not as lists.
This can sound confusing, so here is an example. In the snippet below, both map and flatMap are applied to the same input lines, and the results are written to HDFS in the wordsWithMap and wordsWithFlatMap folders.
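from pyspark import SparkContext

sc = SparkContext("spark://bigdata-vm:7077", "Map")

# Two input lines: one with two words, one with a single word
lines = sc.parallelize(["hello world", "hi"])

# map: one output element (a list of words) per input line
wordsWithMap = lines.map(lambda line: line.split(" ")).coalesce(1)

# flatMap: the per-line word lists are flattened into individual words
wordsWithFlatMap = lines.flatMap(lambda line: line.split(" ")).coalesce(1)

# coalesce(1) above merges each result into a single partition before saving
wordsWithMap.saveAsTextFile("hdfs://localhost:9000/user/bigdatavm/wordsWithMap")
wordsWithFlatMap.saveAsTextFile("hdfs://localhost:9000/user/bigdatavm/wordsWithFlatMap")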
The output of the map function in HDFS
The output of the flatMap function in HDFS
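As a rough local stand-in for the two outputs, the sketch below mimics map and flatMap in plain Python rather than Spark, so it runs without a cluster. The helper name split_words and the use of itertools.chain are illustrative assumptions, not part of the original snippet; the final prints assume that a text output holds one string representation per element.

```python
from itertools import chain

lines = ["hello world", "hi"]


def split_words(line):
    # Same splitting logic as the lambda in the Spark snippet
    return line.split(" ")


# map: exactly one output element per input line (here, a list of words)
with_map = [split_words(line) for line in lines]

# flatMap: same function, but the per-line lists are flattened into one sequence
with_flat_map = list(chain.from_iterable(split_words(line) for line in lines))

# Assumption: one string representation per element, one element per output line
print("\n".join(str(element) for element in with_map))
print("\n".join(str(element) for element in with_flat_map))
```

Here with_map has two elements, [['hello', 'world'], ['hi']], one list per input line, while with_flat_map has three, ['hello', 'world', 'hi'], one per word.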
Conclusion
The function passed to map returns a single element, while the function passed to flatMap returns a list of zero or more elements, and the output of flatMap is flattened. In the case of word count, where each input line is split into multiple words, flatMap can be used. Likewise, for the weather data set, the extractData method validates each record and might or might not return a value, so flatMap can be used there as well.
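The "might or might not return a value" case can be sketched as a function that returns either an empty list or a one-element list. Everything below is hypothetical: the extract_data function, the "station,temperature" record format, and the sample records are invented for illustration and are not the extractData method from the weather example. As before, plain Python stands in for Spark.

```python
from itertools import chain


def extract_data(record):
    """Hypothetical validator: [(station, temp)] for a valid record, [] otherwise."""
    parts = record.split(",")
    if len(parts) != 2:
        return []  # wrong number of fields: contribute nothing
    station, raw_temp = parts
    try:
        return [(station, float(raw_temp))]
    except ValueError:
        return []  # temperature is not a number: contribute nothing


# Made-up sample records; two of the four are invalid
records = ["A,21.5", "garbage", "B,not-a-number", "C,18"]

# flatMap-style: apply the function, then flatten, so empty lists simply vanish
valid = list(chain.from_iterable(extract_data(r) for r in records))
print(valid)
```

Only the two valid records survive, giving [('A', 21.5), ('C', 18.0)]. With a plain map, the result would instead keep one entry per input record, including the empty lists for the invalid ones.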