您的位置:首页 > 其它

map and flatmap 区别

2016-06-11 00:00 330 查看

map vs flatMap in Spark

September 24, 2014Big Dataexample, spark

In the previous blogs around Spark examples, RDD.flatMap() has been used. In this blog we will look at the differences between RDD.map() and RDD.flatMap().

map and flatMap are similar, in the sense they take a line from the input RDD and apply a function on it. The way they differ is that the function in map returns only one element, while function in flatMap can return a list of elements (0 or more) as an iterator.

Also, the output of the flatMap is flattened. Although the function in flatMap returns a list of elements, the flatMap returns an RDD which has all the elements from the list in a flat way (not a list).

Sounds a bit confusing. In the below code snippet, on the input lines both map and flatMap are applied and output dumped in HDFS to wordsWithMap and wordsWithFlatMap folder.

from pyspark import SparkContext sc = SparkContext("spark://bigdata-vm:7077", "Map") lines = sc.parallelize(["hello world", "hi"]) wordsWithMap = lines.map(lambda line: line.split(" ")).coalesce(1) wordsWithFlatMap = lines.flatMap(lambda line: line.split(" ")).coalesce(1) wordsWithMap.saveAsTextFile("hdfs://localhost:9000/user/bigdatavm/wordsWithMap") wordsWithFlatMap.saveAsTextFile("hdfs://localhost:9000/user/bigdatavm/wordsWithFlatMap")

1
2
3
4
5
6
7
8
9
10
from pyspark import SparkContext

sc = SparkContext("spark://bigdata-vm:7077", "Map")
lines = sc.parallelize(["hello world", "hi"])

wordsWithMap = lines.map(lambda line: line.split(" ")).coalesce(1)
wordsWithFlatMap = lines.flatMap(lambda line: line.split(" ")).coalesce(1)

wordsWithMap.saveAsTextFile("hdfs://localhost:9000/user/bigdatavm/wordsWithMap")
wordsWithFlatMap.saveAsTextFile("hdfs://localhost:9000/user/bigdatavm/wordsWithFlatMap")

The output of the map function in HDFS


The output of the flatMap function in HDFS

Conclusion

The input function to map returns a single element, while the flatMap returns a list of elements (0 or more). And also, the output of the flatMap is flattened.

In the case of word count, where the input line is split into multiple words, flatMap can be used. Also, in the case of weather data set, the extractData nethod will validate the record and might or might not return a value. In this case
7fe0
also, flatMap can be used.

Share this:

内容来自用户分享和网络整理,不保证内容的准确性,如有侵权内容,可联系管理员处理 点击这里给我发消息
标签: