
MongoDB Map-Reduce: Mongo Shell and C# Versions (Part 1)

2013-08-15 15:59


1. Official documentation: http://docs.mongodb.org/manual/tutorial/map-reduce-examples/

The map-reduce operation is composed of many tasks, including:

reads from the input collection,
executions of the map function,
executions of the reduce function,
writes to the output collection.
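The four tasks can be made concrete with a minimal in-memory sketch in plain JavaScript, the language the Mongo Shell uses. This is not how MongoDB implements map-reduce. runMapReduce is a hypothetical helper that only mimics the data flow:

- read each input document;
- call map with this bound to that document;
- group the emitted values by key;
- call reduce once per key;
- write { _id, value } documents to an array that stands in for the output collection.

```javascript
// Hypothetical helper that mimics the map-reduce data flow in memory.
// It is NOT MongoDB's implementation.
function runMapReduce(inputCollection, mapFn, reduceFn) {
  const groups = new Map(); // key -> array of emitted values

  // As in the Mongo Shell, the map function calls a global emit(),
  // so install a temporary one that records (key, value) pairs.
  globalThis.emit = (key, value) => {
    if (!groups.has(key)) groups.set(key, []);
    groups.get(key).push(value);
  };

  try {
    // Tasks 1 and 2: read each input document and run map with `this` = doc.
    for (const doc of inputCollection) {
      mapFn.call(doc);
    }
  } finally {
    delete globalThis.emit;
  }

  // Tasks 3 and 4: run reduce per key and write { _id, value } documents.
  // Like MongoDB, reduce is skipped when a key has only one value.
  const outputCollection = [];
  for (const [key, values] of groups) {
    const value = values.length === 1 ? values[0] : reduceFn(key, values);
    outputCollection.push({ _id: key, value: value });
  }
  return outputCollection;
}

const result = runMapReduce(
  [{ cusid: 1, price: 15 }, { cusid: 2, price: 30 }, { cusid: 1, price: 10 }],
  function () { emit(this.cusid, this.price); },
  function (key, values) { return values.reduce((sum, v) => sum + v, 0); }
);
console.log(result); // [ { _id: 1, value: 25 }, { _id: 2, value: 30 } ]
```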


So what advantage does map reduce hold? The oft-cited benefit is that both the map and reduce operations can be distributed. So the code I've written above could be executed by multiple threads, multiple cpus, or even thousands of servers as-is. This is key when dealing with millions and billions of records, or smaller sets with more complex logic. For the rest of us though, I think the real benefit is the power of being able to write these types of transforms using actual programming languages, with variables, conditional statements, methods and so on. It is a mind shift from the traditional approach, but I do think even slightly complex queries are cleaner and easier to write with map reduce. We didn't look at it here, but you'll commonly feed the output of a reduce function into another reduce function - each function further transforming it towards the end-result.
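The distribution argument in this quote depends on one property of the reduce function: it may be applied again to its own earlier outputs. MongoDB's documentation states the matching requirements for reduce (associative, commutative, idempotent).

As a minimal sketch with hypothetical names, the three prices recorded for cusid 1 in this article's sample data are summed two ways:

- once, in a single call;
- once split across two imaginary "workers", whose partial sums are then reduced again.

Both give the same total because addition is associative and commutative.

```javascript
// Hypothetical reduce: sums every price emitted for one cusid.
function reducePrices(key, values) {
  let total = 0;
  for (const v of values) total += v;
  return total;
}

const pricesForCustomer1 = [15, 10, 30];

// All values reduced in one call.
const allAtOnce = reducePrices(1, pricesForCustomer1);

// Split across two hypothetical workers, then re-reduce the partial sums.
const partA = reducePrices(1, pricesForCustomer1.slice(0, 2)); // 25
const partB = reducePrices(1, pricesForCustomer1.slice(2));    // 30
const reReduced = reducePrices(1, [partA, partB]);             // 55

console.log(allAtOnce, reReduced); // 55 55
```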

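The official documentation lists the tasks involved when a map-reduce operation runs. The second passage, from another article, explains the advantages of map-reduce over traditional operations such as group by, and also gives the author's own views. It is worth reading.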




Suppose the input collection contains the following documents:

{ cusid:1, price:15 };
{ cusid:2, price:30 };
{ cusid:2, price:45 };
{ cusid:3, price:45 };
{ cusid:4, price:5 };
{ cusid:5, price:65 };
{ cusid:1, price:10 };
{ cusid:1, price:30 };
{ cusid:5, price:30 };
{ cusid:4, price:100 };


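What we actually want, though, is the total price for each cusid. This could be done with a group by. However, the two passages quoted above point out the advantages of map-reduce, which become especially noticeable with large data sets, so we will use map-reduce instead. The output should be: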



{ cusid:1, price:55 };
{ cusid:2, price:75 };
{ cusid:3, price:45 };
{ cusid:4, price:105 };
{ cusid:5, price:95 };
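For comparison, here are the same totals computed with an ordinary single-pass group-by in plain JavaScript. This sketch is not MongoDB code; totalByCustomer is a hypothetical name. It only confirms that the expected output follows from the ten input documents, and it shows the "traditional" approach that map-reduce replaces here.

```javascript
// The ten input documents from the article.
const orders = [
  { cusid: 1, price: 15 }, { cusid: 2, price: 30 }, { cusid: 2, price: 45 },
  { cusid: 3, price: 45 }, { cusid: 4, price: 5 },  { cusid: 5, price: 65 },
  { cusid: 1, price: 10 }, { cusid: 1, price: 30 }, { cusid: 5, price: 30 },
  { cusid: 4, price: 100 },
];

// Plain "group by cusid, sum(price)" in one pass using a Map.
function totalByCustomer(docs) {
  const totals = new Map();
  for (const { cusid, price } of docs) {
    totals.set(cusid, (totals.get(cusid) ?? 0) + price);
  }
  // Same shape as the expected output, sorted by cusid.
  return [...totals]
    .sort((a, b) => a[0] - b[0])
    .map(([cusid, price]) => ({ cusid, price }));
}

console.log(totalByCustomer(orders));
```

The result holds 55, 75, 45, 105 and 95 for cusid 1 to 5, matching the expected output listed above.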



I. Mongo Shell version:

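1. First, write a map function that processes each document. It is ordinary JavaScript, but it runs in a special context: this refers to the current document, and results are produced by calling emit(key, value).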
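Here is a minimal sketch of such a map function, assuming the documents have the cusid and price fields shown above. The name mapFunction is hypothetical. The function takes no arguments: it reads the current document through this and emits cusid as the key and price as the value. Outside the Mongo Shell there is no emit, so the sketch installs a stub that only records the calls.

```javascript
// Map function in the form MongoDB expects: no parameters, `this` is the
// current document, and output goes through emit(key, value).
var mapFunction = function () {
  emit(this.cusid, this.price);
};

// Stub emit() for running outside the Mongo Shell: it only records the
// (key, value) pairs so we can see what map produces per document.
const emitted = [];
globalThis.emit = (key, value) => emitted.push([key, value]);

mapFunction.call({ cusid: 1, price: 15 });
mapFunction.call({ cusid: 2, price: 30 });
console.log(emitted); // [ [ 1, 15 ], [ 2, 30 ] ]
```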


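2. Write the corresponding reduce function. It takes two parameters, key and values. Note that it is values, not value, because values is an array. This effectively performs a group operation: every price emitted for the same cusid is collected into one array (cusid → prices).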
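A matching reduce function might look like this; reduceFunction is a hypothetical name. It sums the values array for one key and returns a plain number, the same type that a map function emitting this.price produces. In the Mongo Shell, Array.sum(values) does the same job; a loop is used here so the sketch also runs in Node.

```javascript
// Reduce function: `key` is one cusid, `values` is the array of every
// price emitted for it. It returns a number, the same type map emits.
var reduceFunction = function (key, values) {
  var total = 0;
  for (var i = 0; i < values.length; i++) {
    total += values[i];
  }
  return total;
};

// cusid = 1 appears with prices 15, 10 and 30 in the sample data.
console.log(reduceFunction(1, [15, 10, 30])); // 55
```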


3. Run the map-reduce operation and write the results to a temporary collection.
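In the Mongo Shell this step is a single call of the form db.orders.mapReduce(mapFunction, reduceFunction, { out: "order_totals" }), after which db.order_totals.find() lists the results. The collection and function names here are hypothetical. Each output document has the shape { _id: key, value: reduced value }.

That call needs a running MongoDB server, so the sketch below only simulates it in plain JavaScript on the ten sample documents. The simulation includes MongoDB's documented behavior of not calling reduce for a key that has a single value.

```javascript
// The article's ten sample documents (stand-in for a hypothetical "orders" collection).
const orders = [
  { cusid: 1, price: 15 }, { cusid: 2, price: 30 }, { cusid: 2, price: 45 },
  { cusid: 3, price: 45 }, { cusid: 4, price: 5 },  { cusid: 5, price: 65 },
  { cusid: 1, price: 10 }, { cusid: 1, price: 30 }, { cusid: 5, price: 30 },
  { cusid: 4, price: 100 },
];

// Hypothetical map/reduce pair: emit cusid -> price, then sum the prices.
var mapFunction = function () { emit(this.cusid, this.price); };
var reduceFunction = function (key, values) {
  var total = 0;
  for (var i = 0; i < values.length; i++) total += values[i];
  return total;
};

// Local stand-in for
//   db.orders.mapReduce(mapFunction, reduceFunction, { out: "order_totals" })
// Returns the documents the output collection would hold.
function simulateMapReduce(docs, mapFn, reduceFn) {
  const groups = new Map(); // key -> emitted values
  globalThis.emit = (key, value) => {
    if (!groups.has(key)) groups.set(key, []);
    groups.get(key).push(value);
  };
  try {
    for (const doc of docs) mapFn.call(doc); // `this` is the current document
  } finally {
    delete globalThis.emit;
  }
  const outputCollection = [];
  for (const [key, values] of groups) {
    // reduce is not called for a key with one value (cusid 3 here).
    const value = values.length === 1 ? values[0] : reduceFn(key, values);
    outputCollection.push({ _id: key, value: value });
  }
  // Sort by _id for a predictable listing.
  return outputCollection.sort((a, b) => a._id - b._id);
}

const orderTotals = simulateMapReduce(orders, mapFunction, reduceFunction);
orderTotals.forEach((doc) => console.log(JSON.stringify(doc)));
// {"_id":1,"value":55}
// {"_id":2,"value":75}
// {"_id":3,"value":45}
// {"_id":4,"value":105}
// {"_id":5,"value":95}
```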

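The part highlighted in red is the result we want, and it matches the expected output above. One thing puzzled me, though: for cusid = 3, the result has a different format from the others. My guess is that because there is only one record with cusid = 3, no group-style step is needed; in other words, reduceFunction is never executed for that key. To test this guess, we can insert another record with cusid = 3 and see whether the result changes.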

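This confirmed the guess: after the insert, the result for cusid = 3 has the same format as the others. ~_~ It also matches the MongoDB documentation, which states that reduce is not called for a key that has only a single value. This is why the reduce function should return a value of the same type as the value emitted by map.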
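The article does not show the exact functions that produced the odd cusid = 3 result. What follows is therefore a hypothetical reconstruction of how such a mismatch arises: map emits a plain number, but reduce returns an object.

The sketch starts from values already grouped by key, standing in for the map phase:

- keys with several values go through reduce and become objects;
- a key with one value skips reduce and keeps the raw number.

Adding a second record for that key makes the formats match. The robust fix is to have reduce return the same type that map emits. applyReduce, mismatchedReduce and consistentReduce are illustrative names.

```javascript
// Hypothetical reduce whose return type ({ total }) differs from what map
// emits (a plain number); this mismatch makes the formats diverge.
function mismatchedReduce(key, values) {
  return { total: values.reduce((sum, v) => sum + v, 0) };
}

// Apply reduce the way MongoDB does: only for keys with two or more values.
function applyReduce(grouped, reduceFn) {
  return Object.entries(grouped).map(([key, values]) => ({
    _id: Number(key),
    value: values.length === 1 ? values[0] : reduceFn(Number(key), values),
  }));
}

// Values emitted for cusid 2 and 3 in the article's sample data.
const before = applyReduce({ 2: [30, 45], 3: [45] }, mismatchedReduce);
// before: cusid 2 -> { total: 75 }, cusid 3 -> 45 (reduce never ran)

// After inserting another cusid = 3 record (price 10 is an arbitrary
// illustrative value), reduce runs for that key as well.
const after = applyReduce({ 2: [30, 45], 3: [45, 10] }, mismatchedReduce);
// after: cusid 2 -> { total: 75 }, cusid 3 -> { total: 55 }

// Fix: return the same type map emits, so every key looks alike.
function consistentReduce(key, values) {
  return values.reduce((sum, v) => sum + v, 0);
}
const fixed = applyReduce({ 2: [30, 45], 3: [45] }, consistentReduce);
// fixed: cusid 2 -> 75, cusid 3 -> 45
console.log(JSON.stringify(before), JSON.stringify(after), JSON.stringify(fixed));
```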
