MongoDB mapReduce Summary
2016-01-29 00:23
Official documentation: https://docs.mongodb.org/manual/reference/method/db.collection.mapReduce/#mapreduce-map-mtd
1. Syntax:
db.collection.mapReduce( <map>, <reduce>, { out: <collection>, query: <document>, sort: <document>, limit: <number>, finalize: <function>, scope: <document>, jsMode: <boolean>, verbose: <boolean>, bypassDocumentValidation: <boolean> } );
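To make the flow of a mapReduce call concrete, here is a minimal in-memory sketch in plain JavaScript (runnable in Node.js, not inside MongoDB). It assumes query is a JavaScript predicate rather than a MongoDB query document, that emitted keys are primitive values, and that results are returned inline; mapReduceSketch and the sample orders are hypothetical, and out, sort, limit, scope, jsMode and sharding are not modelled.

```javascript
// Hypothetical in-memory model of db.collection.mapReduce(); not MongoDB's implementation.
// Assumes: options.query is a JS predicate, emitted keys are primitives, output is inline.
function mapReduceSketch(docs, map, reduce, options = {}) {
  const query = options.query || (() => true);
  const groups = new Map(); // emitted key -> list of emitted values

  // Map phase: `this` is the current document; map reports pairs through a global emit().
  const previousEmit = globalThis.emit;
  globalThis.emit = (key, value) => {
    if (!groups.has(key)) groups.set(key, []);
    groups.get(key).push(value);
  };
  try {
    for (const doc of docs.filter(query)) map.call(doc);
  } finally {
    globalThis.emit = previousEmit;
  }

  // Reduce phase: a key with a single value skips reduce; finalize (if given) runs on every key.
  const results = [];
  for (const [key, values] of groups) {
    let value = values.length === 1 ? values[0] : reduce(key, values);
    if (options.finalize) value = options.finalize(key, value);
    results.push({ _id: key, value: value });
  }
  return results;
}

// Usage with the "zero or one emit" map function from the map section
const mapStatusA = function () { if (this.status == 'A') emit(this.cust_id, 1); };
const sumValues = function (key, values) { return values.reduce((a, b) => a + b, 0); };
const orders = [
  { cust_id: "A1", status: "A" },
  { cust_id: "A1", status: "A" },
  { cust_id: "B2", status: "D" },
  { cust_id: "C3", status: "A" },
];
const result = mapReduceSketch(orders, mapStatusA, sumValues);
// result: [{ _id: "A1", value: 2 }, { _id: "C3", value: 1 }]
```

Note that C3 emits only once, so in this sketch (as in MongoDB) reduce is never called for it.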
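Details:
Purity requirement:
The map, reduce and finalize functions must be pure: they must not connect to or otherwise access the database. They can, however, use the following properties and functions: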
Available properties: args, MaxKey, MinKey
Available functions: assert(), BinData(), DBPointer(), DBRef(), doassert(), emit(), gc(), HexData(), hex_md5(), isNumber(), isObject(), ISODate(), isString(), Map(), MD5(), NumberInt(), NumberLong(), ObjectId(), print(), printjson(), printjsononeline(), sleep(), Timestamp(), tojson(), tojsononeline(), tojsonObject(), UUID(), version()
1.map
Format:
function() { ... emit(key, value); }
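The map function turns each document into zero or more emit() calls; the emitted key/value pairs are the input data for the rest of the map-reduce.
Producing zero pairs for some documents (documents whose status is not 'A' emit nothing):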
function() { if (this.status == 'A') emit(this.cust_id, 1); }
Producing multiple pairs (one per item):
function() { this.items.forEach(function(item){ emit(item.sku, 1); }); }
REQUIREMENTS:
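1. Inside the map function, this refers to the current document.
2. The function must not access the database.
3. It must not interact with functions defined outside it.
It can, however, read values from scope.
4. The data in a single emit can be at most half of MongoDB's maximum BSON document size. The maximum BSON document size is 16 megabytes, so a single emit cannot exceed 8 MB.
5. A single document may produce zero, one, or many emits.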
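Requirements 1 and 5 can be seen by running the two map functions above against single documents. The following plain JavaScript sketch (runnable in Node.js) supplies its own emit(); collectEmits is a hypothetical helper for illustration, since inside MongoDB emit() is provided by the server.

```javascript
// Hypothetical helper: run a map function on one document and record every emitted pair.
function collectEmits(map, doc) {
  const pairs = [];
  const previousEmit = globalThis.emit;
  globalThis.emit = (key, value) => pairs.push([key, value]);
  try {
    map.call(doc); // inside map, `this` is the current document
  } finally {
    globalThis.emit = previousEmit;
  }
  return pairs;
}

// The two map functions from the text above
const mapByStatus = function () { if (this.status == 'A') emit(this.cust_id, 1); };
const mapBySku = function () { this.items.forEach(function (item) { emit(item.sku, 1); }); };

console.log(collectEmits(mapByStatus, { cust_id: "abc", status: "D" }).length); // 0
console.log(collectEmits(mapByStatus, { cust_id: "abc", status: "A" }).length); // 1
console.log(collectEmits(mapBySku, { items: [{ sku: "x" }, { sku: "y" }, { sku: "x" }] }).length); // 3
```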
2.reduce
function(key, values) { ... return result; }
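REQUIREMENTS:
1. It must not access the database or call external functions.
2. If a key has only one value, the reduce function is not called for it; that value is used as the result for the key.
3. The reduce function may be called more than once for the same key, for example when results from several shards are merged, so its return value must be valid input for another reduce call. From the official documentation:
MongoDB can invoke the reduce function more than once for the same key. In this case, the previous output from the reduce function for that key will become one of the input values to the next reduce function invocation for that key.
4. It can read values from scope.
In short, the value returned by reduce must have the same shape as the value emitted by map, so that reduce can be applied repeatedly.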
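Requirement 3 can be checked directly. The sketch below (plain JavaScript, runnable in Node.js) reuses reduceFunction2 from the section 3.2 example, with reducedVal declared locally, and compares one reduce over all values with a reduce over two partial results, as could happen when shards are merged. The sample values and the counter-example countBad are hypothetical.

```javascript
// reduceFunction2 from the section 3.2 example; its result has the same shape as the emitted values
var reduceFunction2 = function (keySKU, countObjVals) {
  var reducedVal = { count: 0, qty: 0 };
  for (var idx = 0; idx < countObjVals.length; idx++) {
    reducedVal.count += countObjVals[idx].count;
    reducedVal.qty += countObjVals[idx].qty;
  }
  return reducedVal;
};

const emitted = [{ count: 1, qty: 5 }, { count: 1, qty: 2 }, { count: 1, qty: 3 }];

// One reduce over all values vs. reducing earlier partial results again
const once = reduceFunction2("sku1", emitted);
const partial = [reduceFunction2("sku1", emitted.slice(0, 2)), reduceFunction2("sku1", emitted.slice(2))];
const twice = reduceFunction2("sku1", partial);
console.log(JSON.stringify(once) === JSON.stringify(twice)); // true: { count: 3, qty: 10 } both ways

// Counter-example (hypothetical, reduce called directly here): counting values
// instead of summing them gives a different answer once partial results are re-reduced
const countBad = function (key, values) { return values.length; };
console.log(countBad("k", [1, 1, 1]));                                  // 3
console.log(countBad("k", [countBad("k", [1, 1]), countBad("k", [1])])); // 2, not 3
```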
3. OPTIONS
3.1 out takes one of two forms
out: <collectionName>
out: { <action>: <collectionName> [, db: <dbName>] [, sharded: <boolean> ] [, nonAtomic: <boolean> ] }
The first form is equivalent to the following defaults:
out: { replace: <collectionName> [, db: <inputDB>] [, sharded: false ] [, nonAtomic: false ] }
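Values of action:
replace: replaces the output wholesale; if the collection already exists, it is emptied and then the results are inserted.
merge: if a result's key already exists in the collection, that document is overwritten; existing documents with other keys are kept.
reduce: merges with the existing results; if a key already exists, the reduce function is applied to the new value and the existing value, and the result is stored.
The reduce action suits incremental jobs, for example a cron job that periodically processes only the documents from one time window and folds the results into the same output collection. As long as each run covers a different window, several runs produce the same result as one run over all the data, and partial results can be inspected while data is still accumulating.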
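The three actions can be modelled in a few lines. In the plain JavaScript sketch below, the output collection is assumed to be a Map from _id to value and applyOut is a hypothetical name; the order in which the existing and new values are passed to reduce is also an assumption.

```javascript
// Hypothetical model of the out actions; the output collection is a Map of _id -> value.
function applyOut(action, existing, newResults, reduce) {
  if (action === "replace") return new Map(newResults); // old contents are discarded
  if (action !== "merge" && action !== "reduce") throw new Error("unknown action: " + action);
  const out = new Map(existing); // documents with other keys are kept
  for (const [key, value] of newResults) {
    if (action === "reduce" && out.has(key)) {
      out.set(key, reduce(key, [out.get(key), value])); // combine existing and new value
    } else {
      out.set(key, value); // merge, or a new key under reduce: store the new result
    }
  }
  return out;
}

const sum = function (key, values) { return values.reduce((a, b) => a + b, 0); };
const day1 = [["A1", 2], ["B2", 1]]; // results of a run over day 1
const day2 = [["A1", 3], ["C3", 4]]; // results of a run over day 2

// cron-style incremental runs with action reduce
const incremental = applyOut("reduce", applyOut("reduce", new Map(), day1, sum), day2, sum);
const merged = applyOut("merge", new Map(day1), day2, sum);
const replaced = applyOut("replace", new Map(day1), day2, sum);
console.log([...incremental]); // A1 -> 5, B2 -> 1, C3 -> 4
console.log([...merged]);      // A1 -> 3, B2 -> 1, C3 -> 4
console.log([...replaced]);    // A1 -> 3, C3 -> 4
```

Only the reduce action keeps A1's total from both days, which is why it fits the incremental cron pattern described above.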
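Values of db:
Defaults to the database of the input collection; set it to write the output to a different database.
Values of sharded:
Set to true to shard the output collection. Sharding must be enabled on the output database; map-reduce then uses _id as the shard key to distribute the output collection across the shards.
Values of nonAtomic:
Makes the output step non-atomic. Defaults to false, i.e. the operation is atomic and map-reduce locks the database during the post-processing (output) step.
It can only be set to true when action is merge or reduce.
If set to true, the database is not locked, so other clients may read intermediate states of the output collection.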
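The rules for out in this section can be summed up as a small validation step. The sketch below is a hypothetical helper, normalizeOut, that is not part of MongoDB or any driver: it expands the short string form into the long form with the defaults listed above and rejects nonAtomic: true together with replace. The inline output mode mentioned in the options table is deliberately not modelled.

```javascript
// Hypothetical helper (not a MongoDB API): expand `out` to its long form and check nonAtomic.
function normalizeOut(out, inputDb) {
  if (typeof out === "string") out = { replace: out }; // short form means replace
  const actions = ["replace", "merge", "reduce"].filter((a) => a in out);
  if (actions.length !== 1) throw new Error("out must name exactly one action");
  const action = actions[0];
  const spec = {
    action,
    collection: out[action],
    db: out.db !== undefined ? out.db : inputDb, // defaults to the input database
    sharded: out.sharded === true,               // defaults to false
    nonAtomic: out.nonAtomic === true,           // defaults to false
  };
  if (spec.nonAtomic && action === "replace") {
    throw new Error("nonAtomic: true is only valid with merge or reduce");
  }
  return spec;
}

const shortForm = normalizeOut("totals", "shop");
// shortForm: { action: "replace", collection: "totals", db: "shop", sharded: false, nonAtomic: false }
const longForm = normalizeOut({ reduce: "totals", db: "reports", nonAtomic: true }, "shop");
// longForm.db is "reports" and longForm.nonAtomic is true
try {
  normalizeOut({ replace: "totals", nonAtomic: true }, "shop");
} catch (e) {
  console.log(e.message); // nonAtomic: true is only valid with merge or reduce
}
```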
3.2 finalize Function
function(key, reducedValue) { ... return modifiedObject; }
It must not access the database or other functions.
It can read variables defined in scope.
Example:
var mapFunction2 = function() { for (var idx = 0; idx < this.items.length; idx++) { var key = this.items[idx].sku; var value = { count: 1, qty: this.items[idx].qty }; emit(key, value); } };
var reduceFunction2 = function(keySKU, countObjVals) { var reducedVal = { count: 0, qty: 0 }; for (var idx = 0; idx < countObjVals.length; idx++) { reducedVal.count += countObjVals[idx].count; reducedVal.qty += countObjVals[idx].qty; } return reducedVal; };
var finalizeFunction2 = function (key, reducedVal) { reducedVal.avg = reducedVal.qty/reducedVal.count; return reducedVal; };
db.orders.mapReduce( mapFunction2, reduceFunction2, { out: { merge: "map_reduce_example" }, query: { ord_date: { $gt: new Date('01/01/2012') } }, finalize: finalizeFunction2 } )
This operation uses the query field to select only those documents with ord_date greater than new Date('01/01/2012'). It then outputs the results to the collection map_reduce_example. If the map_reduce_example collection already exists, the operation merges the existing contents with the results of this map-reduce operation.
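The example can be traced without a server. The following plain JavaScript sketch (runnable in Node.js) applies the query, map, reduce and finalize steps of the call above to three hypothetical in-memory orders; runExample and the sample data are illustrative only, and the output is returned as a plain object instead of being merged into map_reduce_example.

```javascript
// The three functions from the example, with reducedVal declared locally
var mapFunction2 = function () {
  for (var idx = 0; idx < this.items.length; idx++) {
    emit(this.items[idx].sku, { count: 1, qty: this.items[idx].qty });
  }
};
var reduceFunction2 = function (keySKU, countObjVals) {
  var reducedVal = { count: 0, qty: 0 };
  for (var idx = 0; idx < countObjVals.length; idx++) {
    reducedVal.count += countObjVals[idx].count;
    reducedVal.qty += countObjVals[idx].qty;
  }
  return reducedVal;
};
var finalizeFunction2 = function (key, reducedVal) {
  reducedVal.avg = reducedVal.qty / reducedVal.count;
  return reducedVal;
};

// Hypothetical runner: query -> map -> reduce (skipped for single values) -> finalize
function runExample(orders) {
  const cutoff = new Date(2012, 0, 1); // local midnight, 1 January 2012, as in the query
  const groups = new Map();
  const previousEmit = globalThis.emit;
  globalThis.emit = (key, value) => {
    if (!groups.has(key)) groups.set(key, []);
    groups.get(key).push(value);
  };
  try {
    orders.filter((o) => o.ord_date > cutoff).forEach((o) => mapFunction2.call(o));
  } finally {
    globalThis.emit = previousEmit;
  }
  const out = {};
  for (const [key, values] of groups) {
    const reduced = values.length === 1 ? values[0] : reduceFunction2(key, values);
    out[key] = finalizeFunction2(key, reduced);
  }
  return out;
}

const orders = [
  { ord_date: new Date("2011-12-15"), items: [{ sku: "apple", qty: 100 }] }, // filtered out by the query
  { ord_date: new Date("2012-03-01"), items: [{ sku: "apple", qty: 5 }, { sku: "pear", qty: 2 }] },
  { ord_date: new Date("2012-04-10"), items: [{ sku: "apple", qty: 3 }] },
];
const example = runExample(orders);
console.log(example);
// apple: { count: 2, qty: 8, avg: 4 }, pear: { count: 1, qty: 2, avg: 2 }
```

pear has a single emitted value, so reduce is skipped for it, but finalize still adds avg.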
db.collection.mapReduce() takes the following parameters:
Field | Type | Description |
---|---|---|
map | function | A JavaScript function that associates or “maps” a value with a key and emits the key and value pair. See Requirements for the map Function for more information. |
reduce | function | A JavaScript function that “reduces” to a single object all the values associated with a particular key. See Requirements for the reduce Function for more information. |
options | document | A document that specifies additional parameters to db.collection.mapReduce(). |
bypassDocumentValidation | boolean | Optional. Enables mapReduce to bypass document validation during the operation. This lets you insert documents that do not meet the validation requirements. New in version 3.2. |
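
The following table describes additional arguments that db.collection.mapReduce() can accept in the options document:

Field | Type | Description |
---|---|---|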
out | string or document | Specifies the location of the result of the map-reduce operation. You can output to a collection, output to a collection with an action, or output inline. You may output to a collection when performing map-reduce operations on the primary members of the set; on secondary members you may only use the inline output. See out Options for more information. |
query | document | Specifies the selection criteria using query operators for determining the documents input to the map function. |
sort | document | Sorts the input documents. This option is useful for optimization. For example, specify the sort key to be the same as the emit key so that there are fewer reduce operations. The sort key must be in an existing index for this collection. |
limit | number | Specifies a maximum number of documents for the input into the map function. |
finalize | function | Optional. Follows the reduce method and modifies the output. See Requirements for the finalize Function for more information. |
scope | document | Specifies global variables that are accessible in the map, reduce and finalize functions. |
jsMode | boolean | Specifies whether to convert intermediate data into BSON format between the execution of the map and reduce functions. Defaults to false. If false: internally, MongoDB converts the JavaScript objects emitted by the map function to BSON objects. These BSON objects are then converted back to JavaScript objects when calling the reduce function. The map-reduce operation places the intermediate BSON objects in temporary, on-disk storage. This allows the map-reduce operation to execute over arbitrarily large data sets. If true: internally, the JavaScript objects emitted during the map function remain as JavaScript objects. There is no need to convert the objects for the reduce function, which can result in faster execution. You can only use jsMode for result sets with fewer than 500,000 distinct key arguments to the mapper’s emit() function. |
verbose | boolean | Specifies whether to include the timing information in the result information. Defaults to true, which includes the timing information. |
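
The query, sort and limit options together decide which documents reach map. The plain JavaScript sketch below assumes they are applied in that order, as with a find() cursor; query is modelled as a JavaScript predicate rather than a MongoDB query document, sort handles a single field only, and selectInput and the sample documents are hypothetical.

```javascript
// Hypothetical sketch: narrow the input documents the way query, sort and limit would.
function selectInput(docs, { query = () => true, sort, limit } = {}) {
  let selected = docs.filter(query); // filter() already returns a new array
  if (sort) {
    const [field, dir] = Object.entries(sort)[0]; // e.g. { cust_id: 1 } or { amount: -1 }
    selected.sort((a, b) => (a[field] < b[field] ? -dir : a[field] > b[field] ? dir : 0));
  }
  if (limit) selected = selected.slice(0, limit); // a limit of 0 (or none) means no limit
  return selected;
}

const docs = [
  { cust_id: "C", amount: 5 },
  { cust_id: "A", amount: 50 },
  { cust_id: "B", amount: 20 },
  { cust_id: "A", amount: 1 },
];
// Sorting by the field later used as the emit key keeps equal keys next to each other
const input = selectInput(docs, { query: (d) => d.amount > 2, sort: { cust_id: 1 }, limit: 2 });
console.log(input);
// input: [{ cust_id: "A", amount: 50 }, { cust_id: "B", amount: 20 }]
```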