
MongoDB mapReduce: a summary

2016-01-29 00:23

Official documentation: https://docs.mongodb.org/manual/reference/method/db.collection.mapReduce/#mapreduce-map-mtd

1. Syntax:

db.collection.mapReduce(
  <map>,
  <reduce>,
  {
    out: <collection>,
    query: <document>,
    sort: <document>,
    limit: <number>,
    finalize: <function>,
    scope: <document>,
    jsMode: <boolean>,
    verbose: <boolean>,
    bypassDocumentValidation: <boolean>
  }
);
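To make the data flow behind this call concrete, here is a minimal in-memory model in plain JavaScript (runnable in Node.js, not the mongo shell). `simulateMapReduce` and the way `emit` is wired up are hypothetical stand-ins for what the server does; the model only reproduces the rules described in this post: map runs with `this` bound to each document, emitted values are grouped by key, reduce is skipped for keys with a single value, and finalize runs on each key's result. The query, sort, limit and out options are deliberately left out.

```javascript
// Minimal in-memory model of the mapReduce data flow (illustration only).
// Assumptions: input documents are plain objects in an array, emitted keys are
// primitives (strings/numbers), and query/sort/limit/out are not modelled.
// MongoDB's real engine batches and re-reduces values in its own way.

let currentSink = null; // receives (key, value) while map is running

// Shell-style global emit(): forwards to whichever sink is active.
function emit(key, value) {
  currentSink(key, value);
}

function simulateMapReduce(docs, map, reduce, finalize) {
  // 1. map: call map once per document with `this` bound to the document,
  //    grouping emitted values by key.
  const groups = new Map();
  currentSink = function (key, value) {
    if (!groups.has(key)) groups.set(key, []);
    groups.get(key).push(value);
  };
  try {
    docs.forEach(function (doc) { map.call(doc); });
  } finally {
    currentSink = null;
  }

  // 2. reduce: skipped for keys that received a single value.
  // 3. finalize (optional): applied once to every key's result.
  const results = [];
  for (const [key, values] of groups) {
    let value = values.length === 1 ? values[0] : reduce(key, values);
    if (finalize) value = finalize(key, value);
    results.push({ _id: key, value: value });
  }
  return results;
}

// Usage: count status "A" orders per customer.
const sampleOrders = [
  { cust_id: 'c1', status: 'A' },
  { cust_id: 'c1', status: 'A' },
  { cust_id: 'c2', status: 'D' },
];
const perCustomer = simulateMapReduce(
  sampleOrders,
  function () { if (this.status == 'A') emit(this.cust_id, 1); },
  function (key, values) { return values.reduce(function (a, b) { return a + b; }, 0); }
);
console.log(perCustomer); // [ { _id: 'c1', value: 2 } ]
```

The `{ _id, value }` shape of each result mirrors the documents MongoDB writes to the output collection.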


Details:

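The "clean" requirement:

The map, reduce, and finalize functions must be "clean": they must not contain operations such as connecting to the database. They can, however, use the following properties and functions: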

Available properties:
args, MaxKey, MinKey

Available functions:
assert(), BinData(), DBPointer(), DBRef(), doassert(), emit(), gc(), HexData(), hex_md5(), isNumber(), isObject(), ISODate(), isString(), Map(), MD5(), NumberInt(), NumberLong(), ObjectId(), print(), printjson(), printjsononeline(), sleep(), Timestamp(), tojson(), tojsononeline(), tojsonObject(), UUID(), version()


1. map

Format:

function() {
...
emit(key, value);
}


The map function turns each document into zero or more emit calls; the emitted key/value pairs are the initial data for the map-reduce.

Producing zero emits for some documents (only documents whose status is 'A' emit anything):

function() {
if (this.status == 'A')
emit(this.cust_id, 1);
}


Producing several emits from one document (one per element of items):

function() {
this.items.forEach(function(item){ emit(item.sku, 1); });
}
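A quick way to see the zero-or-many behaviour is to call these two map functions outside MongoDB with a stand-in `emit` that just records its arguments. `collectEmits` is a hypothetical helper for illustration; on the server, MongoDB supplies `emit()` and binds `this` to the current document itself.

```javascript
// Sketch: run a shell-style map function against one document and collect
// what it emits. collectEmits is hypothetical; it is not a MongoDB API.
let sink = [];

// Stand-in for the server's emit(): record each (key, value) pair.
function emit(key, value) {
  sink.push([key, value]);
}

function collectEmits(map, doc) {
  sink = [];
  map.call(doc); // inside map, `this` refers to doc
  return sink;
}

// The two map functions from the text above.
const filterMap = function () {
  if (this.status == 'A') emit(this.cust_id, 1);
};
const itemsMap = function () {
  this.items.forEach(function (item) { emit(item.sku, 1); });
};

console.log(collectEmits(filterMap, { cust_id: 'abc', status: 'D' })); // emits nothing
console.log(collectEmits(filterMap, { cust_id: 'abc', status: 'A' })); // one pair
console.log(collectEmits(itemsMap, { items: [{ sku: 'x' }, { sku: 'y' }] })); // two pairs
```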


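Requirements:

1. Inside the map function, this refers to the current document.

2. The map function must not access the database.

3. It must not interact with functions or variables outside itself (it should have no side effects).

It can, however, read values passed in through scope.

4. The data in a single emit can be at most half of MongoDB's maximum BSON document size. The maximum BSON document size is 16 megabytes,

so the data in one emit must not exceed 8 MB.

5. A single document may produce zero, one, or many emits.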
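Requirement 3 is why values such as thresholds should be passed through scope rather than captured from surrounding code: the server only receives the function's source, so client-side variables it closes over are not available. In the shell this would be an option such as `scope: { minQty: 10 }`. The sketch below imitates the effect in plain Node.js by temporarily placing the scope fields on `globalThis`; `runWithScope` and `minQty` are hypothetical names used only for illustration.

```javascript
// Sketch: fields of the `scope` document act as global variables inside
// map/reduce/finalize. Imitated here by temporarily copying them onto
// globalThis; runWithScope is hypothetical, not a MongoDB API.
function runWithScope(scope, fn) {
  const saved = new Map(); // name -> previous property descriptor (or undefined)
  for (const name of Object.keys(scope)) {
    saved.set(name, Object.getOwnPropertyDescriptor(globalThis, name));
    globalThis[name] = scope[name];
  }
  try {
    return fn();
  } finally {
    // Restore or remove every global that was overwritten.
    for (const [name, desc] of saved) {
      if (desc) Object.defineProperty(globalThis, name, desc);
      else delete globalThis[name];
    }
  }
}

// Stand-in emit() that records pairs (the server provides the real one).
const emitted = [];
function emit(key, value) {
  emitted.push([key, value]);
}

// minQty is not declared anywhere in this file: it comes from scope.
const mapByQty = function () {
  if (this.qty >= minQty) emit(this.sku, this.qty);
};

runWithScope({ minQty: 10 }, function () {
  [{ sku: 'a', qty: 5 }, { sku: 'b', qty: 12 }].forEach(function (doc) {
    mapByQty.call(doc);
  });
});
console.log(emitted); // [ [ 'b', 12 ] ]
```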

2. reduce

function(key, values) {
...
return result;
}


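Requirements:

1. The reduce function must not access the database or call outside functions.

2. If a key has only one value, reduce is not called for that key; that single value is used as the result.

3. reduce may be called more than once for the same key, for example when results from several shards have to be merged, so its output must be usable as input to a later reduce call. From the documentation:

MongoDB can invoke the reduce function more than once for the same key. In this case, the previous output from the reduce function for that key will become one of the input values to the next reduce function invocation for that key.

4. reduce can read values from scope.

In short, the value returned by reduce must have the same format as the value emitted by map; only then can MongoDB safely reduce the results again.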
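This rule can be checked mechanically: reducing all values at once must give the same result as reducing two partial results and then reducing those. A correct reduce also must not depend on the order of the values. The sketch below (plain JavaScript; `checkReReduce` is a hypothetical test helper, not a MongoDB API) compares a correct reduce with a broken one that counts array entries instead of summing them.

```javascript
// Correct: returns the same shape as the emitted values ({ count }) and sums
// them, so feeding its own output back in does not change the total.
function goodReduce(key, values) {
  let count = 0;
  for (const v of values) count += v.count;
  return { count: count };
}

// Wrong: counts array entries, so each partial result is counted as 1
// when MongoDB re-reduces.
function badReduce(key, values) {
  return { count: values.length };
}

// Compare reducing everything at once with reducing in two stages.
function checkReReduce(reduce, key, values, splitAt) {
  const oneShot = reduce(key, values);
  const partA = reduce(key, values.slice(0, splitAt));
  const partB = reduce(key, values.slice(splitAt));
  const twoStage = reduce(key, [partA, partB]);
  return JSON.stringify(oneShot) === JSON.stringify(twoStage);
}

const emittedCounts = [{ count: 1 }, { count: 1 }, { count: 1 }, { count: 1 }];
console.log(checkReReduce(goodReduce, 'sku1', emittedCounts, 1)); // true
console.log(checkReReduce(badReduce, 'sku1', emittedCounts, 1));  // false
```

With `badReduce`, the one-shot result is `{ count: 4 }` but the two-stage result is `{ count: 2 }`, which is exactly the kind of silent error the rule above guards against.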

3. OPTIONS

3.1 out has two forms:

out: <collectionName>

out: { <action>: <collectionName>
[, db: <dbName>]
[, sharded: <boolean> ]
[, nonAtomic: <boolean> ] }


The first form is shorthand for:
out: { replace: <collectionName>
[, db: <inputDB>]
[, sharded: false ]
[, nonAtomic: false ] }


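Values of action:

   replace: replaces the output entirely; if the collection already exists, its contents are discarded and the new results are inserted.

   merge: if a result's key already exists in the output collection, that document is overwritten; documents with other keys are kept.

   reduce: merges the results into the existing collection; if a key already exists, reduce is run on the new and the existing value for that key, and the result replaces the existing document.

The reduce action suits incremental jobs, e.g. a cron job that periodically processes only the newest documents: the results accumulate, so several incremental runs produce the same result as one run over all the data, and partial results can be viewed in the meantime.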
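The three actions can be modelled on plain JavaScript `Map`s to see how an incremental run combines with earlier output. This is a sketch under stated assumptions: `applyOutAction` is hypothetical, an output collection is reduced to a `Map` from `_id` to value, and finalize is ignored.

```javascript
// Sketch: how the three `out` actions combine a new map-reduce result with an
// existing output collection. Collections are modelled as Map(_id -> value);
// applyOutAction is hypothetical, not a MongoDB API.
function applyOutAction(action, existing, fresh, reduce) {
  if (action !== 'replace' && action !== 'merge' && action !== 'reduce') {
    throw new Error('unknown out action: ' + action);
  }
  // replace: discard the old contents and keep only the new results.
  if (action === 'replace') return new Map(fresh);

  const out = new Map(existing);
  for (const [key, value] of fresh) {
    if (action === 'merge' || !out.has(key)) {
      // merge (or a key not seen before): the new result wins.
      out.set(key, value);
    } else {
      // reduce: combine the stored value and the new value for this key.
      out.set(key, reduce(key, [out.get(key), value]));
    }
  }
  return out;
}

// Usage: yesterday's counts plus today's incremental run.
const sumReduce = function (key, values) {
  return values.reduce(function (a, b) { return a + b; }, 0);
};
const lastRun = new Map([['a', 2], ['b', 1]]);
const thisRun = new Map([['a', 3], ['c', 1]]);
console.log(applyOutAction('replace', lastRun, thisRun, sumReduce)); // a=3, c=1
console.log(applyOutAction('merge', lastRun, thisRun, sumReduce));   // a=3, b=1, c=1
console.log(applyOutAction('reduce', lastRun, thisRun, sumReduce));  // a=5, b=1, c=1
```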

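db:

Defaults to the database of the input collection; use it to write the output to a different database.

sharded:

If true, and sharding is enabled on the output database, map-reduce shards the output collection using _id as the shard key.

nonAtomic:

Makes the output step non-atomic. It defaults to false, i.e. the output is atomic: the database is locked while the results are post-processed.

It can be set to true only when action is merge or reduce.

If true, the database is not locked, and other clients may read intermediate states of the output collection.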
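The shorthand expansion and the nonAtomic restriction described above can be summarized as a small normalisation function. `normalizeOut` is hypothetical: MongoDB performs these checks on the server with its own error messages, and this sketch does not handle inline output (`{ inline: 1 }`).

```javascript
// Sketch: normalise an `out` option following the rules above.
// normalizeOut is a hypothetical helper, not part of MongoDB.
const OUT_ACTIONS = ['replace', 'merge', 'reduce'];

function normalizeOut(out, inputDb) {
  // Shorthand: out: "name" means { replace: "name" } in the input database.
  if (typeof out === 'string') {
    out = { replace: out };
  }
  const actions = OUT_ACTIONS.filter(function (a) { return a in out; });
  if (actions.length !== 1) {
    throw new Error('out must name exactly one of ' + OUT_ACTIONS.join(', '));
  }
  const action = actions[0];
  const spec = {
    action: action,
    collection: out[action],
    db: out.db !== undefined ? out.db : inputDb, // defaults to the input database
    sharded: out.sharded === true,               // defaults to false
    nonAtomic: out.nonAtomic === true,           // defaults to false
  };
  // nonAtomic is only allowed with the merge and reduce actions.
  if (spec.nonAtomic && action === 'replace') {
    throw new Error('nonAtomic: true requires the merge or reduce action');
  }
  return spec;
}

console.log(normalizeOut('stats', 'shop'));
console.log(normalizeOut({ reduce: 'stats', db: 'reports', nonAtomic: true }, 'shop'));
```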

3.2 finalize Function

function(key, reducedValue) {
...
return modifiedObject;
}


It cannot access the database or call other functions.

It can read variables defined in scope.

Example:

var mapFunction2 = function() {
  // Emit one { count, qty } value per line item, keyed by SKU.
  for (var idx = 0; idx < this.items.length; idx++) {
    var key = this.items[idx].sku;
    var value = {
      count: 1,
      qty: this.items[idx].qty
    };
    emit(key, value);
  }
};

var reduceFunction2 = function(keySKU, countObjVals) {
  // Declared locally so the function does not create a global variable.
  var reducedVal = { count: 0, qty: 0 };

  // Sum counts and quantities; the result has the same shape as the emitted
  // values, so it can safely be passed to another reduce call.
  for (var idx = 0; idx < countObjVals.length; idx++) {
    reducedVal.count += countObjVals[idx].count;
    reducedVal.qty += countObjVals[idx].qty;
  }

  return reducedVal;
};

var finalizeFunction2 = function(key, reducedVal) {
  // Runs once per key after reduction: add the average quantity per item.
  reducedVal.avg = reducedVal.qty / reducedVal.count;
  return reducedVal;
};

db.orders.mapReduce(
  mapFunction2,
  reduceFunction2,
  {
    out: { merge: "map_reduce_example" },
    query: { ord_date: { $gt: new Date('01/01/2012') } },
    finalize: finalizeFunction2
  }
);


This operation uses the query field to select only those documents with ord_date greater than new Date('01/01/2012'). It then outputs the results to the collection map_reduce_example. If the map_reduce_example collection already exists, the operation merges the existing contents with the results of this map-reduce operation.
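To see what the documents in map_reduce_example would contain, the three functions from the example can be evaluated in plain Node.js on a few made-up orders. The orders, the `emit` wiring, and the `runExample` driver are hypothetical; the driver only imitates MongoDB's grouping step and ignores the query filter and the merge into an existing collection.

```javascript
// Sketch: evaluate the example's map/reduce/finalize functions in Node.js.
// emit wiring and runExample are hypothetical stand-ins for the server.
let emitTarget = null;
function emit(key, value) {
  emitTarget(key, value);
}

var mapFunction2 = function () {
  for (var idx = 0; idx < this.items.length; idx++) {
    emit(this.items[idx].sku, { count: 1, qty: this.items[idx].qty });
  }
};
var reduceFunction2 = function (keySKU, countObjVals) {
  var reducedVal = { count: 0, qty: 0 };
  for (var idx = 0; idx < countObjVals.length; idx++) {
    reducedVal.count += countObjVals[idx].count;
    reducedVal.qty += countObjVals[idx].qty;
  }
  return reducedVal;
};
var finalizeFunction2 = function (key, reducedVal) {
  reducedVal.avg = reducedVal.qty / reducedVal.count;
  return reducedVal;
};

// Group emitted values by SKU, reduce keys with several values, then finalize.
function runExample(docs) {
  const groups = new Map();
  emitTarget = function (key, value) {
    if (!groups.has(key)) groups.set(key, []);
    groups.get(key).push(value);
  };
  docs.forEach(function (doc) { mapFunction2.call(doc); });
  const out = {};
  for (const [key, values] of groups) {
    const reduced = values.length === 1 ? values[0] : reduceFunction2(key, values);
    out[key] = finalizeFunction2(key, reduced);
  }
  return out;
}

const sampleOrders = [
  { items: [{ sku: 'apple', qty: 5 }, { sku: 'pear', qty: 2 }] },
  { items: [{ sku: 'apple', qty: 3 }] },
  { items: [{ sku: 'pear', qty: 5 }, { sku: 'apple', qty: 1 }] },
];
console.log(runExample(sampleOrders));
// apple: count 3, qty 9, avg 3; pear: count 2, qty 7, avg 3.5
```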

db.collection.mapReduce() takes the following parameters:

map (function): A JavaScript function that associates or "maps" a value with a key and emits the key and value pair. See Requirements for the map Function for more information.

reduce (function): A JavaScript function that "reduces" to a single object all the values associated with a particular key. See Requirements for the reduce Function for more information.

options (document): A document that specifies additional parameters to db.collection.mapReduce().

bypassDocumentValidation (boolean): Optional. Enables mapReduce to bypass document validation during the operation. This lets you insert documents that do not meet the validation requirements. New in version 3.2.

The following list describes additional arguments that db.collection.mapReduce() can accept:

out (string or document): Specifies the location of the result of the map-reduce operation. You can output to a collection, output to a collection with an action, or output inline. You may output to a collection when performing map-reduce operations on the primary members of the set; on secondary members you may only use the inline output. See out Options for more information.

query (document): Specifies the selection criteria using query operators for determining the documents input to the map function.

sort (document): Sorts the input documents. This option is useful for optimization. For example, specify the sort key to be the same as the emit key so that there are fewer reduce operations. The sort key must be in an existing index for this collection.

limit (number): Specifies a maximum number of documents for the input into the map function.

finalize (function): Optional. Follows the reduce method and modifies the output. See Requirements for the finalize Function for more information.

scope (document): Specifies global variables that are accessible in the map, reduce and finalize functions.

jsMode (boolean): Specifies whether to convert intermediate data into BSON format between the execution of the map and reduce functions. Defaults to false.
  If false: Internally, MongoDB converts the JavaScript objects emitted by the map function to BSON objects. These BSON objects are then converted back to JavaScript objects when calling the reduce function. The map-reduce operation places the intermediate BSON objects in temporary, on-disk storage. This allows the map-reduce operation to execute over arbitrarily large data sets.
  If true: Internally, the JavaScript objects emitted during the map function remain as JavaScript objects. There is no need to convert the objects for the reduce function, which can result in faster execution. You can only use jsMode for result sets with fewer than 500,000 distinct key arguments to the mapper's emit() function.

verbose (boolean): Specifies whether to include the timing information in the result information. verbose defaults to true to include the timing information.