I would like to use MongoDB as the backend for the analytics system I am building. One of the main advantages of using MongoDB is the built-in MapReduce. Since we are at "medium data" scale, we do not yet need the overhead of Hadoop.
For testing purposes I inserted 50 million documents of the form
{
 user_id: xxxx,
 thing_id:xxxx,
 time: xxx
}
The collection has an index on user_id and runs on an EC2 Large instance. It's a single MongoDB instance (not sharded).
db.user_thing_like.find({user_id: 37104857}) 
takes less than a second.
However, a MapReduce job in which I wanted to count the number of entries for a user ran all night and then failed with an out-of-memory error. Either I am doing something stupid, or MongoDB is not the right tool for what I want to do.
I am new to MongoDB and would appreciate any help. Thanks in advance.
ERROR :
Tue Aug  9 13:15:58 uncaught exception: map reduce failed:{
        "assertion" : "invoke failed: JS Error: out of memory nofile_b:2",
        "assertionCode" : 9004,
        "errmsg" : "db assertion failure",
        "ok" : 0
}
MAPREDUCE QUERY:
db.user_thing_like.mapReduce(map, reduce, {out: "tmp_test"}, {query: {"user_id" : 37104857 }});
MAP AND REDUCE:
map = function () {
    for (var key in this) {
        emit(key.user_id, {count: 1});
    }
};
reduce = function (key, emits) {
    total = 0;
    for (var i in emits) {
        total += emits[i].count;
    }
    return {"count": total};
}
--- UPDATE ---
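I realized that, with the syntax I used, the mapReduce call was not applying my query filter: the query was passed as a separate fourth argument instead of alongside the other options.
Here is the corrected mapReduce command.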
db.runCommand({mapreduce: "user_thing_like", map: map, reduce: reduce, out: "tmp_test", query: {"user_id" : 37104857 }});
map = function () {
    emit(this.user_id, {count: 1});
};
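To illustrate why the map function needed changing as well, here is a minimal sketch that runs both versions against one sample document outside MongoDB. The `emit` stand-in, the sample document values, and the names `originalMap` and `correctedMap` are assumptions made for this illustration; a real document would also carry an `_id` field, so the original version would emit one more time per document.

```javascript
// Minimal stand-in for MongoDB's emit(), for illustration only.
var emitted = [];
function emit(key, value) {
    emitted.push({ key: key, value: value });
}

// Hypothetical sample document shaped like the ones in the question.
var doc = { user_id: 37104857, thing_id: 42, time: 1312895758 };

// Original map: iterates over field names, so `key` is a string
// such as "user_id", and `key.user_id` is undefined.
var originalMap = function () {
    for (var key in this) {
        emit(key.user_id, { count: 1 });
    }
};

// Corrected map: emits once per document, keyed by the user_id value.
var correctedMap = function () {
    emit(this.user_id, { count: 1 });
};

originalMap.call(doc);
var originalEmits = emitted;

emitted = [];
correctedMap.call(doc);
var correctedEmits = emitted;

console.log(originalEmits.length, originalEmits[0].key);
console.log(correctedEmits.length, correctedEmits[0].key);
```

In this sketch the original version emits once per field with an undefined key, so every emit in the collection would land in a single group, while the corrected version emits once per document under the actual user_id.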
Also, try specifying user_id as the sort key for the MapReduce. From the manual:
sort : <sorts the input objects using this key. Useful for optimization, like sorting by the emit key for fewer reduces>
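As a rough sketch of that suggestion, the shell helper can take `out`, `query`, and `sort` together in a single options object passed as the third argument. The variable name `options` is an assumption for illustration, and the `typeof db` guard is only there so the snippet can also be loaded outside the mongo shell; whether sorting actually helps would need to be measured on the real data.

```javascript
// Emit one count per matching document, keyed by user_id.
var map = function () {
    emit(this.user_id, { count: 1 });
};

// Sum the partial counts for a key.
var reduce = function (key, emits) {
    var total = 0;
    for (var i = 0; i < emits.length; i++) {
        total += emits[i].count;
    }
    return { count: total };
};

// All options go in ONE object: output collection, query filter,
// and a sort on the emit key (user_id, which is indexed).
var options = {
    out: "tmp_test",
    query: { user_id: 37104857 },
    sort: { user_id: 1 }
};

// Only runs inside the mongo shell, where `db` is defined.
if (typeof db !== "undefined") {
    db.user_thing_like.mapReduce(map, reduce, options);
}
```

The result would then be read back from the tmp_test collection, for example with db.tmp_test.find().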