Map-Reduce in MongoDB

In this article we will look at how map-reduce works in MongoDB. First we need to know what map-reduce is.

What is map reduce in mongodb?

According to mongodb.org, map-reduce in MongoDB is a data processing paradigm for condensing large volumes of data into useful aggregated results. For map-reduce operations, MongoDB provides the mapReduce database command.
So what does that mean in simple words?

  • The mapReduce command takes two primary inputs: the mapper function and the reducer function.

A mapper starts by reading a collection and emitting, for each document, a key together with only the fields we wish to process. The emitted values are grouped into one array per key. Each key and its array of values is then fed into a reducer, which processes the values. MongoDB also supports map-reduce operations on sharded collections.

Let us see a basic example.

For the sake of the example, let us create a dummy collection filled with dummy data, like below:


    {
        name: "foo",
        price: 9
    },
    {
        name: "foo",
        price: 12
    },
    {
        name: "bar",
        price: 8
    },
    {
        name: "baz",
        price: 3
    },
    {
        name: "baz",
        price: 5
    },
    {
        name: "baz",
        price: 8
    }

Suppose we want to group the prices by name and sort each group. We use a mapper and then a reducer, which gives us the output below.


foo	[9,12]
bar	[8]
baz	[3,5,8]

In the output above, each name appears once, together with the sorted list of prices of every document with that name.
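To make the grouping shown above concrete, here is a minimal sketch in plain JavaScript. It does not use MongoDB at all; it only imitates the three steps (map, group by key, reduce) on an in-memory array. The variable names products, pairs, grouped and pricesByName are assumptions made for this illustration.

```javascript
// Plain JavaScript stand-in for the phases; no MongoDB connection is used.
const products = [
  { name: "foo", price: 9 },
  { name: "foo", price: 12 },
  { name: "bar", price: 8 },
  { name: "baz", price: 3 },
  { name: "baz", price: 5 },
  { name: "baz", price: 8 },
];

// Map phase: turn every document into a [key, value] pair.
const pairs = products.map((doc) => [doc.name, doc.price]);

// Grouping step: collect all values that share a key into one array.
const grouped = new Map();
for (const [key, value] of pairs) {
  if (!grouped.has(key)) grouped.set(key, []);
  grouped.get(key).push(value);
}

// Reduce phase: process each key's values; here they are sorted ascending.
const pricesByName = {};
for (const [key, values] of grouped) {
  pricesByName[key] = [...values].sort((a, b) => a - b);
}

console.log(pricesByName);
```

Running this with Node.js should list foo with 9 and 12, bar with 8, and baz with 3, 5 and 8, matching the output above.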

So, we have seen what map-reduce can do. But how do we invoke it?

How to use map reduce in mongodb?

In MongoDB, map-reduce is written in JavaScript. First we create a JavaScript function for mapping the documents. In the map function we emit a key and a value. After mapping is complete we need another function for reducing. In the reduce function we simply put together the values for a key, for example by summing them up.

Now let us look at the code.

JavaScript function for mapping:


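var map = function () {
  // ... pick the fields you need from `this`, the current document
  emit(key, value); // key and value are placeholders
};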

JavaScript function for reduce:


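var reduce = function (key, values) {
  // values is the array of everything emitted for this key;
  // one and two are placeholders for results computed from it
  return { result1: one, result2: two };
};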

We have now defined both the map and the reduce function. But how do we run them against a collection?


db.collection_name.mapReduce(map, reduce);

So, what do we see above? We call the mapReduce function, which MongoDB provides, on the collection named before it. As parameters we pass map first and reduce second.
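To see how a concrete map and reduce pair fits together without a running server, here is a small sketch. The helper runMapReduce is hypothetical and is not part of MongoDB; it only imitates, in plain JavaScript, how each document becomes `this` inside map, how emit collects values per key, and how results come back as documents with _id and value fields. The example sums prices per name, which is an assumption chosen for illustration.

```javascript
// Hypothetical local helper: imitates how mapReduce drives the two functions.
function runMapReduce(docs, map, reduce) {
  const groups = new Map();
  // MongoDB supplies emit() to the map function; here we install our own.
  globalThis.emit = (key, value) => {
    if (!groups.has(key)) groups.set(key, []);
    groups.get(key).push(value);
  };
  // Inside map, `this` is the document being processed.
  docs.forEach((doc) => map.call(doc));
  delete globalThis.emit;
  // A key with a single value is not passed to reduce, as in MongoDB.
  return [...groups].map(([key, values]) => ({
    _id: key,
    value: values.length === 1 ? values[0] : reduce(key, values),
  }));
}

var map = function () {
  emit(this.name, this.price); // key: the name, value: its price
};

var reduce = function (key, values) {
  // Sum all prices emitted for this key
  return values.reduce(function (total, price) {
    return total + price;
  }, 0);
};

const totals = runMapReduce(
  [
    { name: "foo", price: 9 },
    { name: "foo", price: 12 },
    { name: "bar", price: 8 },
  ],
  map,
  reduce
);
console.log(totals);
```

In the mongo shell the same map and reduce variables would be passed to db.collection_name.mapReduce(map, reduce) instead of the local helper.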

These are the basics of map-reduce. But we can add a twist to the above code. Suppose we have a collection named test, and we want to map-reduce its documents into another collection named mapped_collection. So how do we do it?

We just have to specify the output collection in the above code snippet. After modification the code will look like this:


db.test.mapReduce(map, reduce, { out: "mapped_collection" });

Behaviour:

The map-reduce operation can write results to a collection or return the results inline. If we write map-reduce output to a collection, we can perform subsequent map-reduce operations on the same input collection that replace, merge, or reduce new results with previous results.

When returning the results of a map-reduce operation inline, the result documents must be within the BSON Document Size limit, which is currently 16 megabytes.
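The output choices described above correspond to different shapes of the out option. The sketch below only builds the option documents as plain JavaScript objects; the constant names are assumptions for illustration, and the actual call is shown as a comment because db exists only inside the mongo shell.

```javascript
// Return the results inline in the command response (subject to the 16 MB limit).
const outInline = { out: { inline: 1 } };

// Write to a collection, replacing its previous contents.
const outReplace = { out: { replace: "mapped_collection" } };

// Write to a collection; a new result overwrites an existing one with the same key.
const outMerge = { out: { merge: "mapped_collection" } };

// Write to a collection; for an existing key, the reduce function is applied
// to the old and the new result together.
const outReduce = { out: { reduce: "mapped_collection" } };

// In the mongo shell, one of these is passed as the third argument, e.g.:
// db.test.mapReduce(map, reduce, outMerge);
console.log(outInline, outReplace, outMerge, outReduce);
```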

Incremental map-reduce in mongodb:

We can also map-reduce documents incrementally. To do that we have to follow a few steps, listed below.

  • Run a map-reduce job over the current collection and output the result to a separate collection.
  • When we have more data to process, run a subsequent map-reduce job with:
      • the query parameter, which specifies conditions that match only the new documents;
      • the out parameter, which specifies the reduce action to merge the new results into the existing output collection.
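The steps above can be sketched in plain JavaScript. This is a simulation under stated assumptions, not MongoDB code: countByName is a hypothetical helper that stands in for one map-reduce job, the second array stands in for the documents matched by the query parameter, and the final loop stands in for the reduce output action. It works because the reduce function returns a value of the same shape as its inputs, so old and new results can be reduced together.

```javascript
// Reduce function: adds partial counts. Inputs and output are both numbers,
// so a previous result can be fed back in together with new values.
const reduce = (key, values) => values.reduce((a, b) => a + b, 0);

// Hypothetical helper standing in for one map-reduce job:
// map each document to (name, 1), group by name, then reduce.
function countByName(docs) {
  const groups = {};
  for (const doc of docs) {
    (groups[doc.name] = groups[doc.name] || []).push(1);
  }
  const result = {};
  for (const key of Object.keys(groups)) {
    result[key] = reduce(key, groups[key]);
  }
  return result;
}

// Step 1: first job over the documents that exist now.
const output = countByName([{ name: "foo" }, { name: "foo" }, { name: "bar" }]);

// Step 2: later job over only the new documents (the role of `query`).
const fresh = countByName([{ name: "foo" }, { name: "baz" }]);

// Merge the new results into the existing output (the role of the reduce action).
for (const key of Object.keys(fresh)) {
  output[key] = key in output ? reduce(key, [output[key], fresh[key]]) : fresh[key];
}

console.log(output); // foo: 3, bar: 1, baz: 1
```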

Let us work through an example to understand map-reduce better.

In this example we are going to build a library in which we can see the books in two categories.

  • Story (250 pages or fewer)
  • Novel (more than 250 pages)

So, let us begin.

First we need some data.


 book1 = {name : "2 states", pages : 100}
 book2 = {name : "The Unstable Earth", pages : 200}
 book3 = {name : "Eragon", pages : 300}
 book4 = {name : "Inheritance", pages : 400}

Now, let us save these books in a collection called books.


db.books.save(book1)
db.books.save(book2)
db.books.save(book3)
db.books.save(book4)

Now it is time to write the map function that classifies each book as required.


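var map = function () {
  var category;
  if (this.pages > 250)
    category = "Novel";
  else
    category = "Story";
  // Emit a partial count of 1 so the value has the same shape that reduce returns
  emit(category, { books: 1 });
};

Now let us define the reduce function:


var reduce = function (key, values) {
  var sum = 0;
  // Add up the partial counts; MongoDB may pass in values returned by an earlier reduce call
  values.forEach(function (doc) {
    sum += doc.books;
  });
  return { books: sum };
};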

Now it is time to run map-reduce on the collection and see what happens.


var count  = db.books.mapReduce(map, reduce, {out: "book_results"});
db[count.result].find()

If everything is OK, we should see the following result.


{ "_id" : "Novel", "value" : { "books" : 2 } }
{ "_id" : "Story", "value" : { "books" : 2} } 

This is a basic example, and many readers may reasonably question the need for map-reduce in a case this simple. With large volumes of data, however, the feature is quite useful. In this article we have only discussed map-reduce from the MongoDB shell, but the feature can also be accessed from Node.js and PHP.
