Map Reduce Programming – Concise Definition

June 10, 2013

[Image: map reduce]

Above is a visual representation of the map-reduce framework.

Map-reduce is a programming technique for aggregation, grouping, and summation problems where we have a huge amount of data and want to perform these operations in parallel across a number of distributed machines.

In the map function(s), each call processes one subset of the data and collects intermediate results for it, for example the frequency of each item in that subset.

In the reduce function(s), we merge and aggregate the intermediate results from the map function(s), and we can then apply any other statistical formula to the merged result to produce the final output.

So the final result of map-reduce is a summary of the raw data that was sitting in the repository before it was processed with the map-reduce framework.
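The map and reduce steps described above can be sketched in a few lines of Python. This is only a minimal single-process illustration, not a distributed implementation: the function names `map_chunk`, `reduce_counts`, and `map_reduce` and the sample data are our own assumptions, and in a real cluster each call to `map_chunk` would run on a different machine.

```python
from collections import Counter
from functools import reduce


def map_chunk(chunk):
    """Map step: count word frequencies within one subset of the data."""
    return Counter(word for line in chunk for word in line.split())


def reduce_counts(left, right):
    """Reduce step: merge two intermediate frequency tables into one."""
    return left + right


def map_reduce(chunks):
    # Each chunk is mapped independently, so these calls could run in parallel.
    intermediate = [map_chunk(chunk) for chunk in chunks]
    # Merge all intermediate results into the final summary.
    return reduce(reduce_counts, intermediate, Counter())


# Hypothetical sample data: two subsets of lines, as if stored on two machines.
chunks = [
    ["apple banana apple"],
    ["banana cherry", "apple"],
]
totals = map_reduce(chunks)
print(dict(totals))
```

Because no map call depends on the result of another, the order in which the chunks are processed does not change the final counts.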

We should understand the scenarios where map-reduce can and cannot be applied.

1> Map-reduce is a batch operation, so it should not be applied to online scenarios that need an immediate response.

2> Map-reduce does not fit recursive problems, because in recursion each outer cycle depends on the result of the inner cycle, whereas map-reduce expects its tasks to be independent of one another.

3> If the data size is small, i.e. it can be processed on a single machine, we should not apply it. Here map-reduce programming within a cluster would be over-engineering.

4> We can use map-reduce for frequency calculation on unique objects, and then apply any sort of aggregation function to that frequency to form a new dimension of the information.
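As a rough illustration of point 4, the sketch below first computes the frequency of unique objects across partitions and then applies an aggregation on top of it. The choice of "share of total" as the derived dimension, the helper names `frequency` and `share_of_total`, and the sample partitions are all our own assumptions, picked only to keep the example small.

```python
from collections import Counter


def frequency(partitions):
    """Count each unique object per partition, then merge the counts."""
    total = Counter()
    for part in partitions:
        # Counter(part) is the per-partition (map) result; update() merges it.
        total.update(Counter(part))
    return total


def share_of_total(freq):
    """Aggregate on top of the frequencies: each object's share of all objects."""
    grand_total = sum(freq.values())
    return {key: count / grand_total for key, count in freq.items()}


# Hypothetical sample data split into two partitions.
partitions = [["a", "b", "a"], ["b", "a", "c", "a", "b"]]
freq = frequency(partitions)
shares = share_of_total(freq)
print(shares)
```

The shares are a new view of the data that exists only after the frequencies from all partitions have been merged, which is why this step belongs after the reduce.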

Readers are welcome to comment on scenarios where map-reduce should or should not be applied.

We will describe some map-reduce scenarios in our next articles.

An interesting reference for understanding the map-reduce framework:

How I explained MapReduce to my Wife?
