Above is a visual representation of the map-reduce framework.
Map-reduce is a programming technique for solving aggregation, grouping, and summation problems over huge amounts of data. It does this by running those operations in parallel across a number of distributed machines.
Each map function processes one subset of the data and collects the frequency of the items in that subset.
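The map step can be sketched in a few lines of Python. This is a minimal, single-process illustration that assumes the data is a list of words and is split into equal chunks, one per hypothetical worker. The names `map_frequency` and `split_into_chunks` are illustrative and do not belong to any framework. Each map call counts items only within its own chunk, so the result is a set of partial, intermediate counts.

```python
from collections import Counter

def map_frequency(chunk):
    """Map step (illustrative): count how often each item occurs in one subset."""
    return Counter(chunk)

def split_into_chunks(data, n_chunks):
    """Split the input into roughly equal subsets, one per hypothetical worker."""
    size = max(1, -(-len(data) // n_chunks))  # ceiling division, at least 1
    return [data[i:i + size] for i in range(0, len(data), size)]

words = "the cat sat on the mat the end".split()
partial_counts = [map_frequency(chunk) for chunk in split_into_chunks(words, 3)]
for counts in partial_counts:
    print(dict(counts))  # one partial frequency table per chunk
```

In a real cluster, each chunk would live on a different machine, and a framework such as Hadoop would usually emit `(key, 1)` pairs rather than a whole table. Counting locally per chunk, as done here, gives the same partial frequencies.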
Each reduce function merges and aggregates the intermediate results produced by the map functions. It can also apply further statistical formulas to the merged result to produce the final output.
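The reduce step can be sketched as a merge of the partial counts, followed by one example statistic. This sketch assumes the map step produced one `Counter` per chunk, which is hard-coded below as sample intermediate output. Relative frequency is only one possible choice of "statistical formula". The function names are hypothetical.

```python
from collections import Counter
from functools import reduce

def reduce_frequencies(partial_counts):
    """Reduce step (illustrative): merge the intermediate counts from all map calls."""
    return reduce(lambda acc, c: acc + c, partial_counts, Counter())

def relative_frequency(total_counts):
    """Example statistic on the merged result: each item's share of the total."""
    n = sum(total_counts.values())
    return {item: count / n for item, count in total_counts.items()} if n else {}

# Sample intermediate output of three hypothetical map workers.
partial_counts = [
    Counter({"the": 1, "cat": 1, "sat": 1}),
    Counter({"on": 1, "the": 1, "mat": 1}),
    Counter({"the": 1, "end": 1}),
]
totals = reduce_frequencies(partial_counts)
print(totals.most_common(1))               # [('the', 3)]
print(relative_frequency(totals)["the"])   # 0.375
```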
The final output of map-reduce is therefore a summary of the raw data that was held in the repository before it was processed by the map-reduce framework.
We should understand the scenarios where map-reduce can and cannot be applied.
1> Map-reduce is a batch operation, so it should not be applied to online (real-time) scenarios.
2> Map-reduce cannot be applied to recursive problems. In recursion, each outer step depends on the result of an inner step, but map-reduce assumes that map tasks run independently of one another.
3> If the data size is small, i.e. it can be processed on a single machine, we should not apply map-reduce. Map-reduce programming on a cluster would be over-engineering in that case.
4> We can use map-reduce to calculate the frequency of unique objects and then apply any sort of aggregation function to that frequency, which forms a new dimension of the information.
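Point 4 can be illustrated with a small end-to-end sketch. It assumes hypothetical `(customer_id, purchase_amount)` records. The map step emits a count and an amount for each record. A shuffle step groups the values by key. The reduce step turns each key's frequency into an aggregate, namely the average spend per customer, which is the "new dimension". The thread pool only stands in for distributed worker machines, and all names and data here are illustrative.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor

# Hypothetical input: (customer_id, purchase_amount) records.
records = [
    ("alice", 20.0), ("bob", 5.0), ("alice", 10.0),
    ("carol", 7.5), ("bob", 15.0), ("alice", 30.0),
]

def map_chunk(chunk):
    """Map: emit (key, (1, amount)) so the reducer gets both a count and a sum."""
    return [(customer, (1, amount)) for customer, amount in chunk]

def shuffle(mapped_outputs):
    """Group all intermediate values by key, as a framework does between map and reduce."""
    groups = defaultdict(list)
    for pairs in mapped_outputs:
        for key, value in pairs:
            groups[key].append(value)
    return groups

def reduce_key(values):
    """Reduce: the frequency of a key plus an aggregate (average) derived from it."""
    count = sum(c for c, _ in values)
    total = sum(a for _, a in values)
    return {"count": count, "total": total, "average": total / count}

chunks = [records[0:2], records[2:4], records[4:6]]
with ThreadPoolExecutor(max_workers=3) as pool:  # threads stand in for worker machines
    mapped = list(pool.map(map_chunk, chunks))

summary = {key: reduce_key(vals) for key, vals in shuffle(mapped).items()}
print(summary["alice"])  # {'count': 3, 'total': 60.0, 'average': 20.0}
```

Carrying the count and the sum together through the map output lets the reducer compute the average correctly, even though each customer's records are spread across several chunks.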
Readers are welcome to comment with other scenarios where map-reduce should or should not be applied.
We will describe some map-reduce scenarios in our next articles.
An interesting reference for understanding the map-reduce framework: