MongoDB Aggregation Pipelines and Map-Reduce
Aggregation operations process data records and return computed results. Aggregation operations group values from multiple documents together and can perform a variety of operations on the grouped data to return a single result.
MongoDB provides three ways to perform aggregation: the aggregation pipeline, the map-reduce function, and single-purpose aggregation methods.
If you are new to MongoDB, read this blog post for more information:
https://all-about-devops.blogspot.com/2021/05/ea-case-study.html
1. Aggregation pipeline: The aggregation pipeline is a framework for data aggregation modeled on the concept of data processing pipelines. Documents enter a multi-stage pipeline that transforms the documents into aggregated results.
2. Map-Reduce function: Map-reduce is a data processing model for condensing large volumes of data into useful aggregated results. To perform map-reduce operations, MongoDB provides the mapReduce database command. MongoDB applies the map phase to each input document, and the map function emits key-value pairs. For keys that have multiple values, MongoDB applies the reduce phase, which collects and condenses the aggregated data.
map is a JavaScript function that associates a value with a key and emits the key-value pair.
reduce is a JavaScript function that receives a key and all the values emitted for that key, and reduces them to a single value.
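3. Single-purpose aggregation methods: MongoDB also provides methods such as db.collection.estimatedDocumentCount(), db.collection.countDocuments(), and db.collection.distinct(). Each one aggregates documents from a single collection. They are simple to use, but they lack the flexibility and capabilities of the aggregation pipeline.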
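The pipeline idea in item 1 can be shown without a database. The plain-JavaScript sketch below runs in Node.js. Each stage is an ordinary function that takes an array of documents and returns a new array, and the pipeline passes each stage's output to the next stage. The stage helpers (matchStage, groupSumStage, sortByIdStage, runPipeline) and the sample documents are hypothetical. They are an analogy for how $match, $group, and $sort chain together, not a description of how MongoDB implements those stages.

```javascript
// Hypothetical sample documents using the field names from this post.
const docs = [
  { Category: "credit", Total: 500 },
  { Category: "home_expenses", Total: 120 },
  { Category: "education_expenditure", Total: 80 },
  { Category: "home_expenses", Total: 30 },
];

// Analogy for $match: keep only the documents that satisfy a predicate.
const matchStage = (predicate) => (input) => input.filter(predicate);

// Analogy for $group with a $sum accumulator: one output document per key.
const groupSumStage = (keyField, valueField) => (input) => {
  const totals = new Map();
  for (const doc of input) {
    const key = doc[keyField];
    totals.set(key, (totals.get(key) ?? 0) + doc[valueField]);
  }
  return [...totals].map(([key, value]) => ({ _id: key, value }));
};

// Analogy for { $sort: { _id: 1 } }: ascending order by _id.
const sortByIdStage = () => (input) =>
  [...input].sort((a, b) => (a._id < b._id ? -1 : a._id > b._id ? 1 : 0));

// Run the stages in order. Each stage receives the previous stage's output.
const runPipeline = (input, stages) =>
  stages.reduce((acc, stage) => stage(acc), input);

const expensesOnly = runPipeline(docs, [
  matchStage((doc) => doc.Category !== "credit"),
  groupSumStage("Category", "Total"),
  sortByIdStage(),
]);

console.log(expensesOnly);
```

Because the stages only depend on their input array, you can reorder, add, or remove them freely. The same holds for stages in a real aggregation pipeline.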
Let's Take an Example
Suppose we have a digital account in which we have made three types of transactions:
1. credit: money added to the account.
2. home_expenses: money spent on home expenses.
3. education_expenditure: money spent on education.
Suppose we made 200 transactions in one month. We need to find how much money went to home expenses, how much went to education expenses, and how much money we credited to the account. Calculating this one record at a time is tedious. MongoDB provides a framework (the aggregation pipeline) and a function (map-reduce) for problems like this, where you aggregate a large amount of data and need only a small, summarized result.
In the above image, we have a MongoDB collection (named MapReduce, as used in the queries below) that holds records of expenditures and credits. Each record has a Category field and a Total field. The home and education categories hold expenditures, and the credit category holds money added to the account.
Now we will apply the aggregation pipeline to get the category-wise results discussed above.
1. Aggregation pipeline
db.MapReduce.aggregate([
  // Group documents by Category and add up each group's Total into "value"
  { $group: { _id: "$Category", value: { $sum: "$Total" } } }
])
Here $group is a pipeline stage that groups documents. Its _id expression, "$Category", groups the records by their Category field. $sum is an accumulator operator that adds up the Total field of every document in a group and stores the result in a field named value.
Run this aggregation command in the MongoDB shell, and you get the total expenditure and credit for each category.
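To make the $group stage concrete, the plain-JavaScript sketch below mimics what { $group: { _id: "$Category", value: { $sum: "$Total" } } } computes. It runs in Node.js without MongoDB.

- The "$Category" string is a field path, and every distinct value of that field becomes the _id of one output document.
- $sum adds up Total within each group.
- Like $sum, the sketch skips values that are not numbers.

The function groupAndSum and the sample records are hypothetical illustrations, not the data from the image.

```javascript
// Hypothetical records shaped like the ones in the MapReduce collection.
const records = [
  { Category: "credit", Total: 1000 },
  { Category: "home_expenses", Total: 250 },
  { Category: "education_expenditure", Total: 400 },
  { Category: "home_expenses", Total: 150 },
  { Category: "credit", Total: "n/a" }, // non-numeric: ignored, as $sum does
];

// Mimics { $group: { _id: groupPath, value: { $sum: sumPath } } }.
// Field paths carry a leading "$", as in the MongoDB stage.
function groupAndSum(docs, groupPath, sumPath) {
  const groupField = groupPath.slice(1); // "$Category" -> "Category"
  const sumField = sumPath.slice(1); // "$Total" -> "Total"
  const groups = new Map();
  for (const doc of docs) {
    const key = doc[groupField];
    const amount = typeof doc[sumField] === "number" ? doc[sumField] : 0;
    groups.set(key, (groups.get(key) ?? 0) + amount);
  }
  // One output document per distinct key, in the same { _id, value } shape.
  return [...groups].map(([key, value]) => ({ _id: key, value }));
}

console.log(groupAndSum(records, "$Category", "$Total"));
```

This sketch returns the groups in first-seen order. MongoDB does not guarantee the order of $group output documents, so add a { $sort: { _id: 1 } } stage after $group when you need a stable order.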
2. Map-Reduce function
The first function in map-reduce is the mapper. MongoDB calls it once for each document, which is available inside the function as this, and the mapper uses the category as the key for grouping. Inside it, emit() outputs one key-value pair. It takes two arguments: the key to group by, and the value to process for that key (here, the Total field).
var mapFunction1 = function() {
  // `this` is the current document: emit its Total under its Category key
  emit(this.Category, this.Total);
};
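You can simulate the map step in plain JavaScript to see what emit() produces. This sketch runs in Node.js without MongoDB. The map function body is the same as mapFunction1, and it is called with this bound to each document, as MongoDB does. The stand-in emit function, the emitted map, and the sample documents are hypothetical. They approximate how values emitted for the same key are collected before the reduce phase; MongoDB's real bookkeeping is internal and differs.

```javascript
// Hypothetical input documents.
const docs = [
  { Category: "credit", Total: 700 },
  { Category: "home_expenses", Total: 90 },
  { Category: "home_expenses", Total: 60 },
];

// Same body as mapFunction1: `this` is the current document.
const mapFunction1 = function () {
  emit(this.Category, this.Total);
};

// Intermediate store: key -> array of every value emitted for that key.
const emitted = new Map();

// Hypothetical stand-in for MongoDB's emit(): record one key-value pair.
function emit(key, value) {
  if (!emitted.has(key)) emitted.set(key, []);
  emitted.get(key).push(value);
}

// Map phase: call the map function once per document, with `this` set to it.
for (const doc of docs) {
  mapFunction1.call(doc);
}

// emitted now holds credit -> [700] and home_expenses -> [90, 60]
console.log(emitted);
```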
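The second function is the reducer. MongoDB calls it with a key (a category emitted by the mapper) and an array of all the values emitted for that key, and the reducer condenses that array into a single value. Here it adds the values with Array.sum(), which does the same job that the $sum accumulator does in the pipeline.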
var reduceFunction1 = function(key, values) {
  // values holds every Total emitted for this Category; add them up
  return Array.sum(values);
};
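MongoDB may call the reduce function more than once for the same key, feeding earlier reduce results back in as values. The function must therefore give the same answer whether the values arrive all at once or in batches, and summing satisfies this. The plain-JavaScript sketch below checks that property. Array.sum is a helper of MongoDB's JavaScript environment and is not available in Node.js, so the sketch sums with reduce instead. The values are hypothetical.

```javascript
// Node.js has no Array.sum, so this version sums with Array.prototype.reduce.
const reduceFunction1 = function (key, values) {
  return values.reduce((acc, v) => acc + v, 0);
};

// Hypothetical values emitted for one key.
const values = [120, 45, 300, 35];

// Reducing everything at once...
const allAtOnce = reduceFunction1("home_expenses", values);

// ...must equal reducing in batches and then reducing the partial results.
const partA = reduceFunction1("home_expenses", values.slice(0, 2));
const partB = reduceFunction1("home_expenses", values.slice(2));
const reReduced = reduceFunction1("home_expenses", [partA, partB]);

console.log(allAtOnce, reReduced); // 500 500
```

An averaging reducer would not pass this check directly. An average of partial averages is generally not the overall average, which is one reason map-reduce code can get harder to write than a pipeline.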
The following query calls mapReduce(), which runs both the mapper and the reducer. Its third argument, { out: "any collection name" }, names the collection where the results are stored.
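db.MapReduce.mapReduce(
  mapFunction1,            // map: emits (Category, Total) for each document
  reduceFunction1,         // reduce: sums all Totals emitted for one Category
  { out: "Job_Completed" } // store the results in the Job_Completed collection
)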
After the job completes, the results are stored in the output collection (here Job_Completed). The following query shows them sorted by category:
db.Job_Completed.find().sort( { _id: 1 } )
Now run these commands one by one. The result again shows the total expenditure and credit for each category.
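The plain-JavaScript sketch below puts both phases together. It approximates what the mapReduce call followed by db.Job_Completed.find().sort( { _id: 1 } ) produces: one { _id, value } document per category, in ascending _id order.

Everything in it is hypothetical:
- simulateMapReduce is not a MongoDB API.
- The map function receives emit as an argument, whereas MongoDB provides emit() itself.
- The transactions are made-up sample data.

The sketch copies one documented MongoDB behavior: reduce is not called for a key that has only one emitted value.

```javascript
// Hypothetical stand-in for db.MapReduce.mapReduce(map, reduce, { out: ... })
// followed by db.Job_Completed.find().sort({ _id: 1 }).
function simulateMapReduce(docs, mapFn, reduceFn) {
  const emitted = new Map(); // key -> values emitted for that key
  const emit = (key, value) => {
    if (!emitted.has(key)) emitted.set(key, []);
    emitted.get(key).push(value);
  };

  // Map phase: `this` is the document; emit is passed in as an argument here.
  for (const doc of docs) mapFn.call(doc, emit);

  // Reduce phase: like MongoDB, skip reduce for keys with a single value.
  const output = [];
  for (const [key, values] of emitted) {
    const value = values.length === 1 ? values[0] : reduceFn(key, values);
    output.push({ _id: key, value });
  }

  // Like .sort({ _id: 1 }) on the output collection.
  return output.sort((a, b) => (a._id < b._id ? -1 : a._id > b._id ? 1 : 0));
}

// Hypothetical transactions.
const transactions = [
  { Category: "home_expenses", Total: 200 },
  { Category: "credit", Total: 2000 },
  { Category: "education_expenditure", Total: 350 },
  { Category: "home_expenses", Total: 100 },
];

const mapFn = function (emit) {
  emit(this.Category, this.Total);
};
const reduceFn = (key, values) => values.reduce((acc, v) => acc + v, 0);

const jobCompleted = simulateMapReduce(transactions, mapFn, reduceFn);
console.log(jobCompleted);
```

For the same records, this result has the same { _id, value } shape and totals as the aggregation pipeline's $group output. The pipeline reaches it with one declarative stage instead of two custom functions.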
Conclusion:
This is how the MongoDB aggregation framework and map-reduce work. Comparing the two solutions above, the aggregation pipeline is much shorter and easier to write than map-reduce. Map-reduce requires custom JavaScript map and reduce functions, which can get complicated. The pipeline provides predefined stages and operators such as $group and $sum. Choose the approach that fits your use case, but note that starting in MongoDB 5.0, map-reduce is deprecated and MongoDB recommends using an aggregation pipeline instead.