In my earlier post I have explained MongoDB. Mostly the explanation is in terms of administration perspective. In this post, lets see about MongoDB queries.

We will be familiar with SQL query syntax (like SELECT clause, WHERE clause) and write queries in that format would be quite easy. But that will not help when it comes to MongoDB. MongoDB is a document database, not only rows even queries has to be a bson document. I used to see MongoDB queries in three categories,

Find Query Update Query Pipeline Aggregation

Please note these categories are just my view. There are other types of queries (like mapreduce and admin queries) which I have not concentrated, so I have not mentioned above.

When I started learning MongoDB queries, I was just searching how to implement group by clause with combination where and select clause and that is where I ended up with Pipeline Aggregation. After learning aggregation queries later it was easy for me understand “Find” and “Update” queries. So first let me explain about “Pipeline Aggregation”.

Pipeline Aggregation

Pipeline Aggregation is nothing but a pipeline of stages where data will processed through stages. That is output of each stage will be sent to the next stage. The output of last stage will be the final output. Stage can be compared to clause in SQL syntax (like SELECT, WHERE, ORDER BY).

The syntax of pipeline aggregation is, db.getCollection(‘data’).aggregate([<stage_1>, <stage_2>, <stage_3>,…])

You can find the details of the supported stages in the following link. I am gonna explain about some important stages which I use frequently.

Stage Equivalent SQL keyword $project SELECT $match WHERE $group GROUP BY $sort ORDER BY $limit LIMIT $lookup JOIN

Lets understand these stages with examples. Lets say we have collection in MongoDB with the following fields, (Sample dataset can be downloaded from here)

AvgTicketPrice Cancelled Dest DestAirportID DestCityName DestCountry DestLocation DestRegion DestWeather DistanceKilometers DistanceMiles FlightDelay FlightDelayMin FlightDelayType FlightNum FlightTimeHour FlightTimeMin Origin OriginAirportID OriginCityName OriginCountry OriginLocation OriginRegion OriginWeather dayOfWeek hour_of_day

Now we need results for the following scenarios,

Scenario 1 : Fetch only records where hour_of_day is 15.

We need to filter the record where hour_of_day is 15. In SQL, WHERE clause will be used to achieve this. As mentioned in the table, the equivalent stage is $match. So the query will be as given below,

db.data.aggregate([{“$match”:{“hour_of_day”:15}}])

Scenario 2 : Get the value of AvgTicketPrice as ticketcost

The equivalent of SELECT in MongoDB is $project. Hence $project is required for this scenario. The query will be,

db.data.aggregate([{“$project”:{“ticketcost”:”$AvgTicketPrice”}}])

Scenario 3 : Get the no of cancelled tickets.

This scenario needs multiple stages. First we have to filter only the cancelled ticket and then we have to get the count by grouping the filtered documents.

So stage 1 will be {“$match”:{“Cancelled”:”TRUE”} and stage 2 will be {“$group”:{“_id”:1,”count”:{“$sum”:1}}}

(Please note “_id”:1 – means GROUP BY ALL in SQL and “count”:{“$sum”:1} – means COUNT(*) in SQL).

So the final query will be,

db.data.aggregate([{“$match”:{“Cancelled”:”TRUE”}},{“$group”:{“_id”:1,”count”:{“$sum”:1}}}])

Like I mentioned earlier the output of each stage will be taken to the next stage. So in this case the output of $match stage will return only the documents where “Cancelled” will be “TRUE”. Only the matched rows will be grouped and result will be obtained.

There will be some scenarios which might help to understand better. If you have queries with any scenarios or Pipeline Aggregation then let me know.

After understanding pipeline Aggregation then I was able to write queries in MongoDB. Later only I started focusing on “Find” and “Update” Queries. If possible I will try to cover “Find” and “Update” Queries in upcoming posts.