In this fifth and final article of our MongoDB tutorial series, we will look at deeper MongoDB operations and build on the previous articles, so do check out the earlier tutorials first.

Here, we will look at user profiles and at morphing our data models in MongoDB.

Introducing User Profiles

Continuing with the collections from our previous articles, we’ve added a user profile page to the site and want to let users enter up to three of their favorite books.

Right now, a user document looks like this:

{ "_id": ObjectId(...), "username": "Azrius", "email": "azrius@example.com", "favorites": "Harry" }

Users and their favorites are strongly related and will often be used together. A new document might look like this:

{ "_id": ObjectId(...), "username": "Azrius", "email": "azrius@example.com", "favorites": [ "Harry", "Sleeping", "Demons" ] }
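Since the profile page allows at most three favorites, the application has to enforce that limit somewhere. The following is a minimal sketch in plain JavaScript that works on an in-memory copy of the user document. The addFavorite helper and the MAX_FAVORITES constant are hypothetical names chosen for this illustration, not part of MongoDB.

```javascript
// Assumed limit taken from the profile page requirement above.
const MAX_FAVORITES = 3;

// Hypothetical helper: returns a new user document with the title added
// to "favorites", and refuses to go past the limit.
function addFavorite(user, title) {
  const favorites = user.favorites ?? [];
  if (favorites.includes(title)) return user; // already a favorite
  if (favorites.length >= MAX_FAVORITES) {
    throw new Error(`A user can have at most ${MAX_FAVORITES} favorites`);
  }
  return { ...user, favorites: [...favorites, title] };
}

const user = {
  username: "Azrius",
  email: "azrius@example.com",
  favorites: ["Harry", "Sleeping"],
};
const updated = addFavorite(user, "Demons");
console.log(updated.favorites);
```

The helper returns a new object instead of changing the one passed in, so the caller decides when to write the result back to the database.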

An array lets a single field hold multiple values.

We’d also like to add more author information to our books so users can be better informed about where their books come from. Right now, a book document looks like this:

{ "_id": ObjectId(...), "name": "Invisibility", ... "author": { "name": "Hillary", "phone": 5555555555, "german": true } }

Each book now contains its author’s information, which means the same author information is repeated in every book by that author.

Duplicate data can be hard to keep consistent throughout the database. If one book gets updated with new information and the rest don’t, our data is no longer correct.

Referencing Information

Instead of embedding the author information, we can create an authors collection and reference the author document in each book document.

Book document:

{ "_id": ObjectId(...), "name": "Harry", "author_id": "Hillary", ... }



Author document:

{ "_id": "Hillary", "phone": 5555555555, "german": true }

Author names are unique and don’t change, so the name can serve as the _id of the author document.

Inserting referenced documents

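MongoDB does not join collections in an ordinary find query (the aggregation pipeline’s $lookup stage, added in version 3.2, is the exception). Instead, some data is denormalized, that is, stored alongside related data in the same document, which removes the need for a join. In some cases, however, it makes sense to store related information in separate documents, typically in different collections or databases, and link them manually with a field such as author_id.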

One limitation of manual references is that they do not convey the database and collection names. If documents in a single collection relate to documents in more than one collection, we may need to consider using DBRefs.
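For comparison, here is a rough sketch of the two reference styles as plain JavaScript objects. A DBRef is a small document with $ref, $id and, optionally, $db fields. The database name library and the toDBRef helper are assumptions made up for this illustration.

```javascript
// Manual reference: only the _id of the related document is stored.
const bookWithManualRef = { name: "Harry", author_id: "Hillary" };

// Hypothetical helper that builds a DBRef-shaped document.
// Field order matters for DBRefs: $ref, then $id, then the optional $db.
function toDBRef(collection, id, database) {
  const ref = { $ref: collection, $id: id };
  if (database !== undefined) ref.$db = database;
  return ref;
}

const bookWithDBRef = {
  name: "Harry",
  // "library" is an assumed database name, not taken from the article.
  author: toDBRef("authors", "Hillary", "library"),
};

console.log(JSON.stringify(bookWithDBRef.author));
```

The manual reference is smaller and is usually enough when every book points into the same collection. The DBRef form spells out where the target lives, at the cost of a larger document.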

Querying a referenced document

In order to pull a book document and its author information, we must first query for the book to get the author_id and then query once more to get the author information.

First, query to retrieve book information:

> db.books.find({"name": "Harry"})



Second, query to retrieve author information:

> db.authors.find({"_id": "Hillary"})
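The two queries above amount to a join performed by the application. As a minimal sketch, assuming in-memory arrays stand in for the books and authors collections, the flow looks like this. The findBookWithAuthor helper is a hypothetical name, and a real application would issue the two find calls through its MongoDB driver instead.

```javascript
// In-memory stand-ins for the books and authors collections.
const books = [{ name: "Harry", author_id: "Hillary" }];
const authors = [{ _id: "Hillary", phone: 5555555555, german: true }];

// Hypothetical helper: fetch a book, then follow its author_id.
function findBookWithAuthor(name) {
  const book = books.find((b) => b.name === name); // first query
  if (!book) return null;
  const author = authors.find((a) => a._id === book.author_id); // second query
  return { ...book, author: author ?? null };
}

const result = findBookWithAuthor("Harry");
console.log(result.author.phone);
```

The second lookup can only start once the first has returned the author_id, which is the cost of referencing compared with embedding.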

Features of embedded documents

With a single query, we can grab a user’s email and their favorites.

> db.users.find({},{"email": true, "favorites": true})

The result will be:

{ "_id": ObjectId(...), "email": "azrius@example.com", "favorites": [ "Harry", "Sleeping", "Demons" ] }

Atomicity

Historically, MongoDB did not support multi-document atomic transactions (they arrived in version 4.0), but it has always provided atomic operations on a single document. So if a document has a hundred fields, an update statement will either update all of the targeted fields or none of them, maintaining atomicity at the document level.

The recommended way to maintain atomicity is to keep related information that is frequently updated together in a single document, using embedded documents. This ensures that all updates to that document are atomic. Early versions of MongoDB used a global write lock for all mutating operations (per server before 2.2 and per database in 2.2). Later storage engines lock at a finer granularity, but the guarantee is the same: regardless of the implementation details, a single-document update is atomic from the perspective of clients, and no other client can see a partial update to a single document.

For example, if we update a user’s email and add a favorite book in a single update, and an error occurs in the favorites part of the update, then none of the update is applied.
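To make the all-or-nothing behavior concrete, here is a small in-memory illustration in plain JavaScript. It only mimics the outcome described above and is not how MongoDB implements it. The updateProfile helper, the users collection name in the comment, and the new@example.com address are all assumptions for this sketch.

```javascript
// Hypothetical in-memory illustration of an all-or-nothing update.
// In the shell, the equivalent single-document update would look roughly like:
//   db.users.updateOne({ username: "Azrius" },
//     { $set: { email: "new@example.com" }, $addToSet: { favorites: "Love" } })
function updateProfile(user, newEmail, newFavorite) {
  const draft = { ...user, favorites: [...user.favorites] }; // work on a copy
  draft.email = newEmail;
  if (typeof newFavorite !== "string" || newFavorite === "") {
    // The copy is discarded, so the caller's document is untouched.
    throw new Error("invalid favorite");
  }
  if (!draft.favorites.includes(newFavorite)) draft.favorites.push(newFavorite);
  return draft; // both changes become visible together
}

let user = { username: "Azrius", email: "azrius@example.com", favorites: ["Harry"] };
try {
  user = updateProfile(user, "new@example.com", null); // the favorites part fails
} catch (err) {
  console.log(user.email); // still the old address
}
```

Because the email and the favorites live in the same document, one update covers both. If they lived in separate documents, the two writes could succeed or fail independently.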

Adding comments to Books

We’d like to allow users to comment on the books and need to determine the best route.

Now, we need to decide whether we want referenced or embedded documents. There are three questions we can ask to help us decide which one to use:

Question 1: How Will the Data Be Used?

Data that is frequently used together will benefit from being embedded, while data that’s rarely used together can afford the cost of referencing.

How often is data used together?    Always    Sometimes    Never
Embed                               Yes       Yes          Yes
Reference                           No        Yes          Yes

Clearly, embedding will work most of the time. Whenever we display a book, we’ll always want to display its comments.

Question 2: What Is the Size of the Data?

The size of the data in a relationship has a significant impact on data modeling. It’s crucial to think ahead!

Expected size?    Less than 100    More than 100    Thousands
Embed             Yes              Yes              No
Reference         No               Yes              Yes

We might start to see a decline in read performance when embedding large amounts of data. Once we expect more than 100 items, we can consider referencing.

Question 3: How Often Does the Data Change?

Sometimes embedding data can lead to data duplication. How often the duplicated data changes should factor into our decision.

Change frequency    Never/Rarely    Occasionally    Constantly
Embed               Yes             Yes             No
Reference           No              Yes             Yes

With embedding, data duplication is okay if we don’t expect the data to change.

With referencing, we prevent the inconsistencies that duplication can cause. For data that changes only occasionally, the choice depends on whether or not we want the overhead of managing duplication.
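The three tables can be encoded as a small lookup, which makes it easy to see where the answers agree. This is only a sketch: the GUIDE object, the answer keys and the suggestModel function are hypothetical names, and the Yes/No values are copied from the tables above as they stand.

```javascript
// Each question maps an answer to the modeling options the tables mark "Yes".
const GUIDE = {
  usedTogether: {
    always: ["embed"],
    sometimes: ["embed", "reference"],
    never: ["embed", "reference"],
  },
  expectedSize: {
    under100: ["embed"],
    over100: ["embed", "reference"],
    thousands: ["reference"],
  },
  changeFrequency: {
    rarely: ["embed"],
    occasionally: ["embed", "reference"],
    constantly: ["reference"],
  },
};

// Hypothetical helper: keep only the options that every answer allows.
function suggestModel(answers) {
  let options = ["embed", "reference"];
  for (const [question, answer] of Object.entries(answers)) {
    const allowed = GUIDE[question]?.[answer];
    if (!allowed) throw new Error(`Unknown answer "${answer}" for "${question}"`);
    options = options.filter((option) => allowed.includes(option));
  }
  return options;
}

// Comments on a book: always shown together, fewer than 100, rarely changed.
console.log(
  suggestModel({ usedTogether: "always", expectedSize: "under100", changeFrequency: "rarely" })
);
```

An empty result means the answers pull in different directions, in which case the guidelines cannot decide for us and we have to weigh the trade-offs ourselves.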

Embedding comments in Books

We can confidently embed comments within books. We know we’ll have fewer than 100 comments per book, they’re used together often, and the data doesn’t change often.

{
  "name": "Angels",
  ...
  "comments": [ // Comments readily available
    {
      "title": "The best book!",
      "body": "Lorem ipsum abra cadrbra"
    },
    ...
  ]
}

Referencing users in Comments

We only need the username from each user, and embedding user documents could lead to duplication issues. Let’s reference a user by using his or her unique username.

{ "name": "Angels", ... "comments":[ { "user_id": "Azrius" }, ... ] }
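When a book is displayed, the user_id values in its comments still have to be resolved to user documents. A minimal sketch, assuming in-memory arrays stand in for the collections: gather each referenced username once, so that a single lookup (for example a find with the $in operator) can fetch all commenters. The commenterIds helper and the sample data are hypothetical.

```javascript
// In-memory stand-ins; the users array plays the role of a users collection.
const book = {
  name: "Angels",
  comments: [{ user_id: "Azrius" }, { user_id: "Azrius" }],
};
const users = [{ username: "Azrius", email: "azrius@example.com" }];

// Collect each referenced username once, preserving first-seen order.
function commenterIds(bookDoc) {
  return [...new Set(bookDoc.comments.map((c) => c.user_id))];
}

const ids = commenterIds(book);
// Stand-in for a single query that matches any of the collected usernames.
const commenters = users.filter((u) => ids.includes(u.username));
console.log(ids);
```

Removing duplicates first keeps the follow-up query small even when one user has commented many times on the same book.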

Data modeling guidelines

As a final step, let’s build some guidelines to model our data.

Generally, embedding is the best starting point.

Focus on how your data will be used.

If you find you need complex references, consider a relational database.

Reference data when you need to access a document independently.

Consider referencing when you have large data sizes.

Conclusion

In this fifth and final article of the series, we took a deeper look at MongoDB operations and data modeling, building on our previous articles. If you have not read the earlier tutorials, do check them out. I hope you enjoyed the path we walked. Let’s use these concepts to build better apps!