MongoDB Optimization and Scalability - 1CloudHub: Digital Transformation – Advisory | Solutions

MongoDB is very popular among NoSQL Databases. The JavaScript based MEAN and MERN stack raised its popularity to its peak. As it’s a JSON based Datastore programmers preferred it over other Databases.

The main advantages of MongoDB are Flexible Schema, easy horizontal scalability, powerful querying, and analytics. If we don’t design it correctly, it will affect the performance. Here we have Listed the common mistakes happen in Designing MongoDB and how to avoid it.

Using it like Relational Databases
Schemas are not query first
Over usage of Index and bad indexing sequence
Not designing for application performance

1. Using it like Relational Databases

The key differentiating factor between a NoSQL database and Relational database is Join Queries. In SQL databases, Normalizing the data will result in writing Join Queries. But most of the NoSQL databases doesn’t allow Joins, Because Joins are bad in Distributed Databases as its extremely hard to scale.

For example, when the collection grows in MongoDB you will need to Shard the Collection to scale it, as MongoDB is designed to be set up as a cluster of multiple shards.

The problem is, we cannot join two collections if it is sharded. You can refer that from the documentation here “In the $lookup stage, the from collection cannot be sharded.”

When MongoDB added $lookup stage to aggregation in version 3.2 in 2015, the CTO of MongoDB “Eliot Horowitz” wrote a blog stating that “We’re still concerned that $lookup can be misused to treat MongoDB like a relational database. But instead of limiting its availability, we’re going to help developers know when its use is appropriate, and when it’s an anti-pattern. “

We can see that many developers coming from SQL background adopting MongoDB. Some of them don’t know how to Denormalize the data in NoSQL database. This is the Major Reason for the usage of $lookups.

You can use embedded documents and arrays to include relational data in a single document structure instead of normalizing across multiple documents and collections.

2. Schemas are not query first

When It comes to NoSQL Database, Your Database Schema Design should always be based on Query-first design.

First you must think in terms of how you want to retrieve the data. And you should store redundant data in other collections, if the data is not changing often and you need that data for retrieval.

For example:

Users Collection:

{

_id: 1,

name: ‘John’,

address: ‘chennai’,

}

Orders Collection:

{

_id:1,

userId: 1,

name: ‘John’,

item: ‘cell phone’,

price: 100,

}

In the above example we have the name field in both users collection and orders collection. This is because we need to show the name of the user who made that order. So, when you insert the data in orders collection, you are adding the field called name.

MongoDB recommends nested document over relational data arrangement. This helps us in sharding.

But At times you can store relational data in different collections if it’s changing frequently, but you will run multiple queries to retrieve the data.

3. Over usage of Index and bad indexing sequence

Generally, indexes will affect the database writes. You should be very selective when creating indexes.

Always run the query and analyze it with query executionStats and make sure that it is using correct index.

You can enable the mongodb profiler by running the command

db.setProfilingLevel(2,0)

Then

sudo tail –f /var/log/mongodb/mongo.log

The above command will show the exectionStats of the current queries.

If the winning plan is using “collscan” -> it will go through all the documents in the collection.

If it is “Idxscan” -> it will just examine the Index.

You should always keep one thing in mind, “The Number of records returned should always be closer to the number of docs examined”. If you keep this balance, your query will perform well.

Create descending order index for created_at field if you always want to retrieve the latest documents.

Use Compound Index if your query needs it. The order of fields in compound index is very important.

Give explicit hints to force the query to use the correct index.

4. Not designing for application performance

Simple Queries are always good for OLTP. We should avoid complex aggregation in our application. If our Application needs to retrieve data in certain way then it’s good to store the data in that way.

Avoid “group by”, “sum”, conditional grouping etc…

Try to use simple find Queries.

The DB is optimised and ready for scaling if you follow the above four steps. Reach out to us for further inquiries.

Written by ;

Frank D

and

Umashankar N

Frank David

Software Engineer

1CloudHub

In Blog

by Frank David