In this article, we will look at MongoDB Schema Design – Guidelines and Best Practices.
Most developers are familiar with how relational databases (RDBMS) work: information is stored in tables, with defined relationships between those tables.
An RDBMS offers many advantages, such as ACID compliance, data accuracy, data normalization, and readily available, easy-to-use tools.
However, over the years its usage has also revealed drawbacks, such as slower speed to market, scalability issues (due to the rigid schema), and performance issues caused by the amount of data each table carries.
NoSQL to the rescue?
MongoDB makes it possible to break free from the rigidity and limitations of the relational model, i.e., RDBMS.
Embedding documents, or using child and parent references to store related data, can make reads faster than in a relational design because related data can often be fetched without joins.
Where needed to remove redundancy, relationships can still be built by normalizing the data into multiple documents, and the related data can then be fetched at query time with a join-like operation.
MongoDB Schema Modelling
Schema modeling is important.
You should come up with a proper schema before you start storing data in your enterprise database system.
Please note that once a schema is designed and approved for use in MongoDB, changing it should be a last resort.
Any change in the schema has a high chance of impacting upstream or downstream access to the data.
Many application development teams prefer type-safe implementations for their data access components or backend APIs, so a schema change requires considerable effort to manage in all upstream and downstream applications (applications that directly or indirectly depend on the data).
Design your schema with extensibility in mind so that it can accommodate future additions.
MongoDB data quality and integrity
Data quality is a broader topic and is not limited to the fields and data types used in MongoDB documents.
However, once you have finalized the data quality requirements and are ready to design or code the schema, make sure to define the correct data type for each field.
Document quality also depends on resource naming and the context in which each resource is defined.
Example: Storing string values in int fields could become problematic later. Keeping such fields as strings is usually the safer choice and avoids many future issues.
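One way to pin field types down is a collection validator. The sketch below builds a $jsonSchema validator as a plain Python dictionary. The collection name "accounts" and the fields account_number and balance are hypothetical and chosen only for illustration. With PyMongo, such a dictionary could be passed as the validator option when creating a collection; that call is assumed here and not executed, so the example runs without a database.

```python
# Hypothetical validator for an "accounts" collection (names are assumptions).
# "bsonType" names the BSON type that each field must have.
account_validator = {
    "$jsonSchema": {
        "bsonType": "object",
        "required": ["account_number", "balance"],
        "properties": {
            "account_number": {
                "bsonType": "string",
                "description": "kept as a string so leading zeros survive",
            },
            "balance": {
                "bsonType": "double",
                "description": "fractional amount, must not be truncated",
            },
        },
    }
}


def declared_type(validator, field):
    """Return the BSON type declared for a field, or None if undeclared."""
    props = validator["$jsonSchema"]["properties"]
    return props.get(field, {}).get("bsonType")


print(declared_type(account_validator, "account_number"))
print(declared_type(account_validator, "balance"))
```

Because the validator does not forbid additional properties, new fields can still be added later without touching the existing definitions.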
Careful selection of data types in MongoDB
Take the utmost care when defining the data types for MongoDB schema fields.
For example,
Double vs. Int
If an amount is defined as the double 12.31 in the source, it could become 12 if the field is defined as Int in MongoDB, corrupting the data.
String vs. Int
If an account number is defined as 0001231 in the source, it could become 1231 if the field is defined as Int in MongoDB, corrupting the data.
If you are not sure about the data type, a string is usually the safest choice.
When the data must stay identical to the source of truth, choose the data type carefully based on the source of the data and the use case.
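The two cases above can be reproduced in a few lines of plain Python. This is only a sketch of the conversion problem itself, not of MongoDB's behavior; the variable names are made up for the example.

```python
source_amount = "12.31"      # amount as it appears in the source
source_account = "0001231"   # account number as it appears in the source

# Double vs. Int: converting to an integer silently drops the fraction.
amount_as_int = int(float(source_amount))

# Keeping the amount as a floating-point number preserves the fraction.
amount_as_double = float(source_amount)

# String vs. Int: converting to an integer silently drops the leading zeros.
account_as_int = int(source_account)

# Keeping the account number as a string preserves it exactly.
account_as_str = source_account

print(amount_as_int)      # 12
print(amount_as_double)   # 12.31
print(account_as_int)     # 1231
print(account_as_str)     # 0001231
```

Once the value has been stored in the wrong type, the lost fraction or leading zeros cannot be recovered from the stored value alone.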
Properties in the schema can also be designed with length in mind.
Impact Analysis – Upstream and Downstream Applications
Depending on the design of the application, a schema change has a direct or indirect impact on the upstream and downstream applications that use the schema.
Most applications use type-safe implementations for their components, which means any change in the schema has a direct impact on the code base of those applications.
This impact can be reduced by keeping schema changes to a minimum, i.e., by designing the schema properly up front.
Change the schema only as a last resort, when other alternatives don't work out.
Design your schema with the Open-Closed principle, i.e., the schema is open for extension with new fields/attributes but closed for changes to existing fields/attributes.
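As a rough illustration of the Open-Closed idea on the consuming side, the sketch below shows a type-safe reader that keeps working when the schema is extended. The Customer class, its fields, and the sample documents are hypothetical. Existing fields stay unchanged, a newly added field gets a default, and unknown fields are ignored.

```python
from dataclasses import dataclass, fields


@dataclass
class Customer:
    # Existing fields: closed for modification.
    name: str
    email: str
    # Field added later: optional with a default, so old documents still load.
    loyalty_tier: str = "none"


def customer_from_document(doc):
    """Build a Customer, ignoring fields this reader does not know about."""
    known = {f.name for f in fields(Customer)}
    return Customer(**{k: v for k, v in doc.items() if k in known})


# Document written before the schema was extended.
old_doc = {"_id": 1, "name": "Ada", "email": "ada@example.com"}
# Document written after the extension, plus a field this reader ignores.
new_doc = {"_id": 2, "name": "Lin", "email": "lin@example.com",
           "loyalty_tier": "gold", "referral_code": "X1"}

print(customer_from_document(old_doc))
print(customer_from_document(new_doc))
```

With this style of reader, adding a field does not force a simultaneous release of every dependent application, whereas renaming or retyping an existing field still would.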
MongoDB Data Modeling and indexing
This is an important step towards performance optimization. Modeling data according to query patterns delivers more efficient queries and spreads your workload across a sharded cluster.
To increase query throughput, careful consideration of how the data is stored is crucial.
This includes storing data together for effective access and deciding when to embed a document versus referencing separate documents in different collections.
Indexes support the efficient execution of queries by limiting the number of documents that must be scanned. Please note that without an index, MongoDB must scan every document in a collection, which degrades query performance.
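The difference between a full collection scan and an indexed lookup can be sketched conceptually in plain Python. This is a toy model, not how MongoDB implements indexes: a dictionary stands in for the index and a list stands in for the collection. The email field and the helper names are assumptions. With PyMongo, a real index on such a field would typically be created with the collection's create_index method.

```python
# Toy "collection" of 10,000 documents.
documents = [{"_id": i, "email": f"user{i}@example.com"} for i in range(10_000)]


def find_with_scan(docs, email):
    """Collection scan: examine documents one by one until a match is found."""
    examined = 0
    for doc in docs:
        examined += 1
        if doc["email"] == email:
            return doc, examined
    return None, examined


# Toy "index": maps each email value to the position of its document.
email_index = {doc["email"]: pos for pos, doc in enumerate(documents)}


def find_with_index(docs, index, email):
    """Index lookup: jump straight to the matching document."""
    pos = index.get(email)
    if pos is None:
        return None, 0
    return docs[pos], 1


_, scanned = find_with_scan(documents, "user9999@example.com")
_, looked_up = find_with_index(documents, email_index, "user9999@example.com")
print(scanned, looked_up)
```

The scan has to examine every document to reach the last one, while the indexed lookup examines only the match. The index itself costs memory and must be maintained on every write, which is why indexes should follow the actual query patterns.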
For more details, please visit: MongoDB Indexing Guidelines and Best Practices
Storing Data Together for Effective Access
If your documents are small and you are confident that their size will remain under the limit (16 MB maximum document size), then it's a good idea to keep the data together as much as possible.
If any data is likely to grow beyond the 16 MB limit, then you need to keep that data in separate documents.
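A simple pre-check along these lines might look like the sketch below. It is only an approximation: the size is estimated from the JSON encoding, which differs from the actual BSON size MongoDB enforces, so a BSON encoder would be needed for an exact figure. The function names, the headroom factor, and the sample order documents are all hypothetical.

```python
import json

MAX_DOCUMENT_BYTES = 16 * 1024 * 1024  # MongoDB's 16 MB document limit


def approximate_size(doc):
    """Rough size estimate using JSON; the real BSON size will differ."""
    return len(json.dumps(doc).encode("utf-8"))


def should_embed(parent, children, headroom=0.5):
    """Embed only if the combined estimate stays well under the limit.

    headroom is a hypothetical safety factor that leaves room for growth.
    """
    combined = dict(parent, items=children)
    return approximate_size(combined) <= MAX_DOCUMENT_BYTES * headroom


order = {"_id": 1, "customer": "Ada"}
few_items = [{"sku": f"S{i}", "qty": 1} for i in range(10)]
many_items = [{"sku": f"S{i}", "note": "x" * 1000} for i in range(20_000)]

print(should_embed(order, few_items))    # small: keep the data together
print(should_embed(order, many_items))   # too large: use separate documents
```

The point of the headroom factor is that the decision should consider how the data will grow, not only its size today.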
Embedding vs. Referencing in MongoDB
For MongoDB data modeling, how you store the data and how you index fields for good query performance are key aspects.
With embedding, you create a schema with embedded documents, where documents are nested inside another document within the collection. These are also called nested documents.
With referencing, you move some parts of the document into a new collection and store references that link them to the documents in the parent collection.
To read referenced data, you mainly use the $lookup aggregation stage, which is similar to a JOIN in SQL.
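The two shapes can be compared side by side in the sketch below. The customers and addresses collections and their fields are hypothetical. The $lookup pipeline is written as plain Python data in the form an aggregation call would accept, and a small helper imitates on the client what the stage would return, so the example runs without a database.

```python
# Embedded: the addresses live inside the customer document.
customer_embedded = {
    "_id": 1,
    "name": "Ada",
    "addresses": [{"city": "London"}, {"city": "Paris"}],
}

# Referenced: addresses live in their own collection and point to the parent.
customers = [{"_id": 1, "name": "Ada"}]
addresses = [
    {"_id": 10, "customer_id": 1, "city": "London"},
    {"_id": 11, "customer_id": 1, "city": "Paris"},
]

# Aggregation pipeline that would join the two collections on the server.
lookup_pipeline = [
    {"$lookup": {
        "from": "addresses",
        "localField": "_id",
        "foreignField": "customer_id",
        "as": "addresses",
    }}
]


def resolve_references(parents, children, local_field, foreign_field, as_field):
    """Client-side imitation of what $lookup returns (a left outer join)."""
    joined = []
    for parent in parents:
        matches = [c for c in children if c[foreign_field] == parent[local_field]]
        joined.append({**parent, as_field: matches})
    return joined


stage = lookup_pipeline[0]["$lookup"]
result = resolve_references(customers, addresses, stage["localField"],
                            stage["foreignField"], stage["as"])
print([a["city"] for a in result[0]["addresses"]])  # ['London', 'Paris']
```

The embedded document answers the question in a single read, while the referenced design needs the extra join step but keeps each document small.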
Advantages of Embedding
- Faster Data retrieval.
- Requires only one or a few queries to get the data from a single collection.
- Effective in fetching small documents.
- Improves the overall performance of the application.
Disadvantages of Embedding
- Not suitable for a large amount of embedded data.
- MongoDB supports a maximum document size of only 16 MB.
- Fetching large documents may cause performance overhead.
Advantages of Referencing
- Helps in reducing the size of master documents.
- Lets you store related data that totals more than 16 MB by splitting it across referenced documents.
- A suitable approach for data that would exceed 16 MB in a single document.
Limitations of Referencing
- More than one query or a $lookup stage is needed to fetch the data from the referenced collection, which is slower than reading embedded documents.
Domain Driven Design approach for schema
You can leverage a domain-driven design approach for building the schema and then storing the data in your database system.
Many enterprises use a microservices architecture, where each microservice owns its data and is decoupled from other systems.
Modern NoSQL databases are designed for cloud readiness, making them inherently suitable for horizontal scaling, where smaller servers can be spun up to handle increased or ad-hoc on-demand load.
Instead of keeping all the data in one large database, you split the data across services and synchronize it across the system where needed.