Introduction to MongoDB

MongoDB is a document oriented database and is one of the leaders of the NoSQL revolution.

To summarize what a NoSQL database is, we can say that it’s a structure that enables us to consume data of varying structure (structured, semi-structured and unstructured), of varying schema and data type. They enable us to respond quickly to user demand through scalability and flexibility.

So what about MongoDB? It’s become somewhat of a leader in the NoSQL space and has huge community support. It follows a structure that uses: databases, collections and documents rather than the traditional: database, table, rows (tuples) framework we see in relational database systems.

A collection would relate to a table and documents would relate to rows. Remember, in these collections, we do not enforce a schema. That means, all the documents held within can have different fields. However, all documents within a collection will be intended for a similar purpose.

Documents are stored within MongoDB in binary JSON format as key-value pairs, which leads to fast data lookup times when looking up on a particular key. We can further improve performance by indexing on multiple attributes.

JSON, nested structures and arrays are supported by MongoDB. The same structure in RDBMS would require multiple tables with foreign key relationships. Without these joins, MongoDB makes querying much faster.

So, why is Mongo leading the charge for NoSQL? First off, it provides very high performance as we are able to load balance across primary and replica nodes. Generally, all write actions will be made against the primary while read activities will hit the replica nodes. This ensures that the server is not congested by an influx of concurrent requests. Should we find that we’re still running into capacity issues, MongoDB is very easy to scale out to counter this.

MongoDB also offers high availability through replication. As mentioned above, we have one write database and several replicas for read requests. In order to setup high availability with MongoDB, we require a minimum of three servers for resilience: primary, secondary and arbiter. The arbiter doesn’t store any data, it makes the decision of which server should be the new primary in the event of a failure.

MongoDB offers automatic partitioning / sharding. Sharding is the process of breaking large data sets into smaller, more manageable chunks (called data shards). This enables us to adopt a parallelized processing model.

MongoDB does not have immediate consistency & only offers eventual consistency, which makes it less suited to real-time read and write activities.

Share the Post:

Related Posts