Being a software developer isn’t just about making a system and getting it running. Imagine you’ve developed an application without knowing how big its user base could grow. You tried to make its performance exceed expectations by using an in-memory database like Redis. You are very happy with the response and money you’re making because of your hard work. Now imagine the user base starts to explode and your current database setup isn’t ready for that. You don’t want to lose your current users but you’re not even able to handle them. Yes I am talking about a social media giant named Instagram.
Instagram in its initial stage used Redis as its database. Redis is a fast in-memory key-value store. It worked really well when the Instagram’s user base was small. But as the Instagram’s popularity exploded, it saw limitations. We like, comment, post media on Instagram every second. That basically means we are talking of not less than a million writes on the database per second. It wasn’t easy to manage this volume of writes using Redis which is a pure in-memory database(meaning your data handling would always be limited according to your RAM).
Eventually, Instagram made a significant switch to Cassandra, a distributed NoSql database. End of story.
I want you to understand how important it is to choose the right tools in your software development journey. Ignoring it today will make you suffer tomorrow.
Before we move on to “why this, why not that?”, let me walk you through the introduction of “this” and “that”.
Relational Database:
We all know these databases organize data in table format where each table is a collection of rows and each row is a collection of columns. These databases in their early stages were used to track sales or process bank transactions. A developer interacted with the data through SQL. To give you an example, PostgreSQL is one of the most widely used relational database.
NoSQL:
A twitter hashtag that was supposed to be catchy ended up giving name to another type of database. This system gained a lot of popularity because of various reasons some of which were:
- Scales better than relational databases.
- Higher write throughput
- Queries on unstructured data was relatively easier as compared to relational database.
Different systems have different needs, and no single database can cover them all. Some need high availability(just like Instagram) whereas some need high consistency(just like Banks). As a result, organizations today use multiple databases that include both relational and non-relational databases. This approach is termed as Polyglot Persistence.
Next time you listen a system using multiple DBs, consider it a polyglot persistent system.
The above picture resembles a resume structure. As you can see in order to store the profile of the person mentioned we have multiple tables. Lets say you want to fetch the “end_year” of the above person’s education.
Your query would somewhat look like:
SELECT e.end_year FROM education_table e JOIN user_table u ON e.user_id = u.user_id;
Now imagine, if there were more tables inside this and you had to fetch something very granular you would have ended up writing an essay of joins.
{
"user_id": 251,
"first_name": "Bill",
"last_name": "Gates",
"summary": "Co-chair of the Bill & Melinda Gates... Active blogger.",
"region_id": "us:91",
"industry_id": 131,
"photo_url": "/p/7/000/253/05b/308dd6e.jpg",
"positions": [
{
"job_title": "Co-chair",
"organization": "Bill & Melinda Gates Foundation"
},
{
"job_title": "Co-founder, Chairman",
"organization": "Microsoft"
}
],
"education": [
{
"school_name": "Harvard University",
"start": 1973,
"end": 1975
},
{
"school_name": "Lakeside School, Seattle",
"start": null,
"end": null
}
],
"contact_info": {
"blog": "[http://thegatesnotes ](http://thegatesnotes /)",
"twitter": "http://twitter /BillGates"
}
}
Above is a json representation of the same resume. Now, if you want to fetch the end year from the education your input would look something like:
db.collection_name.find( { "user_id": 251 }, { "education.end": 1, "_id": 0 } )
Above NoSQL query has been taken from chat-gpt as I haven’t worked quite a lot on NoSQL.
If we consider the above case, we can see that the JSON representation has better readability than the traditional schema. If you want to fetch a profile in the realtional example, you need to perform multiple queries or do a “JOIN” dance between the user table and its subordinate tables.
Below is the json representation of the above One-to-Many relation(user and his data).
There are a lot of reasons of using one database over other but there is no way one can say NoSQL is always better or SQL always sucks. It totally depends on the use cases of both of them and hence most organizations use them together.
That’s it from my side. If you want to read more about it I’d recommend you a book: Designing Data Intensive Applications by Martin Kleppmann. It’s a gem of a book.
If you liked this post, please consider liking it and you can also share and follow me for more such content:).