Relational databases are timeless

June 16, 20235 min read

Let's face it. Relational databases feel outdated relative to NoSQL databases. So why is it still widely used, even in modern projects?

The term "relational databases" was first defined in 1970, 20 years before WWW.

Microservices, global clusters, ORM, petabytes of data, gigabytes of RAM, it all was science fiction when the first RDBs were defined.

RDBs have lower performance, reduced availability, and are less suited to work in distribution, which makes them less efficient for:

High velocity
Large volume
Small latency

Also in some cases, the data is not structured enough to fit well in relational tables.

Those tradeoffs are the results of important attributes that NoSQL databases don't have:

Intuitive and expandable modeling that reflects product requirements.
Flexible querying that is less coupled with the schema.
Data integrity and durability.

Let's explain them more deeply!

Data modeling

Table relationships make it easier to expand the data model.

It is similar to one of the reasons microservices are so successful, you can bind the context of the data to a model that makes sense in terms of product requirements.

For example...

Zoo database that contains the following tables:

animals
employees
assignments

This is straightforward! Now let's say you are required to present the team's view. We can add a teams table with minimal changes in other tables. It will support a wide range of queries.

In NoSQL databases, the model will have to support the type of queries you perform. I will explain that more in the next section.

Unstructured or semi-structured data is less intuitive for modeling in RDB's, it requires extra work to structure it into tables correctly. It may cause a loss of information, which makes model migrations harder to perform.

Flexible querying

Most schemas would work for most use cases.

In relational databases, the model is less coupled with the queries. If the data was not relational, you would have to think more about your queries:

Employees that are assigned to feed some animal at noon?
Animals whose their assigned employee is sick today?
What is the average size of animals in the zoo?

You can query them in a very simple relational data model. While in a non-relational database, you need to model the data in a way that supports those queries or involves application code.

For example, in document databases, many-to-many modeling requires you to understand in advance how the data is going to be used and where to put the reference or the index, or even whether you need to de-normalize the data.

I am saying it carefully, to get a better performance you must think about the requirements and make sure the model can be queried efficiently. Horizontal partitioning is an example of such a model difference that can improve/reduce performance depending on the queries you make.

Some types of queries are harder to make in RDBs, such as:

Text search
Time-series queries
Graph queries

Having a good RDB model for such queries is hard to implement. In that case, I would consider a NoSQL database that suits the query type.

Data integrity and durability

Validations, constraints, and ACID prevent poor data.

RDBs achieve ACID by managing more information like locks and journals, which ensures that data mutates expectedly, and fatal errors will not corrupt it. Such information is hard to manage, especially in distribution, which affects concurrency and reduces performance.

This attribute of RDBs gives good confidence when dealing with data that is important to preserve correctly, especially under concurrency.

From the CAP theorem, we know that DBs can't be both available and consistent, that is maybe the largest difference between NoSQL and RDBs. The eventual consistency nature of many NoSQL databases gives better performance and better horizontal scaling. However it can reduce the integrity of data, for example, eventual consistent databases usually depend on timestamps to decide the order of things, so events with higher timestamps are considered to come after events with lower timestamps. This can be false because timestamps are not reliable and prone to slight skews. In a relational database, the order of things is decided by the total order in which transactions start.

One more aspect is what happens when the DB is down for whatever reason. Many NoSQL databases can work in what is called "leaderless" or "multi-leader" replication, in short, it means that you can send write operations to multiple database nodes, and eventually, all nodes would receive that change and be in sync. Relational databases don't operate well in topology because transactions may conflict with each other and fail during the replication after the user commits their changes. But you can have eventual consistent read-only fallback nodes, what's called "single-leader".

Last, but not least, relational databases have a strict structure and constraints which prevent from engineers making mistakes and causing mismatches of schemas. Either the data is aligned, or you fail. Of course, you can also mismatch relational databases, but you have some boundaries. Because of that, it is easier to change the schemas in a DB migration and make sure all the data aligns with the new schema version.

Conclusion

Relational databases are great for many projects -- NoSQL does not replace them but complements them. NoSQL databases can solve issues that are harder to solve with relational databases and vice versa.

Because of the flexible and expectable nature of RDBs, I think that they are great when there is a lot of uncertainty in the problem. NoSQL databases should come when you try to solve VVV challenges.

In my opinion, the old relational databases are a good choice for typical young projects, while the new modern databases require maturity.

However, there are many ways to take your relational database to the next level. I will probably post something about it in the future.

<- Ways to use promises - Part 2 Deadlock from the outer space ->