Deadlock from the outer space

February 1, 20242 min read

A deadlock is a situation where tasks are in progress but can't progress. It can occur when they are mutually dependent on each other (directly or indirectly).

Relational databases ensure consistency and prevent fatal errors by using locks.

For example, there are two lock "spaces" in MySQL: the metadata space and the engine space.

Each one is responsible for different kinds of problems. For instance, data must not change during a schema change. The solution is a metadata lock that prevents any operation on the table while changing the schema.

For me, most deadlocks I encountered were between two DML operations. But in the example presented next, it comes from a combination of DDL and DML operations.

Let's dig in!

Setup 🏗️

docker run --name deadlock-demo-mysql -e MYSQL_ROOT_PASSWORD=1 -d -p 3306:3306 mysql:8
mycli --host localhost --port 3306 --user root

CREATE DATABASE deadlock_demo
connect deadlock_demo
CREATE TABLE table1 (partition_id INT, id INT, some_value INT);
ALTER TABLE table1 ADD PRIMARY KEY (id, partition_id);
ALTER TABLE table1 PARTITION BY LIST COLUMNS (partition_id) (
    PARTITION p0 VALUES IN (NULL),
    PARTITION `1` VALUES IN (1)
);

The deadlock! 😵

-- session1
START TRANSACTION;
SELECT * FROM table1 WHERE partition_id=1;
-- session2
ALTER TABLE table1 ADD PARTITION (PARTITION `2` values in (2));
-- session1
INSERT INTO table1 VALUES (1, 1, 1);

-- Boom.

But why? 🤨

Let's do it again, but this time we will watch the locks with the following query:

SELECT * FROM performance_schema.metadata_locks;

This is what we see...

Session #1 holds SHARED_READ

1 session

Session #2 holds SHARED_UPGRADABLE and is pending for EXCLUSIVE (which is held by session #1)

2 sessions

Then, session #1 requests SHARED_WRITE which is held by session #2

deadlock

Boom. Again.

You can read more about the different metadata locks here.

Note that you will not see the deadlock with SHOW INNODB STATUS like a regular DML operations deadlock, as it comes from the metadata space.

What can you do?

There is no magic solution. Locks are necessary and unavoidable. However, there are some things you should consider to reduce the blast:

Keep a small amount of concurrent DB transactions.
Keep transactions short.
Avoid doing many DDL operations. Such locks have the highest priority.
Handle retries smartly, use exponential back-off and jitter. (and avoid unprocessable retries)
Monitor your DB operations.

Conclusion

This issue was a good way of learning more deeply about RDS locks and issues that can occur under high load. Personally, it was a very interesting scenario to research it!

Locks are a key part of having high consistency but can cause unusual problems, and there are actions you can take to avoid deadlocks and keep your production happy.

<- Relational databases are timeless Balanced reusable components ->