InnoDB isolation levels

Warning: This blogpost has been posted over two years ago. That is a long time in development-world! The story here may not be relevant, complete or secure. Code might not be complete or obsoleted, and even my current vision might have (completely) changed on the subject. So please do read further, but use it with caution.
Posted on 20 Dec 2010
Tagged with: [ innodb ]  [ MySQL ]  [ transaction

When asking what THE advantage of InnoDB over other MySQL engines like MyISAM is, then 9 out of 10 times the answer will be that InnoDB supports transactions. And it’s true. But there is more about transactions than meets the eye. Let’s explore one of the most difficult area’s: isolation levels.

ACID again

First, some small theory on RDBMSes. InnoDB is ACID compliant system. This means that InnoDB is:

1. Atomic
A transaction or group of statements are either all executed successfully, or all will fail. It is not possible that a query succeeds, while the next one in the transaction fails. This is an important aspect when you are in the process of transferring money from one account number to another (you don’t want to loose the money in case the database server fails).

2. Consistent
The database is consistent before and after a transaction. This means that for instance foreign key relations are valid before and after a transaction. Allthough the database CAN be in a inconsistent state during the transaction, it’s ok as long as at the end of the transaction the database is made consistent again.

3. Isolated
Transaction A cannot interfere with transaction B. This means all the data we update in transaction A is not visible in transaction B until transaction A actually commits the data. This I believe is the hardest property for databases because we need to find a balance between (true) isolation and performance. We talk about this during this post.

4. Durable
Data in the database should be permanently stored. Or, better said: when the database tells the client (you) the data has been stored or committed, the data should actually have been stored. This however, is not the same as the data being stored on the disk, but for this blog post you can consider it as being saved to disk.

We will talk in the post about the 3rd property: isolation. It looks like a simple one: I do something, somebody else does another thing and we both are completely seperated from eachother. Unfortunally, this is not the case in databases. We share the same data, we might update the same data at the same time and this can result in conflicts, just like in a code repository. We’ll discuss what can happen, what the catches are and how InnoDB (and ultimally you) can avoid it..

InnoDB and isolation

Let’s talk a bit in detail about the isolation-property of InnoDB. As said: transactions are completely isolated from each other. Suppose you have a database with 2 concurrent connections serving transaction A (TX A) and transaction B (TX B), and field_a in table_a holds the value ‘10’:

TX A: start transaction
TX B: start transaction
TX A: update table_a set field_a=field_a+10;
TX B: select field_a from table_a;

Now, what kind of data should transaction B see? Should it see field_a with the 10-values, the 20-values, or should transaction B block until transaction A is done? Or even something else? The correct answer is that it depends on what you need, and MySQL lets you decide for yourself. This is called the “transaction isolation level” which you can set in MySQL on a global, session or only for the current transaction.

To see the current transacton levels issue the following sql query:

SELECT @@GLOBAL.tx_isolation, @@tx_isolation;

It will display the global transaction level and the current transaction level that is used. You can only set the global transaction level when you have SUPER privileges.

To set the level:

set [global|session] transaction isolation level [level];

In MySQL/InnoDB there are 4 kind of isolation levels ranging from the lowest to the highest: read uncommitted, read committed, repeatable reads and serializable. The higher the level, the more performance loss you will get, but the more “isolated” the transactions will be. If your application has no problem in seeing data from other transactions in some cases, it can be perfectly allright to lower the isolation levels so you gain a bit of performance and speed. Let’s discuss all levels in detail:

read uncommitted

Transaction B will see all the mutations that transaction A makes. You might say that there is no real isolation at all and this comes with all kind of problems that are solved by the other levels:

First of all, even though you see the mutations from transaction A, it is still uncomitted data. This means that when transaction A does a rollback, all data will be restored to what it was just like before starting transaction A. So in transaction B you can see those records change data during a transaction. That is called a “dirty read”.

Secondly, when transaction A changes data and commits that data, transaction B still sees that data as changed during it transaction. Basically: it gets different results with the same query, which is called a non-repeatable read and allthough this is similar as a dirty read, we categorize it differently.

And finally, it is possible that transaction A inserts new rows in its transaction. Those records are called phantom rows and off course can dissappear when the transaction does a rollback.

read committed

Basically this level will take away possibility of  dirty reads. It is not possible to see the data from transaction A until it is commited. Note that  transaction B won’t see the data until it has committed it’s own data. Non-repeatable reads and phantoms are still possible since this level does not address those problems.

repeatable read

We take away the non-repeatable reads so from this level on we don’t see ANY data from other transactions until we are actually done with our transaction. Now, depending on the database system you use, phantom reads could still possible, but this is not the case for InnoDB.

serializable

So, since the “repeatable read” level takes care of all three problems, why do we have a fourth level? Basically this mode can be described as a “paranoid” mode since it will lock all records that a transaction select for all the other transactions. It means that when a transaction updates a record, it becomes impossible for others to select that record. It’s a great way to make absolutely sure that no transaction overwrites the actions of other transactions, but it comes with the costs of lots of record locking.

Some SQL examples

Let’s try some examples with 2 transactions and see how they react depending on the transaction levels. Assume that ‘test’ is a table with the following data:

+----+-----+
| id | val |
+----+-----+
|  1 |   8 |
|  2 |   8 |
+----+-----+

read uncommited

TX A: start transaction;
TX B: set session transaction isolation level read uncommitted;
TX B: start transaction;
TX A: select * from test;                   -- val = 8
TX B: select * from test;                   -- val = 8
TX A: update test set val = val + 1;        -- val = 9
TX B: select * from test;                   -- val = 9, dirty read
TX A: rollback;
TX B: select * from test;                   -- val = 8
TX B: commit;

As you can see it’s possible for TX B to see the data that was modified by TX A. However, after the rollback of TX A, the data is reverted back.

read committed

TX A: start transaction;
TX B: set session transaction isolation level read committed;
TX B: start transaction;
TX A: select * from test;                   -- val = 8
TX B: select * from test;                   -- val = 8
TX A: update test set val = val + 1;        -- val = 9
TX B: select * from test;                   -- val = 8, No dirty read!
TX A: commit
TX B: select * from test;                   -- val = 9, commited read

This level shows the dirty read is not possible, but after the commit of TX A, the data is available to TX B.

Repeatable read

TX A: start transaction;
TX B: set session transaction isolation level repeatable read;
TX B: start transaction;
TX A: select * from test;                   -- val = 8
TX B: select * from test;                   -- val = 8
TX A: update test set val = val + 1;        -- val = 9
TX B: select * from test;                   -- val = 8
TX A: commit
TX B: select * from test;                   -- val = 8, repeatable read!
TX B: commit;
TX B: select * from test;                   -- val = 9 (from tx A)

After the commit of TX A we still see the unchanged data. Only after a commit (or rollback) from TX B we see that the data has changed.

serializable

TX A: start transaction;
TX B: set session transaction isolation level serializable;
TX B: start transaction;
TX A: select * from test;               -- val = 8
TX A: update test set val = val + 1;    -- val = 9
TX B: select * from test;               -- LOCKED, NO OUTPUT
TX A: commit;                           -- Unlocked TX B
TX B: select * from test;               -- val = 8 (repeatable read!)
TX B: commit;
TX B: select * from test;               -- val = 9 (now we see TX A)

You see that after we have done an update we cannot select the data in the other transaction anymore. We have to wait until TX A commits or rollbacks the transaction. After TX A has committed it’s data, TX B still can’t see it because otherwise it would be a non-repeatable read. We have to commit our own transaction before we can see the actual results.

Conclusion

Isolation levels are an aspect that for normal engineers or administrators is not really that important. The standard “repeatable read” level that InnoDB uses is good enough for maybe 99% of the databases outthere. However, loosening the levels can increase your database performance but you can loose some safety. It’s up to you to decide if repeatable-reads or phantoms are an issue or not…