The non-existence of ACID consistency - Part 4
In the previous post of this little blog series, we discussed the actual value of strong ACID-like consistency.
In this final post of the series, we will discuss why ACID is usually not what you think it is.
Be aware that “ACID” usually is not what you think it is
When people think about ACID transactions, they typically think in terms of serializability, the strongest transaction isolation level that the ANSI SQL standard defines.
The isolation levels are defined via the types of anomalies they allow to happen 1. Anomalies can be, e.g., dirty reads (one transaction reads the uncommitted changes of another transaction) or non-repeatable reads (a transaction reading the same value twice gets different results because the value has been changed by another transaction between the two reads). For a detailed discussion of isolation levels and anomalies, see, e.g., “A Critique of ANSI SQL Isolation Levels” by Hal Berenson et al.
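To make the non-repeatable read a bit more tangible, here is a minimal sketch in Python. It is a toy in-memory model, not a real database: the classes `ToyStore`, `ReadCommittedTx` and `RepeatableReadTx` are hypothetical names made up for this illustration, and the repeatable-read variant is implemented with a simple snapshot, which is only one of several ways a real database might do it.

```python
class ToyStore:
    """Hypothetical in-memory store that only holds committed values."""

    def __init__(self, data):
        self.committed = dict(data)

    def commit(self, key, value):
        # Models another transaction committing a change.
        self.committed[key] = value


class ReadCommittedTx:
    """Every read sees the latest committed value at the moment of the read."""

    def __init__(self, store):
        self.store = store

    def read(self, key):
        return self.store.committed[key]


class RepeatableReadTx:
    """Every read sees the committed state as of transaction start.

    Assumption: implemented here with a snapshot copy for simplicity.
    """

    def __init__(self, store):
        self.snapshot = dict(store.committed)

    def read(self, key):
        return self.snapshot[key]


def read_twice(tx_class):
    """Read the same value twice while another transaction commits in between."""
    store = ToyStore({"balance": 100})
    tx = tx_class(store)
    first = tx.read("balance")
    store.commit("balance", 50)  # concurrent writer commits between the reads
    second = tx.read("balance")
    return first, second


if __name__ == "__main__":
    print("read committed: ", read_twice(ReadCommittedTx))
    print("repeatable read:", read_twice(RepeatableReadTx))
```

In this toy model, the read-committed transaction sees two different values for the same item within one transaction, while the repeatable-read transaction sees the same value both times.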
Serializability is the only isolation level that prevents all kinds of potential anomalies. You get the atomicity and isolation behavior exactly as I described it in the previous posts. The price you pay for this desirable behavior is a huge performance penalty. Serializability basically means that concurrent processing of transactions is not possible anymore. You cannot change anything while any other transaction, no matter if reading or writing, is ongoing – or at least you cannot do it without a huge effort. 2
That is why most databases run in a lower isolation level: They need the throughput to master their production workload. They need the performance boost of massively concurrent transaction processing that only lower isolation levels allow. In “HAT, not CAP: Towards Highly Available Transactions”, Peter Bailis et al. list several popular ACID databases together with their default isolation levels and the maximum isolation levels they support.
Most of the databases run in quite low isolation levels by default (like read committed) that still allow for several types of anomalies. And about a third of the databases do not even support serializability as their maximum isolation level, i.e., some types of anomalies will always be possible with those databases. 3
This does not mean that your data will necessarily be corrupt. ACID transactions still have a lot of value and protect you from a lot of harm. They provide stronger guarantees than eventual consistency. But be aware that you do not get the perfect serialization of transactions that most people have in mind when they hear “ACID”. Sometimes you may still read stale or even spurious data.
You will see temporary inconsistencies, and most likely you will base your actions on this temporarily inconsistent data. If you later look into your database, you will not be able to explain where the result came from: the temporary inconsistency is gone, and the data written to disk of course does not show it.
While this will occur less often than with eventual consistency because the consistency guarantees are higher, it still can happen. Thus, keep in mind:
“ACID” does not necessarily mean “serializable”.
You may still sometimes see temporary inconsistencies in most production environments even if you use ACID transactions.
I remember when I discussed isolation levels and their effects with a colleague. He listened intently and suddenly exclaimed: “Now I understand what happened in our project.”
They built an application which went into production. Nothing unusual up to that point. But then the client noticed an inconsistency in the data. They accused the development team of having written defective code. The team checked the code, but according to the code, the inconsistency could not have been created by their application. They also tried to reproduce it, but the application always created the correct result.
Thus, the development team concluded it was not their fault and surmised that one of the client’s people had incorrectly changed the data manually. As sometimes happens in such situations, the discussion went back and forth. Things escalated. Eventually, it went all the way up to the board level, a very unfortunate situation.
As neither party felt at fault or had any idea how to fix the issue, the board members decided to engage a specialist from the database vendor to examine the situation neutrally.
After having read the previous parts of this post, the findings of the specialist should not be surprising to you. His conclusion was: “Everything works as designed. The inconsistency was caused by an anomaly that can occur based on the isolation level the database is running in.” The specialist was even able to track the inconsistency back to the two concurrent transactions that caused the anomaly.
Everyone was totally surprised because all of them had serializable transactions in mind while in reality the database was running in a lower isolation level.
The specialist additionally gave a few recommendations on how to minimize the risk that this anomaly would strike again, and the development team happily implemented them. So, this story had sort of a happy ending, which I like. Such stories do not always end this well.
Still, the moral of the story is (rephrasing the previous tip):
Be prepared that even with ACID transactions, you will not always get the results you expect.
Depending on the isolation level the database runs in, you may face different kinds of anomalies.
Thus, ACID transactions are great and I strongly recommend going for ACID transactions unless there is a good reason not to. Still, ACID usually does not mean strong consistency in terms of serializable transactions, which is the model most people have in mind when thinking about ACID transactions.
In case of doubt, better ask the database administrators which isolation level they use in production. It might save you from some unpleasant surprises.
Quite often, we see a demand for strong (ACID) consistency across multiple data nodes. While this demand is okay if only a single or very few nodes are involved (rule of thumb: 5 or fewer), it does not scale across many nodes. As we have seen in the first post of this series, the overall availability of the system(s) quickly deteriorates with the number of nodes involved. Therefore, strong consistency should be avoided across multiple data nodes and eventual (BASE) consistency should be used instead.
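As a back-of-the-envelope illustration of how availability deteriorates with the number of nodes, here is a minimal sketch. It rests on two simplifying assumptions that are mine for this example: nodes fail independently of each other, and a strongly consistent operation only succeeds if all involved nodes are available. The per-node availability of 99.9% is likewise just an illustrative number, and `combined_availability` is a hypothetical helper name.

```python
def combined_availability(node_availability: float, nodes: int) -> float:
    """Availability of an operation that needs all `nodes` nodes at once.

    Assumes independent failures, so the probabilities simply multiply.
    """
    if nodes < 1:
        raise ValueError("nodes must be at least 1")
    return node_availability ** nodes


if __name__ == "__main__":
    # Assumed per-node availability of 99.9% (illustrative only).
    for n in (1, 2, 5, 10, 50):
        print(f"{n:>3} nodes: {combined_availability(0.999, n):.4%}")
```

Under these assumptions, every additional node multiplies in another factor smaller than one, which is why the combined availability can only go down as more nodes take part in a strongly consistent operation.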
The “killer argument” in the typically evolving discussions is that strong consistency is needed “due to business reasons”, that the “users expect it”. We have seen in the second post that this argument is void because strong consistency does not exist outside IT systems. In the “real world”, most things are simply inconsistent or accidentally consistent, a few are eventually consistent, but none exhibit strong consistency. Thus, this argument does not make sense.
In the third post, we then discussed that strong consistency still has a huge value because it allows for much simpler reasoning about state and consistency than eventual consistency does. Thus, strong consistency should not be thrown away without a good reason (as we have often seen in the NoSQL hype).
But even if you use strong consistency, be aware that ACID transactions usually do not mean serializable transactions, the perfect model we usually have in mind when reasoning about strong consistency. As we have seen in this final post of the little blog series, most databases run in weaker isolation levels, which means that even with ACID consistency you may sometimes read stale or partially inconsistent data, as with eventual consistency.
To emphasize it once more: This blog series is not the case against strong consistency. As I wrote several times: Strong consistency has a real value. Use it if you can.
It is the case against requiring strong consistency in a knee-jerk fashion. There are places where we should avoid strong consistency, especially across multiple data nodes. Know them and act appropriately.
And always keep in mind: The gift of strong consistency is something that does not exist outside IT and thus cannot be a reason why you “must” implement ACID transactions in all situations.
I will probably discuss the whole topic of consistency models, anomalies and how to handle them in more detail in future blog posts. Still, I hope these posts already contained some useful food for thought.
1. Alternatively, isolation levels can be defined by the types of read and write locks they require to be implemented. One definition is based on the technical means used for implementation (locks), the other is based on the effects that can be observed (anomalies). In the end, they both describe the same things, just from a different perspective.
2. This behavior becomes obvious if you look at how serializability is defined in terms of locks required: Everything affected by the transaction is completely locked (read and write) for the whole duration of the transaction. Databases such as Oracle mitigated this problem by switching to snapshot isolation, which allows high throughput in conjunction with quite high isolation guarantees (e.g., non-repeatable reads are not possible with snapshot isolation). Still, snapshot isolation allows for write skew anomalies, i.e., constraints between data items can be violated even if the database “guarantees” they never will be violated. For a detailed discussion of snapshot isolation and write skew anomalies, see “A Critique of ANSI SQL Isolation Levels” by Hal Berenson et al.
3. At least at the time the paper was written, about a third of the databases did not support serializability as their maximum isolation level. I have not checked whether some of the affected database vendors have meanwhile added serializability to the list of isolation levels their database product supports. Still, there is a good chance they did not because in practice serializability is used quite rarely in production systems due to the huge performance hit. I.e., the demand for serializability as an isolation level is relatively low in practice (unless someone finds a way to implement it without the big performance hit).
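The write skew anomaly mentioned in the footnote on snapshot isolation can be sketched with a toy model. This is not how any particular database implements snapshot isolation: `SnapshotStore`, `SnapshotTx` and the on-call example with the two doctors are hypothetical and only serve the illustration. The model assumes the usual "first committer wins" rule, i.e., a transaction aborts only if an item it wrote has been changed by someone else since its snapshot was taken.

```python
class SnapshotStore:
    """Hypothetical store with per-item version counters."""

    def __init__(self, data):
        self.committed = dict(data)
        self.version = {key: 0 for key in data}

    def begin(self):
        return SnapshotTx(self)


class SnapshotTx:
    """Toy snapshot-isolation transaction (first committer wins)."""

    def __init__(self, store):
        self.store = store
        self.snapshot = dict(store.committed)     # reads come from here
        self.start_version = dict(store.version)  # used for conflict checks
        self.writes = {}

    def read(self, key):
        return self.writes.get(key, self.snapshot[key])

    def write(self, key, value):
        self.writes[key] = value

    def commit(self):
        # Only items this transaction WROTE are checked for conflicts.
        for key in self.writes:
            if self.store.version[key] != self.start_version[key]:
                raise RuntimeError(f"write-write conflict on {key!r}")
        for key, value in self.writes.items():
            self.store.committed[key] = value
            self.store.version[key] += 1


def go_off_call(tx, doctor):
    """Invariant the application tries to keep: at least one doctor on call."""
    on_call = [name for name in ("alice", "bob") if tx.read(name)]
    if len(on_call) >= 2:
        tx.write(doctor, False)


def run_write_skew():
    store = SnapshotStore({"alice": True, "bob": True})
    t1, t2 = store.begin(), store.begin()  # both see the same snapshot
    go_off_call(t1, "alice")               # sees two doctors on call
    go_off_call(t2, "bob")                 # also sees two doctors on call
    t1.commit()
    t2.commit()                            # no conflict: disjoint write sets
    return store.committed


if __name__ == "__main__":
    print(run_write_skew())
```

Each transaction on its own respects the invariant, and neither overwrites the other's data, so both commit. Afterwards nobody is on call: the constraint between the two data items is violated although no single transaction did anything wrong.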