All you didn't know about the CAP theorem
During my first experience with distributed systems, I found out that the CAP theorem is still widely used to describe them. I started researching and discovered that most of what I knew about CAP wasn't actually useful. I am not a DB master, but I hope that my little investigation into the world of distributed DBs will be helpful for developers.
The theorem was presented at the Symposium on Principles of Distributed Computing in 2000 by Eric Brewer. In 2002, Seth Gilbert and Nancy Lynch of MIT published a formal proof of Brewer's conjecture, rendering it a theorem. According to Brewer, he just wanted the community to start talking about the problem, but his words were interpreted as a theorem.
What stands behind the CAP?
The CAP theorem states that in a distributed system you can only choose 2 out of 3 properties:
- Consistency. Every read gets you the most recent write
- Availability. Every node (if not failed) always responds to queries (reads and writes)
- Partition tolerance. Even if the connections between nodes are down, the other two promises are kept
Two nodes that don't have a network partition between them can be both available and consistent.
Two nodes that are partitioned from each other and remain available cannot be consistent.
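A toy sketch of these two statements (the class and node names are made up for illustration): while the link between two nodes is up, a write on one is replicated to the other, so reads agree; once the link is cut, a node that stays available for writes leaves its peer stale.

```python
# Toy model of two replicas: consistent while connected,
# inconsistent once a partition severs replication.

class Node:
    def __init__(self, name):
        self.name = name
        self.value = None
        self.peer = None
        self.partitioned = False

    def write(self, value):
        self.value = value
        # Replicate to the peer only while the network link is up.
        if self.peer and not self.partitioned:
            self.peer.value = value

    def read(self):
        return self.value

a, b = Node("a"), Node("b")
a.peer, b.peer = b, a

a.write("v1")
print(a.read() == b.read())   # no partition: consistent -> True

a.partitioned = b.partitioned = True   # the link goes down
a.write("v2")                          # 'a' stays available for writes
print(a.read() == b.read())   # partitioned + available: inconsistent -> False
```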
The CAP triangle
Perhaps the most popular image related to CAP is this triangle. Its purpose is to classify popular DB systems and to confuse juniors.
Popular DB systems
I have chosen three popular DB systems and read their documentation and a lot of articles about them, so now I can look at them through the CAP lens.
Let’s take a look at PostgreSQL
The following points relate to an abstract distributed PostgreSQL setup.
- Master/slave architecture is one of the common solutions
- A slave can be synced with the master either synchronously or asynchronously
- The transaction system uses two-phase commit to ensure consistency
- If a partition occurs, you can't talk to the server (in the basic case), so the system is not CAP-available
So, it can’t continue to work in case of network partitioning, but it provides strong consistency and availability. It’s a CA system!
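For reference, the sync/async choice above corresponds to real settings in postgresql.conf when using built-in streaming replication (the standby name here is hypothetical):

```ini
# postgresql.conf on the master
synchronous_commit = on                  # commits wait for the standby's confirmation
synchronous_standby_names = 'standby1'   # leave empty for asynchronous replication
```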
Let’s take a look at MongoDB
The following points relate to an abstract distributed MongoDB setup.
- MongoDB provides strong consistency because it is a single-master system and all writes go to the primary by default
- Automatic failover in case of partitioning
- If a partition occurs, the system stops accepting writes until it believes it can safely complete them
So, it can continue to work in case of network partitioning, but it gives up availability. It's a CP system!
Let’s take a look at Cassandra
Cassandra uses a master-master replication scheme, which means that in case of a partition all nodes will continue working.
That means we have an AP system.
You may think that it's so easy and you already know enough… Actually, no.
There are a lot of issues with the CAP theorem that you should take into account.
- CAP uses very narrow and far-from-the-real-world definitions
- In practice, the choice is only between consistency and availability
- Many systems are neither CAP-consistent nor CAP-available
- Pure AP and CP systems might not behave as expected
What’s wrong with the definitions
Consistency in CAP actually means linearizability (and it's really hard to achieve). To explain what linearizability is, let's take a look at the following picture:
In the described case, the referee has finished the game, but not every client is aware of it yet. To make the system linearizable, we need the data to sync instantly, so that once the referee finishes the game, every client gets the correct info.
Availability in CAP is defined as "every request received by a non-failing node in the system must result in a [non-error] response", and it isn't bounded in time. There are two problems with that. The first is that CAP has no notion of a partially available system (for example, a node that can respond to read requests but not to writes). The second is that if your system responds successfully after 30 minutes, it's still CAP-available.
Partition tolerance doesn't care about node failures. Why? The internet gives us two points about that:
- By the definition of availability: ...every node (if not failed) always...
- By the proof of CAP: the proof by Gilbert and Lynch relies on having code running on both sides of the partition.
I've seen discussions mentioning that a node failure is equivalent to a network partition in most cases, depending on the node configuration. I tend to agree with that, but it's better to remember that CAP wasn't designed for that case.
AP / CP choice
Partition Tolerance basically means that you’re communicating over an asynchronous network that may delay or drop messages. The internet and all of our data centers have this property, so you don’t really have a choice in this matter.
Many systems are only P
If you have one master and one slave, and you are partitioned from the master, you can't write, but you can read. That's not CAP-available.
OK, so it's a CP system? Not quite: synchronization between the slave and the master might be asynchronous, so there might be a gap between the sync and the client request, and then you don't have CAP-consistency either.
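That gap can be sketched with a toy master/slave pair where replication is deferred (all names here are hypothetical):

```python
# Toy master/slave with asynchronous replication: reads from the
# slave can return stale data until the replication job runs.

class Replica:
    def __init__(self):
        self.data = {}

master, slave = Replica(), Replica()
replication_log = []          # writes waiting to be shipped to the slave

def write(key, value):
    master.data[key] = value
    replication_log.append((key, value))   # shipped later, not now

def replicate():
    while replication_log:
        key, value = replication_log.pop(0)
        slave.data[key] = value

write("balance", 100)
stale = slave.data.get("balance")     # read hits the slave before the sync
replicate()
fresh = slave.data.get("balance")     # after the sync the slave catches up

print(stale, fresh)   # None 100
```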
Pure AP and CP problems
A pure AP system may just return any random value, or respond really slowly, and it would still be an AP system.
A pure CP system doesn't behave as expected either, because partitions in CAP have no fixed duration: we might simply never get a response. Such a system provides only eventual consistency, which is not the strong consistency we'd like to have.
How to live with it
After all, it's just an attempt to classify something abstract so you don't have to reinvent the wheel. I recommend the following tips when you're working with distributed DBs:
- Remember about CAP definitions and restrictions
- Use the PACELC(A) theorem instead of CAP; it adds a consistency/latency tradeoff
- Describe how ACID/BASE principles apply to your system
- Decide if the system suits your needs, considering the project you are working on
Let’s take a look at PACELC
The PACELC theorem was first described and formalized by Daniel J. Abadi of Yale University in 2012. As the PACELC theorem is based on CAP, it also uses the CAP definitions.
If there is a partition (P), how does the system trade off availability and consistency (A and C); else (E), when the system is running normally in the absence of partitions, how does it trade off latency (L) and consistency (C)?
Latency here is, in a sense, an extension of availability: it is the time until the user receives a response, and it is regulated by the chosen consistency level.
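The definition branches cleanly, so it can be written down literally. A small sketch using the classifications this article arrives at for each system (the dictionary and function names are made up):

```python
# PACELC as a literal decision: during a partition choose A or C,
# else choose L or C. Values are per-system design choices.

PACELC = {
    # system: (choice during partition, choice otherwise)
    "postgresql+slony": ("C", "C"),   # PC/EC
    "mongodb":          ("A", "C"),   # PA/EC
    "cassandra":        ("A", "L"),   # PA/EL
}

def classify(system):
    during_partition, otherwise = PACELC[system]
    return f"P{during_partition}/E{otherwise}"

for name in PACELC:
    print(name, classify(name))
```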
Let’s take a look at ACID
ACID is a set of properties of database transactions. Jim Gray defined these properties of a reliable transaction system in the late 1970s and developed technologies to achieve them automatically. A long time ago, database vendors introduced two-phase commit to provide ACID across multiple database instances. Here is what stands behind ACID:
- Atomicity. All of the operations in the transaction will complete, or none will.
- Consistency. The database will be in a consistent state when the transaction begins and ends.
- Isolation. The transaction will behave as if it is the only operation being performed upon the database.
- Durability. Once a transaction has been committed, it will remain so, even in case of power loss, crashes, or errors.
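Atomicity is the easiest property to see in action; a small sketch with Python's built-in sqlite3 (the table and data are made up):

```python
# Atomicity demo: a failed statement rolls back the whole
# transaction, so none of its operations take effect.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER)")
conn.execute("INSERT INTO accounts VALUES ('alice', 100), ('bob', 0)")
conn.commit()

try:
    with conn:  # the `with` block is one transaction
        conn.execute("UPDATE accounts SET balance = balance - 50 WHERE name = 'alice'")
        # Violates the primary key -> the whole transaction aborts.
        conn.execute("INSERT INTO accounts VALUES ('alice', 0)")
except sqlite3.IntegrityError:
    pass

balance = conn.execute(
    "SELECT balance FROM accounts WHERE name = 'alice'").fetchone()[0]
print(balance)   # 100 -- the debit was rolled back too
```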
Some of the ACID and CAP definitions use similar terms, and both are widely used in articles and docs. Here are the main points:
| Term | In ACID | In CAP |
| --- | --- | --- |
| Consistency | Data integrity | Linearizability (consistency model) |
| Isolation | Transaction behaves as the only operation | Not used, but related to the consistency model |
| Availability | At least one node should respond | All (non-failing) nodes should respond |
Let’s take a look at BASE
Eventually consistent services are often classified as providing BASE semantics, in contrast to traditional ACID guarantees. One of the earliest definitions of eventual consistency comes from 1988.
BASE essentially embraces the fact that true consistency cannot be achieved in the real world, and as such cannot be modelled in highly scalable distributed systems. Here is what stands behind BASE:
- Basic Availability. There will be a response to any request, but that response could still be a "failure", or the data may be in an inconsistent or changing state.
- Soft-state. State of the system could change over time due to “eventual consistency” changes.
- Eventual consistency. The system will eventually become consistent; it will continue to receive input and does not check the consistency of every transaction before moving on to the next one.
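Soft state and eventual consistency can be sketched with replicas that adopt the newest value during a background sync round (a simplistic last-write-wins scheme; real systems use vector clocks or similar):

```python
# Eventual consistency via last-write-wins anti-entropy:
# replicas disagree right after a write, agree after a sync round.

replicas = [{"value": None, "ts": 0} for _ in range(3)]

def write(replica, value, ts):
    replica["value"], replica["ts"] = value, ts   # only one replica sees it

def anti_entropy(replicas):
    # Everyone adopts the newest value seen anywhere (last write wins).
    newest = max(replicas, key=lambda r: r["ts"])
    for r in replicas:
        r["value"], r["ts"] = newest["value"], newest["ts"]

write(replicas[0], "hello", ts=1)
print(len({r["value"] for r in replicas}))   # 2 -- inconsistent: 'hello' and None
anti_entropy(replicas)
print(len({r["value"] for r in replicas}))   # 1 -- converged
```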
I have been asked a couple of times which one is better, ACID or BASE, so just to make it clear: it depends on your project. For example, if your data is not critical and the user really cares about latency, BASE would be the best option; otherwise, ACID will help you make the system as reliable as possible.
Now you know the CAP theorem, its definition, and potential problems. Let’s try to take a look at the same popular database systems using our new knowledge.
Fresh look at PostgreSQL
PostgreSQL allows multiple cluster configurations, so it's really hard to describe all of them. Let's just take master-slave replication with the Slony implementation.
- The system works according to ACID (there are a couple of problems with two-phase commit, but mostly it's reliable)
- In case of partition, Slony will try to perform a switchover, and if everything goes well, we get our new master with its consistency
- When there is no partition, Slony gives up latency and does everything to approach strong consistency. Actually, ACID is the reason for the high latency
The system is considered PC/EC(A)
Fresh look at MongoDB
Let’s find out something new about MongoDB:
- It's ACID in a limited sense, at the document level
- In case of a distributed system - it’s all about that BASE
- In case of no partition, the system guarantees reads and writes to be consistent
- If the master node fails or is partitioned from the rest of the system, some data will not be replicated. The system elects a new master to remain available for reads and writes (the new master and the old master are inconsistent)
The system is considered PA/EC(A), as most of the nodes remain CAP-available in case of partition. Beware that under the CAP theorem the system is usually considered a CP one. The PACELC inventor, Daniel J. Abadi, says that there are far more problems with consistency than with availability.
Fresh look at Cassandra
- Designed for low-latency interactions
- Cassandra offers ACID-like guarantees only at the row level
- In case of distributed system - it’s all about that BASE
- If a node is down or partitioned, the rest of the nodes carry on with the job
- In case of no partitions – the system uses consistency levels to reduce latency
The system is considered PA/EL(A).
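Cassandra's consistency levels boil down to a quorum rule: with N replicas, choosing write acks W and read replicas R such that R + W > N guarantees the read set overlaps the latest write, while smaller values trade consistency for lower latency. A toy sketch (not the real Cassandra API):

```python
# Quorum reads/writes: R + W > N guarantees the read overlaps
# at least one replica holding the latest write.

N = 3
replicas = [{"value": None, "ts": 0} for _ in range(N)]

def write(value, ts, w):
    for r in replicas[:w]:            # only W replicas acknowledge the write
        r["value"], r["ts"] = value, ts

def read(r_count):
    contacted = replicas[-r_count:]   # deliberately read from the "other side"
    return max(contacted, key=lambda r: r["ts"])["value"]

write("v1", ts=1, w=2)
print(read(2))   # R + W = 4 > 3: overlap guaranteed -> 'v1'
print(read(1))   # R + W = 3 = 3: may miss the write -> None
```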
In a nutshell
To sum up, I’d like to outline the key points of the article:
- Distributed systems can and should be understood with the help of metrics and terms, and the starting point is to understand their tradeoffs.
- It’s really difficult to classify an abstract system. You need to understand what you want to achieve in your project, then it will be easy to figure out how.
- In addition to the previous point, nowadays DB systems are so powerful that they can partially provide all 3 CAP properties; you just need to configure them right.
- Do not get overwhelmed by this work. We are just curious developers, and when we doubt something, it's better to go to a DB expert.