Friday, July 8, 2011
If you want performance you need to partition data. If you partition data you will not get performance! That sounded clever but is it true? Well, it can be if the architecture of your application is naïve.
The problem I am describing here is when there is a need to partition data across multiple geographical regions. Partitioning data essentially spreads data among several servers resulting in fast accesses. But when the data is spread across large geographical distances then this will result in significant network latencies. This is something that cannot be avoided.
Memcached is a common technique to store commonly read data into in-memory caches preventing frequent dips to the database. Memcached accesses data through “gets” and updates data through “sets”. Data is accessed based on a key which is hashed to one of several participating servers. Thus memcached distributes the data among several participating servers in a server list. Reads and writes are of the order of O(1) and extremely fast. This works fine as long as servers belong to a single region or if the data center is in the same region. The network latencies will be low and the latency of the application will not be severely affected.
Now consider a situation where the memcached servers have to be distributed to multi-region data centers. While this is an excellent scheme for Disaster Recovery (DR) it introduces its own set of attendant problems.
Memcached will hash the entire data set and distribute it over the entire server list. Now “gets” of data from one geographical region to another will have significant latency. Since the laws of physics mandate that nothing can exceed the speed of light, we will be stuck with appreciable latencies for inter-region reads and writes. So while a multi-region deployment provides for geographical resiliency it does introduce issues of latency and degraded throughput.
So what is the solution? One possible solution is to replicate the data across the regions. The solution to this problem is to replicate data in all the regions. One technique that I can think of is to have the application to implement “local reads & global writes”. This technique provides for the AP part of the CAP Theorem. The CAP theorem states that it is impossible to completely provide Consistency, Availability and Partition tolerance to distributed application. The “local reads & global writes” method will assure availability and partition tolerance while providing for eventual consistency.
In this technique, updates are done both on local servers along with asynchronous writes to all data centers. The writes are hence global in nature. The updates will not wait for all writes to complete before moving along. However reads will be local ensuring that the latency is low. Data reads based on data proximity will ensure that latency is really low.
Since writes are asynchronous the data will tend to be “eventually consistent” rather than being “strongly consistent” but this is a tradeoff that can be taken into account. Ideally it will be essential to implement the quorum protocol along with the “local reads & global writes” technique to ensure that you read your writes.
The application could have a modified quorum protocol such that R+ W > N where R is the number of data reads and W is the number of writes to servers and N is the total number of servers in the memcached server list.
Similar technique has been used in Cassandra & CouchDB etc.
With the “local reads & global writes” technique it is possible keep the latencies within reasonable limits since data reads will be based on proximity. Also replication the data to all regions will also ensure that eventually all regions will have a consistent view of the data.