Active-Active Shared-Nothing Database Architecture | Capital One

Preamble

“Hard to believe it’s not possible,” implored a confused Ted. “This is 2020; surely, there must be another way.”

Getting the Basics

First things first, Jane announces: get the definition of AASN right. It means two copies of a database, holding the same data, in two different geographic regions, each serving the copy of the application running in its own data center, as shown in the figure below.

State of the Application

“You mentioned AASN is not possible in many applications,” inquires Ted. “Can you explain those cases?”

  1. Potential conflicts due to changes occurring in the same record in both copies.
  2. Potential corruption of data due to one update coming in late after a conflicting update at the other copy.

Traditional Approaches

Jane continues her narrative on the approaches. First, she talks about Active/Passive Shared-Nothing (APSN) architecture, stressing the word “Passive.” Here is a traditionally accepted view of an APSN architecture in the database tier:

Brown-Out Period

However, before the load balancer can send traffic to application A2, it has to make sure that the standby database is completely caught up with the changes made to database D1. This catch-up period, however small, is still perceptible to the application and is called a brown-out period. Its duration depends on how many of D1's changes have not yet been applied to the standby, so it can be close to zero during periods of little or no activity, Jane explains.
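The brown-out reasoning can be sketched minimally. The sketch assumes replication progress is tracked as log sequence numbers (LSNs), and it treats the LSN gap as a simple count of changes still to replay. `brownout_seconds`, `ready_for_traffic` and the assumed apply rate are hypothetical illustrations, not any product's API.

```python
def brownout_seconds(primary_last_lsn: int, standby_applied_lsn: int,
                     apply_rate_per_sec: float) -> float:
    """Estimate how long traffic must be held back before switching to D2.

    LSN = log sequence number. The gap between the last change D1 shipped
    and the last change D2 has applied is the backlog still to be replayed.
    (Hypothetical helper; treats the LSN gap as a count of changes.)
    """
    backlog = max(0, primary_last_lsn - standby_applied_lsn)
    return backlog / apply_rate_per_sec


def ready_for_traffic(primary_last_lsn: int, standby_applied_lsn: int) -> bool:
    # The load balancer may route to A2 only once D2 has replayed everything.
    return standby_applied_lsn >= primary_last_lsn


# Busy period: 50,000 changes still to replay at an assumed 10,000 changes/s.
print(brownout_seconds(150_000, 100_000, 10_000))  # 5.0
# Quiet period: the standby is already caught up, so there is no brown-out.
print(brownout_seconds(100_000, 100_000, 10_000))  # 0.0
print(ready_for_traffic(150_000, 100_000))  # False
```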

  1. It adds performance overhead to the applications, since the database must get the acknowledgement from both D1 and D2 before sending the commit response to the application.
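A rough latency model makes the synchronous-commit overhead concrete. It assumes both acknowledgements are awaited in parallel and uses made-up figures: 2 ms per local commit and a 40 ms cross-region round trip. `commit_latency_ms` is a hypothetical helper.

```python
def commit_latency_ms(local_commit_ms: float, remote_commit_ms: float,
                      network_rtt_ms: float, synchronous: bool) -> float:
    """Time until the application receives its commit response (toy model)."""
    if not synchronous:
        # Asynchronous: D1 acknowledges alone; D2 is updated later.
        return local_commit_ms
    # Synchronous: both D1 and D2 must acknowledge. Assuming the two
    # acknowledgements are awaited in parallel, the slower (remote) one dominates.
    return max(local_commit_ms, network_rtt_ms + remote_commit_ms)


# Assumed figures: 2 ms to commit in each region, 40 ms round trip between regions.
print(commit_latency_ms(2.0, 2.0, 40.0, synchronous=False))  # 2.0
print(commit_latency_ms(2.0, 2.0, 40.0, synchronous=True))   # 42.0
```

Under these assumptions, every synchronous commit pays the cross-region round trip. The overhead therefore grows with the distance between the regions.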

Hot Standby

That put a damper on the mood of the audience, who had expected a quick answer. Jane continues her narrative. After Region R1 comes back up, replication is started in the reverse direction, from D2 back to D1.
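The role reversal can be shown with a toy model of a single hot-standby pair. `HotStandbyPair` and its methods are hypothetical names. The sketch also ignores the resynchronization a recovered D1 would typically need before it can act as a standby.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class HotStandbyPair:
    primary: str
    standby: str
    replicating: bool = True

    def replication_direction(self) -> Optional[str]:
        return f"{self.primary} -> {self.standby}" if self.replicating else None

    def fail_over(self) -> None:
        # The primary's region is down: promote the standby. There is
        # nowhere to replicate to until the failed region returns.
        self.primary, self.standby = self.standby, self.primary
        self.replicating = False

    def region_recovered(self) -> None:
        # The old primary rejoins as the new standby, so replication
        # restarts in the reverse direction.
        self.replicating = True


pair = HotStandbyPair(primary="D1", standby="D2")
print(pair.replication_direction())  # D1 -> D2
pair.fail_over()                     # Region R1 goes down
print(pair.replication_direction())  # None
pair.region_recovered()              # Region R1 comes back up
print(pair.replication_direction())  # D2 -> D1
```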

CAP Theorem

Jane poses a question for the audience to ponder: “When we store multiple copies of the same data in two datastores to address the failure of a single copy, what happens when a copy fails?” Will the other copies be in a state to immediately take over its operations? They may, or may not, depending on the architecture. This is where the CAP theorem (Consistency, Availability and Partition Tolerance, https://en.wikipedia.org/wiki/CAP_theorem) comes in.
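During a network partition, the CAP theorem forces each copy to choose a policy. It can refuse writes to stay consistent, or accept them to stay available. The sketch below is a simplified illustration, not how any particular database implements this choice; the `Replica` class and its `prefer` setting are hypothetical.

```python
class Replica:
    """One copy of the data deciding how to behave when it cannot reach its peer."""

    def __init__(self, prefer: str):
        if prefer not in ("consistency", "availability"):
            raise ValueError("prefer must be 'consistency' or 'availability'")
        self.prefer = prefer
        self.value = 1
        self.peer_reachable = True
        self.unreplicated = []  # accepted writes the other copy has not seen

    def write(self, value: int) -> str:
        if self.peer_reachable:
            # Normal case (replication to the peer is not modeled here).
            self.value = value
            return "ok"
        if self.prefer == "consistency":
            # CP choice: refuse the write rather than let the copies diverge.
            raise RuntimeError("unavailable: other copy unreachable")
        # AP choice: stay available, accept the write and reconcile later.
        self.value = value
        self.unreplicated.append(value)
        return "ok (not yet replicated)"


cp, ap = Replica("consistency"), Replica("availability")
cp.peer_reachable = ap.peer_reachable = False  # a network partition
print(ap.write(2))  # ok (not yet replicated)
try:
    cp.write(2)
except RuntimeError as err:
    print(err)  # unavailable: other copy unreachable
```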

Eventually Consistent State

So, what is the point of maintaining the copies if they are not consistent, the audience asks.
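Here is one way to picture what “eventually consistent” means. Each copy sees a write only after its own replication delay, so reads disagree for a while and then converge. The `Copy` and `read` helpers and the 2 s and 5 s lags are assumptions for illustration.

```python
from dataclasses import dataclass


@dataclass
class Copy:
    name: str
    lag_s: float  # assumed replication delay from the copy that took the write


def read(copy: Copy, writes: list, t: float):
    """Value visible at `copy` at time `t`.

    `writes` holds (commit_time, value) pairs made at one copy; a write only
    becomes visible elsewhere once commit_time + lag_s has passed.
    """
    visible = [(ts, v) for ts, v in writes if ts + copy.lag_s <= t]
    return max(visible)[1] if visible else None


writes = [(0.0, 1), (10.0, 2)]  # value 1 at t=0, updated to 2 at t=10
copies = [Copy("D1", 0.0), Copy("D2", 2.0), Copy("D3", 5.0)]
print([read(c, writes, t=11.0) for c in copies])  # [2, 1, 1]
print([read(c, writes, t=15.0) for c in copies])  # [2, 2, 2]
```

At t=11, reading D2 or D3 returns the stale value 1, which is the inconsistency the audience is asking about. By t=15, every copy returns 2.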

Conflict Resolution

Jane explains a second problem, using another scenario illustrated in the following diagram. We start with the original value 1 as before, and application A2 updates it to 2 in datastore D2. But before that change is propagated to all the other datastores, application A3, connected to datastore D3, updates the value to 3. What will the value be in datastore D1?
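The scenario can be simulated with a tiny model. It assumes each datastore applies replicated updates blindly, in arrival order, with no conflict detection. `apply_in_arrival_order` is a hypothetical helper.

```python
def apply_in_arrival_order(initial, arrivals):
    """Apply replicated updates blindly, in the order they arrive."""
    value = initial
    for _source, new_value in arrivals:
        value = new_value  # no conflict detection: the last arrival wins
    return value


from_d2, from_d3 = ("D2", 2), ("D3", 3)

# D2 and D3 each applied their own update first, then received the other's.
d2 = apply_in_arrival_order(2, [from_d3])
d3 = apply_in_arrival_order(3, [from_d2])
# D1 gets both, in whichever order the network happens to deliver them.
d1_if_d3_last = apply_in_arrival_order(1, [from_d2, from_d3])
d1_if_d2_last = apply_in_arrival_order(1, [from_d3, from_d2])
print(d2, d3, d1_if_d3_last, d1_if_d2_last)  # 3 2 3 2
```

D2 and D3 end up holding each other's values, and D1's final value depends purely on network timing.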

Conflict Resolution Handling Techniques

Ted and the audience now have a clear understanding of the potential problems of Active-Active Shared-Nothing architecture. But they wonder aloud: is there any technique to avoid these problems, especially when it comes to conflict resolution?

  • Timestamp Weight: Very similar to the last man standing solution, with one caveat. Rather than relying on the order in which updates arrive, it compares the timestamps of the updates, and the update with the latest timestamp wins. This requires all three datastores to be synced with a single time server (which somewhat erodes the “shared-nothing” part), but it is probably fairer and avoids race conditions on arrival order.
  • Locality Weight: Each datastore is assigned a weight, and the update from the highest-weighted store wins and is applied to all the other datastores. In the previous example, if the weights of D1, D2 and D3 were 300, 200 and 100 respectively, the value would eventually be 2, since that update came from D2 (weight 200) rather than D3 (weight 100). So D3 would be updated to 2, overwriting its own change of 3, and D1 would likewise be re-updated with 2.
  • Application Weight: Each update is tagged with an application ID, and each application is also weighted. The changes from the highest-weighted application are the ones eventually saved.
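The three weighting rules above can be sketched as resolver functions over a set of conflicting updates. This is a minimal illustration that assumes every datastore eventually sees the same set of conflicting updates and runs the same resolver. The `Update` record, the timestamps and the application weights are hypothetical; the store weights are the ones from the Locality Weight example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Update:
    value: int
    store: str        # datastore where the update was made
    app_id: str       # application that made it
    timestamp: float  # assumes one clock shared by all datastores


def timestamp_weight(updates):
    # Latest timestamp wins; store name breaks exact ties so every copy agrees.
    return max(updates, key=lambda u: (u.timestamp, u.store))


def locality_weight(updates, store_weights):
    # The update from the highest-weighted datastore wins.
    return max(updates, key=lambda u: store_weights[u.store])


def application_weight(updates, app_weights):
    # The update from the highest-weighted application wins.
    return max(updates, key=lambda u: app_weights[u.app_id])


# The D2/D3 conflict from the example; timestamps and app weights are made up.
conflict = [
    Update(value=2, store="D2", app_id="A2", timestamp=1000.0),
    Update(value=3, store="D3", app_id="A3", timestamp=1000.5),
]
print(locality_weight(conflict, {"D1": 300, "D2": 200, "D3": 100}).value)  # 2
print(timestamp_weight(conflict).value)                                 # 3
print(application_weight(conflict, {"A2": 10, "A3": 20}).value)         # 3
```

Each resolver is deterministic, including an explicit tie-breaker for equal timestamps. Each copy therefore picks the same winner independently, which is what lets the copies converge.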

Architectural Decisions

Generally datastores are divided into multiple types depending on their usage, Jane explains:

  • System of Reference: The datastore is used as a secondary system of data, for reference. Analytical stores fall in this category; machine learning and historical data analysis are done on this datastore.
  • Read Only: The datastore is used for read-only activities; no updates ever happen.
  • Static Content: The datastore is used to host static content. Examples include hosted images for web properties and marketing collateral that does not change often.
  • Cache: The datastore is used for caching data across multiple applications for faster access and is extremely sensitive to latency.
  • Session State: The data is local to a specific session of the application and is irrelevant outside that session. Examples include behavioral data about users’ interactions with an application, shopping carts, cookies, etc.
  1. How will the application address the logical change in data due to the conflict resolution?

Application Design Patterns

Now that Debbie understands the nuances of the AASN data tier, she wants to learn some of the application patterns to leverage it.

Stateless vs Stateful Applications

The key to the design, Jane explains, is the question: is the application stateless or stateful?

Pattern 1: Many Masters but Only One Active

Pattern 2: One Master and Many Standbys (Readers)

Pattern 3: One Feeder and Many Readers

Pattern 4: Many Masters but Updated by Application

Pattern 5: Multiple Masters Buffered Writes

  1. The order in which data is written is not guaranteed, so there may be data consistency issues.

Adjournment

In summary, Jane concludes that the success of an Active-Active Shared-Nothing database tier depends on the type of database, its usage, and the ability and willingness of the application to handle data update conflicts. It is never a simple case of turning on bi-directional replication at the database level and expecting the applications to remain unaware of it. In general, System of Record datastores are the hardest to implement and Session State datastores are the easiest. So Acme can implement AASN at the database tier for many systems without any application change, for some with some application change, and for some not at all. For some types of systems, AASN in the data tier is not needed at all to enable high availability.

Key Takeaways

  1. AASN architecture means the datastores are spread out geographically, with no assets shared between them, and each datastore serves the instance of the application running locally.
  2. In most cases, AASN database tiers cannot be done without application refactoring.
  3. Datastores are divided into the following types: System of Record, System of Reference, Read Only, Static Content, Cache, and Session State.
  4. It is not possible for all datastores to have an AASN configuration. In general, suitability increases from left to right across the list above, from System of Record (least suitable) to Session State (most suitable).
  5. A single data element could be updated by two different applications in two different regions, causing data conflict. The applications have to be cognizant of that possibility. Not all applications can handle it, even after refactoring.
  6. Almost all replication is asynchronous, which means a delayed update can overwrite a more recent one, causing data corruption.
  7. Synchronous replication, while possible, is very expensive and often impractical.
  8. In some database technologies, the datastores are not consistent with each other immediately, but only eventually. This means applications may get different data depending on which copy they are connected to.

Award-winning data management and engineering leader, big data and processing enthusiast, raspberry pi junkie, dad and husband — not necessarily in that order.
