Pools of ACID

by coatta 12/19/2010 1:28:00 PM

There's been a lot of interest recently in alternatives to the traditional DB-based application, things like NoSQL and BASE. A lot of the discussions that I've heard about these systems seem to imply that these approaches are drop-in replacements for the older technology, which is not true. Both of these, and various other alternative approaches to working with persistent data, provide different semantics from a SQL DB coupled with transactions. In particular, transactions make it look like you're working with a single-threaded system. In the context of a specific transaction, data isn't going to change underneath you. Furthermore, you know that you won't be exposed to data that is in an inconsistent state, like a reference to a non-existent row in another table. These are pretty powerful guarantees. They make it relatively easy to reason about how your program is going to behave.

The trouble is that this power comes at a cost. One of the most significant costs is captured by the CAP Theorem, which says that a distributed system cannot be consistent, available, and partition-tolerant all at once. In practice, if you're going to have transactions that span DBs across multiple machines, your system can't stay both consistent and available when the network between those machines fails. Some people seem to have responded to this state of affairs as though the only solution was simply to get rid of transactions. This strikes me as a bit of a "baby/bathwater" scenario. Not using transactions at all makes programming much more complicated because your code has to deal with more scenarios, such as data that is not in a completely consistent state.

I think a better approach is what I call pools of ACID. Using this approach, a system is decomposed into a set of cooperating processes. Within each process, transactions are used to provide a straightforward programming model. Processes interact with each other through a more loosely coupled request/response protocol, and transactions are not propagated from one process to another. This way, you make conscious decisions about where transactions are most useful and where it is viable to have weaker guarantees.
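To make the idea concrete, here is a minimal sketch in Python using only the standard library's sqlite3 module. Everything in it is an assumption made for illustration: the InventoryService and OrderService names, the table layouts, and the use of direct method calls to stand in for a request/response protocol between separate processes. Each service owns its own database and uses local transactions; nothing transactional crosses the boundary between them.

```python
import sqlite3


class InventoryService:
    """One pool of ACID: owns its own database and uses local transactions only."""

    def __init__(self):
        self.db = sqlite3.connect(":memory:")
        self.db.execute(
            "CREATE TABLE stock (item TEXT PRIMARY KEY, qty INTEGER NOT NULL)"
        )
        self.db.execute("INSERT INTO stock VALUES ('widget', 5)")
        self.db.commit()

    def handle_reserve(self, item, qty):
        """Request handler: the caller gets a plain response, never a transaction."""
        with self.db:  # commits on success, rolls back on exception
            cur = self.db.execute(
                "UPDATE stock SET qty = qty - ? WHERE item = ? AND qty >= ?",
                (qty, item, qty),
            )
            return {"ok": cur.rowcount == 1}

    def handle_release(self, item, qty):
        """Compensating request: gives back a previous reservation."""
        with self.db:
            self.db.execute(
                "UPDATE stock SET qty = qty + ? WHERE item = ?", (qty, item)
            )
        return {"ok": True}

    def stock(self, item):
        row = self.db.execute(
            "SELECT qty FROM stock WHERE item = ?", (item,)
        ).fetchone()
        return row[0]


class OrderService:
    """A second pool of ACID with its own database and its own rules."""

    def __init__(self, inventory):
        self.inventory = inventory  # stands in for a network client
        self.db = sqlite3.connect(":memory:")
        # Hypothetical business rule: at most 3 units per order.
        self.db.execute(
            "CREATE TABLE orders (id INTEGER PRIMARY KEY, item TEXT NOT NULL, "
            "qty INTEGER NOT NULL CHECK (qty <= 3))"
        )

    def place_order(self, item, qty):
        # Cross-process step: a request and a response, no shared transaction.
        if not self.inventory.handle_reserve(item, qty)["ok"]:
            return False
        try:
            with self.db:  # local transaction inside this pool
                self.db.execute(
                    "INSERT INTO orders (item, qty) VALUES (?, ?)", (item, qty)
                )
        except sqlite3.Error:
            # The local rollback does not undo the remote reservation,
            # so the weaker guarantee has to be handled explicitly.
            self.inventory.handle_release(item, qty)
            return False
        return True
```

The interesting line is the except branch. Inside a pool, a rollback cleans up everything; across pools, the code has to decide what a failure means and compensate for it. That is the conscious trade-off: the extra reasoning is confined to the boundaries rather than spread through every piece of code that touches data.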

We are currently using this approach where I work (Vitrium Systems). Each of our cooperating processes uses NHibernate / SQL Server to provide a transactional object model which is reasonably simple to work with. Processes are connected via NServiceBus. One of the nice things about this collection of tools is that the interaction with NServiceBus is transactional, even though transactions don't span processes. That is, sending a request through NServiceBus is part of the overall transaction associated with handling an incoming request. If a failure occurs at any point in handling that request and the transaction is rolled back, then the outgoing NServiceBus request is rolled back as well.
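The "send is part of the transaction" behavior can be illustrated with a small outbox-style sketch in Python. This is not NServiceBus's API, and not necessarily the mechanism it uses; it is one common way to get the same effect, assuming the outgoing messages can be stored in the same database as the business data. The documents and outbox tables, the DocumentPublished message, and the function names are all hypothetical.

```python
import json
import sqlite3


def open_store():
    """Create one database holding both business data and pending messages."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE documents (id TEXT PRIMARY KEY, status TEXT NOT NULL)")
    db.execute(
        "CREATE TABLE outbox (seq INTEGER PRIMARY KEY AUTOINCREMENT, "
        "body TEXT NOT NULL)"
    )
    return db


def handle_publish(db, doc_id, fail=False):
    """Handle a request: the state change and the outgoing message share a transaction."""
    with db:  # commits on success, rolls back on exception
        db.execute("INSERT INTO documents VALUES (?, 'published')", (doc_id,))
        message = {"type": "DocumentPublished", "id": doc_id}
        db.execute("INSERT INTO outbox (body) VALUES (?)", (json.dumps(message),))
        if fail:
            # Simulate a failure after the message was "sent" by the handler.
            raise RuntimeError("simulated failure while handling the request")


def dispatch(db, send):
    """Deliver committed messages; a message is removed only after send() returns."""
    rows = db.execute("SELECT seq, body FROM outbox ORDER BY seq").fetchall()
    for seq, body in rows:
        send(json.loads(body))
        with db:
            db.execute("DELETE FROM outbox WHERE seq = ?", (seq,))
    return len(rows)
```

If handle_publish raises, the document row and the queued message disappear together, so other processes never hear about work that did not commit. Because dispatch sends before it deletes, a crash between the two steps can deliver a message twice, so receivers in a design like this need to tolerate duplicates.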

Overall, this architecture is working well for us. It allows us to use the power of transactions where appropriate, and still achieve the degree of scalability and robustness that we need.

Tags: distributed computing