CQRS/ES – Or is it?

Lately we have read several posts in various forums/blogs that say “tool XYZ is perfect for implementing CQRS systems because it supports event sourcing”. We thought this might be a good time to discuss exactly what CQRS with and without event sourcing is, and the requirements of event-sourced aggregates when used in a CQRS design. This post is not intended to discourage the use of any of the systems mentioned, but rather is just an attempt to help the reader understand what is and isn’t provided by these systems and what you must build yourself.

CQRS – What is it?

At its core, CQRS is a pattern in which the read model is separate from the write model. In fact, that really is all the pattern means, and it should be the characteristic common to all CQRS systems. What does this really mean? Well, in a system where the read model is separate from the write model, the code processing commands coming into the system works from a different data store than the code that serves views of the system’s current state. Think of it this way: the read model is the stuff you look at when you’re about to tell the system to do something (the data in the web pages, screens, dialogs, etc. of the user interface; or the data provided via API endpoints to external systems). The write model is the data used internally by the system to evaluate and execute commands (the data used by the web server, for instance, to process your commands to the system). Probably the best single resource (in one place) for some context around CQRS is the CQRS Journey.
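
As a minimal sketch of that separation (all interfaces and names here are illustrative, not from any particular framework), the command side and the query side might each talk to their own store:

```typescript
// Hypothetical stores -- the point is only that they are *different*.
interface WriteStore {
  loadAccount(id: string): { id: string; balance: number };
  saveAccount(account: { id: string; balance: number }): void;
}

interface ReadStore {
  // A denormalized view shaped for the screen, not for the domain logic.
  getAccountSummary(id: string): { id: string; displayBalance: string };
}

// Command side: validates and mutates state via the write model only.
function handleWithdraw(store: WriteStore, accountId: string, amount: number): void {
  const account = store.loadAccount(accountId);
  if (account.balance < amount) throw new Error("insufficient funds");
  store.saveAccount({ ...account, balance: account.balance - amount });
}

// Query side: serves views from the read model only.
function viewAccount(store: ReadStore, accountId: string): { id: string; displayBalance: string } {
  return store.getAccountSummary(accountId);
}
```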

Event Sourcing – What is it?

Event sourcing is a very broad topic these days, but primarily it means sourcing event data from one system or subsystem to another in the form of an event log, if you will. When used in conjunction with CQRS, it can be applied to various degrees – it could be used just for connecting the write model to the read model, or it could serve as the source of truth for the write model. We’ll talk about these alternatives in a bit.

Using CQRS and ES together

The canonical article about CQRS+ES, and the starting point for any reading about CQRS, is Greg Young’s landmark post, along with his CQRS documents. They’re pretty old now, but the basic concepts have not changed since their original writing. For our purposes we’ll discuss a few of the options for using CQRS and ES together. It is our opinion that CQRS is best done with ES as the data store for the write model, but for completeness we will discuss both.

CQRS – option 1 (no ES)

The first option we consider is CQRS without any form of event sourcing. In this approach, the read model and write model are separate, but no event sourcing is involved. An example of this sort of model might be a system in which all views of the data (web pages, screens, etc.) are served from a read model that is built by polling, or updated on demand, based on a transaction log or some technology in the data store (materialized views, etc.). If the system uses polling, it will exhibit eventual consistency, which is quite common in CQRS systems. Commands that affect the system operate on the write model, which would typically be some sort of OLTP database or other data store used for transactional processing.

CQRS – option 2 (event collaboration)

The second option we consider is one in which event sourcing is used to connect the read and write models. In this sort of design, we typically see a system where the data store for the write model is OLTP or some other transactional data store, and upon making changes to that data store, events are also logged to an ‘event log’. This log is then used on the read side by report builders that update the read model based on any new events seen since the last time it was updated. Martin Fowler describes this approach as Event Collaboration.
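
A minimal sketch of such a read-side report builder might look like the following (the event log interface and event shapes are assumptions for illustration):

```typescript
interface LoggedEvent {
  sequence: number;                 // position in the shared event log
  type: string;
  data: Record<string, unknown>;
}

interface EventLog {
  append(type: string, data: Record<string, unknown>): void;
  readSince(sequence: number): LoggedEvent[];   // events after the given position
}

// Read side: a report builder that catches up on events it hasn't seen yet.
class AccountBalanceReport {
  private lastSeen = 0;
  private balances = new Map<string, number>();

  update(log: EventLog): void {
    for (const event of log.readSince(this.lastSeen)) {
      if (event.type === "Deposited") {
        const { accountId, amount } = event.data as { accountId: string; amount: number };
        this.balances.set(accountId, (this.balances.get(accountId) ?? 0) + amount);
      }
      this.lastSeen = event.sequence;   // remember where we left off
    }
  }

  balanceOf(accountId: string): number {
    return this.balances.get(accountId) ?? 0;
  }
}
```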

CQRS – option 3 (ES ‘all the way’)

The third option we consider (and the one we prefer) is one in which the aggregates (domain objects) themselves are stored only as a sequence of events in the event store. This is what we refer to as ‘Event Sourced Aggregates’, and it really comes from combining DDD, CQRS, and ES into a single holistic approach. When the system is built this way, all command processing is handled by command processors that hydrate the aggregate from the events it has previously ‘stored’, apply the command (typically using code in the aggregate itself), and save the new events sourced by the aggregate to reflect the new changes in state. In this model, the only thing stored for an aggregate in the write model is the set of events that make up the state changes in the system. There is no ‘current’ state saved, as in the traditional OLTP models of the past 30 years. Of course, if the event streams become large (and aggregates have a long lifetime), it makes sense to optimize this process a bit by saving some of the aggregate state in the form of a memento, but this is just an optimization and does not serve as the ‘source of truth’ for the aggregate. In all cases, the ‘source of truth’ is the ES events saved by and for the aggregate that represent state changes in the system. This option is described in a previous blog post (by our Director of Engineering, Robert Reppel).
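
As a rough sketch of an event-sourced aggregate (the shapes here are illustrative, not prescriptive), hydration, command handling, and sourcing new events might look like this:

```typescript
type AccountEvent =
  | { type: "AccountCreated"; accountId: string }
  | { type: "Deposited"; amount: number }
  | { type: "Withdrew"; amount: number };

class Account {
  private balance = 0;
  private uncommitted: AccountEvent[] = [];

  // Hydration: replay the stored events to rebuild current state.
  static load(history: AccountEvent[]): Account {
    const account = new Account();
    for (const event of history) account.apply(event);
    return account;
  }

  // Command: validate against current state, then source a new event.
  withdraw(amount: number): void {
    if (amount > this.balance) throw new Error("insufficient funds");
    this.raise({ type: "Withdrew", amount });
  }

  private raise(event: AccountEvent): void {
    this.apply(event);
    this.uncommitted.push(event);   // only these get saved back to the store
  }

  private apply(event: AccountEvent): void {
    switch (event.type) {
      case "Deposited": this.balance += event.amount; break;
      case "Withdrew": this.balance -= event.amount; break;
    }
  }

  pendingEvents(): AccountEvent[] { return this.uncommitted; }
}
```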

So, how do I build it?

So… all this was leading up to some of the articles we’ve seen and discussions we’ve heard describing how various existing technologies are great for building CQRS systems (and where we might disagree with those assertions). Some of the technologies we’ve heard mentioned are below (where we were able to find the discussion, links are provided):

  • Apache Kafka – here’s one post that describes it as good for CQRS, and also here
  • Amazon Kinesis (part of AWS)
  • Blobs, Tables, etc. (Azure or AWS) – see here for one example, though we’ve discussed this several times over the years

Let’s lay some groundwork first…

The process of updating an aggregate becomes the optimistic concurrency transaction boundary for the write model. That sounds like a bunch of CS jargon, but basically it means you need to be able to do ‘atomic’ updates to the write model at the aggregate level (the transaction boundary), so that you are guaranteed no two writers can change the aggregate at the same time (and both succeed in doing so). In most CQRS systems that use events as the storage for the write model, you see a process like the following when processing a command (a code sketch follows the list):

  1. Load the aggregate from existing events
  2. Apply the command to the aggregate
  3. Write any new events from (2) back to the event store, if and only if the aggregate doesn’t already have new events that weren’t loaded in (1) (i.e. since we last loaded it).
  4. If (3) failed due to new events, repeat (1)-(3) until (3) succeeds or we fail the command for too many retries.
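
In code, this loop might look like the following sketch, continuing the Account aggregate from earlier (the EventStore interface and its version-conditional tryAppend are assumptions for illustration, not a real library):

```typescript
interface EventStore {
  // Returns the stream's events plus its version (sequence of the last event).
  load(streamId: string): { events: AccountEvent[]; version: number };
  // Appends iff the stream is still at expectedVersion; false means a conflict.
  tryAppend(streamId: string, expectedVersion: number, events: AccountEvent[]): boolean;
}

function processWithdraw(store: EventStore, accountId: string, amount: number): void {
  const maxRetries = 5;                                  // UoW retry rule
  for (let attempt = 0; attempt < maxRetries; attempt++) {
    const { events, version } = store.load(accountId);   // (1) load the aggregate
    const account = Account.load(events);
    account.withdraw(amount);                             // (2) apply the command
                                                          //     (throws if now invalid)
    if (store.tryAppend(accountId, version, account.pendingEvents())) {
      return;                                             // (3) conditional write succeeded
    }
    // (4) another writer got there first: reload and re-validate the command
  }
  throw new Error("command failed: too many concurrency conflicts");
}
```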

Sometimes this is referred to as a “unit of work” pattern, where (1)-(3) constitute the work and failure at (3) causes a retry, subject to some UoW retry rules. The reason it is important that steps (1)-(3) are repeated is that if we simply appended the events from the conflicting transaction to the end of the event stream, we might be lying to ourselves: the command itself may no longer be valid in light of the new events. Consider a bank withdrawal that can only occur if the account balance is sufficient (no overdraft line of credit, or anything like that). Now, consider a bank account with the following events:

  1. Account created
  2. Deposited $200

If we have two writers competing simultaneously to withdraw $150, then each of them will load the aggregate in a state of having a $200 account balance. Each will believe its withdrawal is valid, will “accept” the withdrawal, and will produce an event “3. Withdrew $150”. In this case, there are three different event streams we could see:

Stream 1:

  1. Account created
  2. Deposited $200
  3. Withdrew $150
  4. Withdrew $150

Stream 2:

  1. Account created
  2. Deposited $200
  3. Withdrew $150

Stream 3:

  1. Account created
  2. Deposited $200
  3. Withdrew $150
  4. Withdrawal Rejected $150

Of course, the second and third event streams are acceptable and the first is not (the difference between the second and third is a matter of opinion as to whether all commands successfully “processed” should result in at least one event – this is a design decision and can be left to the reader to decide what’s best for their system). The first event stream is obviously not valid under the business rules described above (no overdraft line). If we make a naïve implementation of the event store and unit of work pattern, we can end up with event streams that look like the first (or worse, event streams where the second withdrawal was “approved” and the client was told so, but the event was lost due to an overwrite!).
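
To make the race concrete, here is a toy in-memory store with a version-conditional append (purely illustrative, not production code) showing how the second writer’s attempt is forced back through steps (1)-(3):

```typescript
type Event = { type: string; amount?: number };

// A minimal in-memory event store: an append succeeds only if the stream is
// still at the version the writer loaded. Never overwrites, never reorders.
class InMemoryEventStore {
  private streams = new Map<string, Event[]>();

  load(streamId: string): { events: Event[]; version: number } {
    const events = this.streams.get(streamId) ?? [];
    return { events, version: events.length };
  }

  tryAppend(streamId: string, expectedVersion: number, events: Event[]): boolean {
    const stream = this.streams.get(streamId) ?? [];
    if (stream.length !== expectedVersion) return false;   // someone wrote first
    this.streams.set(streamId, [...stream, ...events]);    // append, never overwrite
    return true;
  }
}

const store = new InMemoryEventStore();
store.tryAppend("acct-1", 0, [{ type: "AccountCreated" }, { type: "Deposited", amount: 200 }]);

// Both writers load at version 2, each believing $150 can be withdrawn.
const writerA = store.load("acct-1");
const writerB = store.load("acct-1");
console.log(store.tryAppend("acct-1", writerA.version, [{ type: "Withdrew", amount: 150 }])); // true
console.log(store.tryAppend("acct-1", writerB.version, [{ type: "Withdrew", amount: 150 }])); // false
// Writer B must reload (balance is now $50), re-run the command, and reject it
// (or, as a design choice, append a "WithdrawalRejected" event instead).
```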

The rule that comes out of this concurrency concern is that no events can be written to the event store if they were based on a version of the aggregate that is no longer the current version at the time of writing. If we ensure that the event stream for an aggregate can only grow in a forward manner, and that no overwrites are possible, then a write of a set of events need only be accompanied by the sequence of the latest event seen when loading the aggregate (this is an optimization over sending “all events seen”). What does this mean for implementations? It means events must be batchable in a single atomic write, and those writes must not allow overwrites (changing history). In fact, if batches are atomic and a write at an already-occupied sequence is rejected rather than overwritten, the explicit concurrency (sequence) check falls out of the non-overwrite guarantee. It’s also imperative that event order cannot change once events are written, otherwise a situation similar to the one described above can arise.

So, our event store requirements for aggregates are:

  1. A write of one or more events to an event stream as part of a single unit of work must be performed atomically
  2. A write of any events cannot overwrite other events already persisted to the event store
  3. The order of events in an aggregate’s event stream can never change
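
Expressed as an interface sketch (the names are ours, purely illustrative), these three requirements map onto a batch-atomic, version-conditional append:

```typescript
type DomainEvent = { type: string; data: unknown };

type AppendResult =
  | { ok: true; newVersion: number }
  | { ok: false; reason: "version-conflict" };

interface AggregateEventStore {
  // Requirement 1: the whole batch commits atomically, or not at all.
  // Requirement 2: append-only -- a stale expectedVersion rejects the write
  //                instead of overwriting events that are already there.
  // Requirement 3: events are totally ordered per stream by version, and that
  //                order never changes once committed.
  append(streamId: string, expectedVersion: number, batch: DomainEvent[]): AppendResult;
  read(streamId: string): { events: DomainEvent[]; version: number };
}
```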

Armed with this information, let’s look at some of our choices.

Apache Kafka

Kafka is an excellent example of a shared log. In fact, it could very easily be used as the basis for an event store implementation. However, by itself it is not directly suitable for use as an event store. When writing events to Kafka, all events are appended to the “end” of the stream; there is no way to conditionally write events at a specific sequence and batch those events together. So… how would you build an event store on top of Kafka? Easy – you would store all “attempted” writes in the event history in Kafka, and then every consumer of the Kafka log would have to determine whether each “attempt” was valid or not, based on the information stored in the attempt (i.e. its starting event sequence). This detail is seemingly overlooked in all the discussion around using Kafka as an event store for CQRS systems. The Tango paper is a good source for a description of how you might do this (they write all ‘commits’ to the stream and then determine later whether the commits are actually ‘valid’ or ‘rejected’ based on the state of the aggregates). Unfortunately, this means that all consumers of the log must understand the rules for when transactions are “accepted” or “rejected”, leaking this logic everywhere throughout the system.
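
A sketch of what this Tango-style approach forces on every consumer (the record shapes are illustrative):

```typescript
// Every write to the log is only an *attempt*; whether it "took" depends on
// the stream version it was based on.
interface AttemptedWrite {
  streamId: string;
  basedOnVersion: number;   // the last event sequence the writer had seen
  events: { type: string; data: unknown }[];
}

// Every consumer must replicate this validation to know which attempts
// actually count -- the accept/reject logic leaks to all of them.
function foldValidAttempts(log: AttemptedWrite[]): Map<string, number> {
  const currentVersion = new Map<string, number>();
  for (const attempt of log) {
    const version = currentVersion.get(attempt.streamId) ?? 0;
    if (attempt.basedOnVersion === version) {
      // Accepted: the attempt was based on the latest state of the stream.
      currentVersion.set(attempt.streamId, version + attempt.events.length);
    }
    // Rejected: another writer got there first; these events must be ignored.
  }
  return currentVersion;
}
```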

Amazon Kinesis

Kinesis is an interesting event-store-like offering recently created by Amazon as part of their AWS platform. It sounds a bit like a Kafka-like, cloud-hosted service from Amazon that lets you dispense with the difficulty of managing your own Kafka cluster. However, Kinesis does not persist your events ‘forever’, so you can’t really use it as the primary event store. It would, of course, be possible to use it as the ‘gateway’ to an event store, so long as you could guarantee that all events ‘agreed to’ by the write model eventually made their way into a more durable store. However, Kinesis doesn’t seem to be a great fit for the query pattern (not to be confused with the Q in CQRS) used by event-sourced aggregates either – it only supports a maximum of 5 reads per second per shard. It also suffers from the same issue Kafka does surrounding atomic write guarantees (there is no way to make writes conditional on the current sequence of the stream).

Blobs, Tables, etc. (Azure or AWS)

Using blob or table storage in Azure (Blobs / Tables) or AWS (S3 / DynamoDB) presents problems as well. An event store can definitely be built on top of Azure Table Storage or AWS DynamoDB, but doing so is not nearly as trivial as it first seems, for the same reasons as the other alternatives listed above. In Azure, atomic writes are possible on batches in table storage, but the batches are limited in size, and your writes need to be very carefully designed in order to get decent performance and atomicity guarantees at the same time. We’ve seen event stores built on Table Storage (here is a system built this way, though no details are given), and we’ve seen them built (sometimes badly) on Table Storage and DynamoDB as well. Even with the atomicity guarantees provided by these platforms, building an event store on top of them is rather difficult, since the atomicity they provide does not directly support the atomicity guarantees required by an event store. This leads to hybrid solutions with operations that look similar to a 2PC or 3PC implementation, or to shared-log approaches like the one described above in the Kafka discussion (like the Tango paper).
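
As one example of the careful design required, a common Azure Table Storage layout (illustrative only, not a specific SDK call) keys each stream to a single partition, since entity group transactions are atomic only within one partition and are capped at 100 operations:

```typescript
// One partition per aggregate so an insert-only batch stays inside Azure
// Table Storage's atomicity scope (single partition, max 100 operations).
interface EventEntity {
  partitionKey: string;   // the aggregate/stream id
  rowKey: string;         // zero-padded event sequence, e.g. "0000000042"
  type: string;
  data: string;           // serialized event payload
}

// Optimistic concurrency falls out of insert-only semantics: if another
// writer already inserted the same rowKey, the whole batch fails with a
// conflict and the unit of work retries.
function makeBatch(
  streamId: string,
  fromSequence: number,
  events: { type: string; data: unknown }[],
): EventEntity[] {
  return events.map((event, i) => ({
    partitionKey: streamId,
    rowKey: String(fromSequence + i).padStart(10, "0"),
    type: event.type,
    data: JSON.stringify(event.data),
  }));
}
```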

So what do I use?

So… we currently build our own event stores as the starting point of our applications. If you need an event store already built and cannot (or probably shouldn’t) build your own, there’s not a lot of choice out there right now. The only real turnkey offering we’re aware of is the EventStore from Greg Young (the creator of CQRS). Of course, feel free to contact us at Adaptech if you’d like some help choosing an event store, or if you’d like to talk to us about other CQRS topics. As far as we know, we are the only all-CQRS, all-the-time firm out there!

Kelly Leahy (@kellyleahy) is Adaptech’s Director of US Operations. 

