This is part 2 of “Rise Of The Robots”, a three-part series on event-based systems and their implications. For part 1, see Rise Of The Robots: Event Logs vs. “Traditional” Databases.
Maybe you have come across eventsourcing fans and heard talk about “event streams”, “lossless data capture”, “Domain Driven Design” and “Command Query Responsibility Segregation” (CQRS). Perhaps you have developers in your organization who are trying to work that way, enthusing about the patterns and the rather elegant plumbing that sometimes goes with eventsourced solutions.
So why are people into that? Let’s attempt to peel back the technology so we might see what lies beneath …
Software is Communication
Organizations which design systems … are constrained to produce designs which are copies of the communication structures of these organizations.
This is Conway’s Law.
When software projects fail, it’s often due to a combination of three reasons, each of which has communication problems at its root:
- What was built wasn’t fit for its purpose. When we spend our days building technology without regularly engaging with customers, the organization’s leadership and the folks who sell and support the products, we are apt to deliver great technical solutions which don’t satisfy users and don’t sell.
- The product wasn’t ready in time, resulting in missed opportunities and strained relationships. When we look at the elapsed time from “we want it” to “customers use it” for products or features, we are struck by how much of it passes without anything happening: Todos sit in backlogs, are blocked by dependencies on other teams’ deliverables, or wait for QA, IT or other resources to become available. This tends to happen because priorities aren’t aligned and objectives conflict or are not shared by all. Here too, communication makes or breaks success – the larger the number of parties who need to coordinate and the more frequently they need to do so in order to succeed, the higher the potential for “broken telephone” effects causing time-consuming rework, delays and side effects in other parts of the system.
- The effort and money expended in delivering the project outstripped the returns. Instead of focusing on the business’s core domain, the parts where money is made and where the secret sauce that makes us competitive lives, we pour too much effort into building generic functionality. Instead of going with the simplest possible solution, we overbuild, use unwieldy frameworks and tools and default to belt-and-braces tech where YAGNI (“You Ain’t Gonna Need It”) would suffice. Communication again – YAGNI done without all-around knowledge of the trade-offs, risks and rewards, and without an explicit and regularly reaffirmed “this is OK” from the leadership, can be hazardous.
Delivering Software: How Code Shapes Process
So how do organizations end up with communication structures which turn out to be counterproductive?
It looks like it might be an unintended side effect of our need to feel safe.
Building software products is a complex, often highly innovative business. Successful practitioners of the craft have long understood that creating an emotionally safe environment is vital to success. In a work situation, a very important part of feeling safe is the perception of having control of our environment: Is there sufficient overlap between what we are responsible for and what we have control over?
Consider a fairly typical “n-tier” architecture, the kind of system which would have evolved from small beginnings, growing over several years of adding new features and enhancements:
Because the components of such systems are often not sufficiently separate from each other to be independently maintainable, this type of implementation ends up being increasingly large, unwieldy and complicated as time passes. Because the interactions between the parts are difficult to understand and the parts are not as cohesive as they need to be, the resulting side effects can lead to failures and bugs. This is not conducive to managers and developers feeling safe about making changes.
Therefore, in order to get back to feeling secure, process is introduced which gradually makes the org chart (and therefore the communications structures) resemble the application architecture:
Checkout/Ordering team: “Your Payment Processing release broke Checkout in production.”
Payment Processing team: “Didn’t know this would affect Checkout. Had no time to look into it because we were too busy working on the new payment functionality in the iOS app.”
Mobile App Product Visionary: “I want my own dedicated team. This happened because Payment Processing is spread too thin to work on the app and the backend at the same time.”
Development Manager: “Org chart change – let’s have all server-side developers on the same team so these disconnects no longer happen.”
IT Manager: “The risk of letting dev teams own the infrastructure is too great. From now on, IT approval is required for any database and server changes.”
QA Manager: “Our new policy is to only sign off on production releases after thorough regression testing of all system components.”
From a safety point of view, this doesn’t sound unreasonable: The system is too large and complex, so let’s put checks & balances in place to manage the inherent risks. As long as everybody plays by the rules, all should be well. However, from a getting-things-done perspective there is a problem: None of the parties involved owns enough of the stack to deliver anything independently of the others. In order to build new features, a lot of coordination and juggling of priorities and resources is required. The resulting need for more management and formal process slows things down. Communication is often second-hand, belated and lacking, causing disconnects. In an industry where the ability to execute well under conditions of rapid change is essential, this is sub-optimal. At the same time, the dynamics in play are extremely difficult to change because they are driven by deep human needs – the desire to avoid blame and to feel safe.
Introducing “Agile” processes is an attempt to address this, but in practice the results are often spotty because only the organizational part of the problem is addressed. This ignores the kind of three-legged-race relationship that exists between how people are able to collaborate and the architecture of the underlying system. Agile’s impact is blunted if the code base is too large, too hard to understand and too prone to side effects to be changed safely.
The conundrum presented by increasing system size has been addressed very successfully in more mature industries by building complex systems from simpler, easier-to-handle components. As a result, rotating the tires on a car requires no knowledge of how the carburetor works, seat belts and fan belts are not located in a shared belt sub-assembly, and opening the trunk won’t switch on the wipers.
The Promise of Eventsourced Systems
Here is how our example might look as a CQRS / eventsourced architecture:
Here, smaller subsystems communicate via agreed-upon contracts (commands and events). Each has its own data store. Eventsourcing and the CQRS pattern are used to reduce coupling (and therefore the potential for side effects) to a minimum. Cohesion (each component has a single area of responsibility within which changes and enhancements can be made independently) is achieved by applying Strategic Domain Driven Design techniques, namely Ubiquitous Language and Bounded Contexts. Patterns and practices exist which allow such systems to safely interact with legacy code if required. Subsystems are much less likely to bring down other components and are easier to understand. This greatly reduces the potential for out-of-control complexity, which undermines the teams’ sense of being able to operate safely and under their own control. There is less risk, and therefore less need for checks & balances. Communication patterns help rather than hinder delivery: Project management is much simpler because fewer interactions are needed to get everybody working together smoothly, and there is less complexity to explain.
What makes the combination of eventsourcing, Domain Driven Design and CQRS so attractive is that it can greatly simplify building software which keeps subsystems cleanly separated and independently maintainable as more features are added over time, akin to what we have learned to do for cars, spacecraft and toasters. When this is applied in combination with agile delivery techniques, the approach can be very productive while feeling much safer than more “traditional” alternatives. This potential, together with increased awareness that Big Data, analytics and machine learning benefit greatly from the comprehensive capture of event data, makes it likely that eventsourcing is going to be increasingly popular.
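To make “subsystems communicating only through commands and events, each with its own store” a little more concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the architecture pictured above: the `PlaceOrder` command, the `OrderPlaced` and `PaymentRequested` events, the `Checkout` and `PaymentProcessing` classes and the in-process `EventBus` are all hypothetical, and plain in-memory lists stand in for each subsystem’s event store (a real system would use a durable store and a message broker).

```python
from dataclasses import dataclass
from typing import Any, Callable

# Hypothetical contracts shared between subsystems.
@dataclass(frozen=True)
class PlaceOrder:        # command: a request, which may be rejected
    order_id: str
    amount_cents: int

@dataclass(frozen=True)
class OrderPlaced:       # event: a fact that has already happened
    order_id: str
    amount_cents: int

@dataclass(frozen=True)
class PaymentRequested:  # event recorded by Payment Processing
    order_id: str
    amount_cents: int

class EventBus:
    """In-process publish/subscribe; stands in for a real message broker."""

    def __init__(self) -> None:
        self._handlers: dict[type, list[Callable[[Any], None]]] = {}

    def subscribe(self, event_type: type, handler: Callable[[Any], None]) -> None:
        self._handlers.setdefault(event_type, []).append(handler)

    def publish(self, event: Any) -> None:
        for handler in self._handlers.get(type(event), []):
            handler(event)

class Checkout:
    """Checkout subsystem: accepts commands and records events in its own log."""

    def __init__(self, bus: EventBus) -> None:
        self.events: list[Any] = []  # Checkout's private, append-only event store
        self._bus = bus

    def _placed_order_ids(self) -> set[str]:
        # Current state is derived by replaying the event log.
        return {e.order_id for e in self.events if isinstance(e, OrderPlaced)}

    def handle(self, command: PlaceOrder) -> None:
        if command.amount_cents <= 0:
            raise ValueError("amount must be positive")
        if command.order_id in self._placed_order_ids():
            raise ValueError(f"order {command.order_id} already placed")
        event = OrderPlaced(command.order_id, command.amount_cents)
        self.events.append(event)  # record the fact first...
        self._bus.publish(event)   # ...then tell whoever is interested

class PaymentProcessing:
    """Payment subsystem: knows the OrderPlaced contract, not Checkout's internals."""

    def __init__(self, bus: EventBus) -> None:
        self.events: list[Any] = []  # a separate store, owned by this subsystem
        bus.subscribe(OrderPlaced, self._on_order_placed)

    def _on_order_placed(self, event: OrderPlaced) -> None:
        self.events.append(PaymentRequested(event.order_id, event.amount_cents))

if __name__ == "__main__":
    bus = EventBus()
    checkout = Checkout(bus)
    payments = PaymentProcessing(bus)
    checkout.handle(PlaceOrder("o-1", 2500))
    print(payments.events)  # [PaymentRequested(order_id='o-1', amount_cents=2500)]
```

In this sketch Payment Processing depends only on the `OrderPlaced` contract, so Checkout can change how it validates or stores orders without Payment Processing noticing. That is the kind of reduced coupling the paragraph above describes.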
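Why comprehensive event capture helps analytics can also be sketched briefly. In the hypothetical example below (the event types and both projection functions are assumptions made for illustration), the query side of a CQRS system builds read models by replaying the same event history. A view nobody planned for when the events were first recorded, such as a cancellation rate, can be added later simply by replaying the log again.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Iterable, Sequence, Union

@dataclass(frozen=True)
class OrderPlaced:        # hypothetical event
    order_id: str
    customer_id: str
    amount_cents: int

@dataclass(frozen=True)
class OrderCancelled:     # hypothetical event
    order_id: str

Event = Union[OrderPlaced, OrderCancelled]

def project_customer_spend(events: Iterable[Event]) -> dict[str, int]:
    """Read model: net spend per customer, derived only from the event history."""
    open_orders = {}  # order_id -> (customer_id, amount_cents) for orders not cancelled
    for event in events:
        if isinstance(event, OrderPlaced):
            open_orders[event.order_id] = (event.customer_id, event.amount_cents)
        elif isinstance(event, OrderCancelled):
            open_orders.pop(event.order_id, None)
    spend = defaultdict(int)
    for customer_id, amount_cents in open_orders.values():
        spend[customer_id] += amount_cents
    return dict(spend)

def cancellation_rate(events: Sequence[Event]) -> float:
    """A second view over the same history, added later: share of orders cancelled."""
    placed = sum(1 for e in events if isinstance(e, OrderPlaced))
    cancelled = sum(1 for e in events if isinstance(e, OrderCancelled))
    return cancelled / placed if placed else 0.0

if __name__ == "__main__":
    log = [
        OrderPlaced("o-1", "alice", 2500),
        OrderPlaced("o-2", "bob", 1000),
        OrderCancelled("o-2"),
        OrderPlaced("o-3", "alice", 500),
    ]
    print(project_customer_spend(log))       # {'alice': 3000}
    print(round(cancellation_rate(log), 2))  # 0.33
```

A store that kept only each customer’s current net spend could answer the first question but not the second, because the cancellations would no longer be visible. Keeping the full event log leaves both options open.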
Coming up next – Event-logged future: Privacy, transparency and the case for machine ethics.
Robert Reppel (@robertreppel) is Adaptech’s Director of Engineering.