Simplify! – Part 10
In the previous post, I discussed the technology explosion we have observed in recent years, its drivers, how it often creates unneeded complexity, reinforcing drivers and what we can do about it.
I also mentioned that some widespread misconceptions regarding OSS (Open Source Software) act as a driver of accidental complexity. Therefore, I inserted a short blog series discussing OSS, its rise, its benefits, the typical misconceptions and its role today, to lay the necessary groundwork for this post (and to keep its length acceptable).
In this post, we will discuss accidental complexity on the architectural level, how it develops and what we can do about it. Actually, I think there are few areas in IT where you can find more drivers of accidental complexity than in software architecture. In my experience, it is a bottomless pit, and I hardly know where to begin and where to end. Maybe I am a bit more sensitized because I have worked in software architecture for a long time, but still …
Therefore, I will not try to cover all areas in this post, only the most obvious ones that I see over and over again. Even so, covering them in one go would result in a far too long post, so I decided to split this topic into two parts: In this post, I discuss the drivers that lead to accidental complexity on the architecture level. In the next post, I will discuss mitigation options.
Unresolved design deficiencies
Probably the biggest problem in the domain of architecture is that we still have not learned how to slice systems, i.e., how to create the right modules 1 bundling the right functionalities. I will discuss this topic in more detail in several future posts. Here, I just briefly sketch it to illustrate how it leads to accidental complexity.
The problem has existed since the late 1960s, when systems first became so big that they needed some kind of structure to remain manageable. There was some promising practical research on this topic in the early 1970s, but afterwards the topic basically came to a standstill: no new insights emerged, and the same findings were merely “rediscovered” with each new technology or methodology wave.
At the moment, Domain-driven design is all the rage. While this approach collects some good practices, it does not offer anything new compared to the state of knowledge in the early 1970s either. Still, it is better to have this collection of practices than to have nothing.
The bigger problem is that knowing the patterns does not mean knowing how to apply them in practice. This is an issue I have observed in many places:
- “We need to define the bounded contexts.” “How do we find out what belongs to which context?” “Er …”
- “We need to implement low coupling.” “How do we figure out what belongs on which side of the coupling boundary?” “Er …”
- “Separation of concerns is important.” “How do we identify the concerns that need to be separated?” “Er …”
And so on. Knowing what to do does not mean knowing how to do it. If you ask people about concepts like the aforementioned ones, they reply in a bored tone that of course they know them. If you ask the same people how they implement those concepts in their current context, you typically get some stereotypical panacea or some other evasive response. If you then look into the actual designs, you rarely find anything that resembles those concepts.
I do not want to blame those people. It is so much harder to know how to actually do something than simply to know what it is. These two things are fundamentally different. The what requires just understanding some definition or theoretical concept. The how requires transcending the theoretical concept and learning how to apply it to a concrete problem, plus a lot of training. Understanding what, e.g., low coupling is might be a matter of minutes. Learning how to do it right in practice can take years.
The computer science papers of the early 1970s discussed the “how” and came up with a few interesting ideas. But as part of their answer was that there are no easy “best practices” and that this will always be a hard problem, the majority of IT moved on to other topics, always hoping for the magical silver bullet that would solve this hard problem – until the current day.
Yet, without understanding how to slice systems in a good way, i.e., how to modularize them as well as possible with respect to understandability (of the code), maintainability, changeability and evolvability over the lifetime of the systems while not compromising runtime qualities like, e.g., availability, response time behavior 2, security, scalability, etc., we will always create a lot of accidental complexity.
We will always create designs that are far from optimal with respect to understandability, maintainability, changeability and evolvability, and that implement dependencies that complicate our work to the point that we can only guess what is going on. Also, without knowing how to create a good design with the aforementioned properties, the chances that we come up with such a design are close to zero.
Even if the initial design is good due to a lucky coincidence, chances are that it gets destroyed with the next design change. And that is without even taking into account the typical delivery pressure that deprives us of the time needed to think through the design 3.
Overall, we tend to add a lot of accidental complexity on the architecture and design level, simply because as an industry we still do not know how to do good design. 4
Black holes and strange attractors
A side effect of the deficiency described before is what a former colleague of mine called “black holes and strange attractors”: Once big systems exceed a certain size, they tend to develop a sort of “gravity”. Whenever you need a new functionality, you realize that you need data or functionalities from the big system to implement it. But often the big systems do not provide good APIs to access their data and functionalities. Thus, you decide to build the new functionality into the big system.
That means that the big system and the new functionality are tightly coupled on the functional and data level, typically also on the technical level. The big system has just become bigger. The bigger the system becomes, the harder it becomes to implement new functionality without somehow integrating it into the big system. This creates a reinforcing effect, resulting in systems that become bigger and bigger.
As these systems and their extensions tend to be tightly coupled, most of the time we are basically confronted with a “big ball of mud”: a poorly structured system with many hard-to-oversee dependencies and a lot more accidental complexity than needed.
Another driver of accidental complexity, especially in recent years, is what I tend to call “architecture narcissism” 5. Some architects or lead engineers want to build their own monuments. The reasons vary. Sometimes it is driven by FOMO due to the fast technological evolution. Sometimes they lack some keywords on their CVs. Sometimes they just want to show off. Sometimes their decision makers or clients want a round of buzzword bingo before they approve the project. Sometimes they fall for the hype industry 6.
But no matter what their actual motivation is, the effect is always the same: overly complicated architectures piling up lots of unneeded complexity – in the worst case impeding the implementation of the actual requirements. If you recall the four types of accidental complexity, this can result in any combination of the latter three types:
- Accidental complexity due to overly complex platforms regarding the given problem
- Accidental complexity due to a lack of adequate platform support regarding the given problem
- Accidental complexity due to overly complex software regarding the given problem
Interestingly enough, such designs often result in all three types of accidental complexity at the same time:
- Parts of the platform are way too complex regarding the given problem
- Other parts of the platform do not support the given problem adequately
- The software itself is also more complex than needed to solve the given problem (not only due to a lack of platform support)
You often end up in such situations if the architecture has become an end in itself and is not aligned with the problem at hand.
You often see similar effects – yet for different reasons – if the solution teams focus only on technology and fail to understand the given problem adequately on the business domain level. If you do not understand the problem on the business domain level, you will most likely end up with a bad slicing of functionalities, i.e., modules with lots of dependencies on a functional level.
You are also likely to create solutions that are overly complex or provide insufficient support for the given problem. Overall, if you create a solution for a problem that you have not understood, you will most likely come up with an inadequate solution, including accidental complexity of all kinds.
While it sounds obvious if written down this way (“you need to understand a problem before you can solve it”), we can observe the opposite quite often in IT: Whole solution teams focused solely on technology, trying to ignore the actual business problems as much as possible. They just want their product owners/managers to write down isolated bits of functionality that they can code, not caring about the overall problem to solve and how the given bit of functionality fits in.
On an architectural level, this often leads to overly complex “generic” solutions that try to be “open” for arbitrary kinds of requirements. As you do not understand the given problem and the domain it is embedded in, you have no idea which types of requirements are more likely than others. Hence, you design a far too complex solution with far too many degrees of freedom (and usually the wrong ones), just to feel prepared for the future requirements to come.
A similar problem arises if you only understand the business domain and not the IT domain. We see a lot of people these days who think that they are prepared to design mission-critical IT solutions just because they have a shallow understanding of IT. Again, it should be obvious that this approach does not work well: You need to understand your solution toolbox, how the different tools work and the consequences of their usage to be able to craft a good solution.
Still, we see this anti-pattern often too, especially if the business domain is a technical one like mechanical or electrical engineering or similar. Very often, the solutions designed by such people contain lots of accidental complexity.
DIY and OSS
The desire for DIY (do it yourself), often fueled by misconceptions about OSS, is another big driver of accidental complexity. While the desire to design complex solutions on your own and keep control over every aspect of the solution is understandable from an engineering point of view 7, it often leads to a lot of accidental complexity.
The x-axis of a Wardley map describes an evolution cycle for all types of “technologies” (not necessarily IT-related) and offerings:
- When a new idea emerges, its first implementations start in “genesis”, also known as “bleeding edge”.
- From there, solutions evolve into “custom built”, i.e., software departments still tend to build the solutions on their own, but as experience grows, the solutions become more and more robust.
- Eventually, in the “product/rental” phase ready-to-use solutions will emerge, products you can buy or rent.
- In the last phase solutions turn into a “commodity”. You just use it whenever you need it. You do not make big selection processes anymore as you did in the product/rental phase. It is ubiquitously available. Just use the next best offering that does the trick.
This is quite a simplified description of the four phases of a Wardley map. There are a lot more subtleties associated with each phase. Still, for the sake of this post the description is accurate enough.
The key point is that DIY is primarily located in phase 2 – custom built. You might use some components from further right, e.g., an OSS library or a (low-level) cloud service, but basically you build the solution on your own.
At a certain point in the evolution lifecycle of any offering or technology, this makes perfect sense. But eventually the forces of demand and supply will create equivalently good (or even better) offerings further to the right. Economically, it then no longer makes sense to stick with the custom-built solution. It just ties up capacity, money and resources that you could otherwise use in more valuable places. And it also means that you need to maintain, evolve and run a lot more complexity – by now accidental complexity – in your IT landscape than necessary.
Unfortunately, the widespread misconceptions that OSS costs nothing and does not create any lock-in often reinforce the DIY habit and impede the move to the right – or are simply used to “justify” the habit.
Missing the move to the right
If you honestly create your own Wardley maps, you can nicely observe whether you missed the point in time to move to the right. You want to offer your customers something. To do so, you need something, which in turn needs something else. This creates a chain (actually a directed graph) of needs down to very basic resources like electricity. Ideally, as you follow this chain from top to bottom, it moves from left to right.
There are situations where it makes sense to move to the left while going down, e.g., because your innovative idea at that level gives you a competitive advantage regarding your offering. Still, whenever you see an arrow going to the left, you need to question it: Does this component or technology really give me a competitive advantage regarding my customer offering or did I just miss the point in time when I needed to replace the self-built solution with a product or even a utility?
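To make the check described above concrete, here is a minimal Python sketch. It assumes that a map can be reduced to components tagged with one of the four evolution stages plus a list of (consumer, dependency) edges from the chain of needs. The names `Stage`, `Component` and `leftward_arrows`, as well as the example components, are hypothetical and chosen only for illustration. Real Wardley maps are usually drawn by hand and place components on a continuous axis rather than in four discrete buckets.

```python
from dataclasses import dataclass
from enum import IntEnum


class Stage(IntEnum):
    # The four evolution stages on the x-axis, ordered from left to right.
    GENESIS = 1
    CUSTOM_BUILT = 2
    PRODUCT = 3  # product/rental
    COMMODITY = 4


@dataclass(frozen=True)
class Component:
    name: str
    stage: Stage


def leftward_arrows(components: dict[str, Component],
                    needs: list[tuple[str, str]]) -> list[tuple[str, str]]:
    """Return every (consumer, dependency) edge where the dependency sits
    further left (is less evolved) than the component that needs it."""
    return [
        (consumer, dependency)
        for consumer, dependency in needs
        if components[dependency].stage < components[consumer].stage
    ]


# Hypothetical example map, listed from the top of the chain downwards.
components = {
    c.name: c
    for c in [
        Component("order handling", Stage.PRODUCT),
        Component("homegrown message broker", Stage.CUSTOM_BUILT),
        Component("compute", Stage.COMMODITY),
        Component("electricity", Stage.COMMODITY),
    ]
}
needs = [
    ("order handling", "homegrown message broker"),
    ("order handling", "compute"),
    ("homegrown message broker", "compute"),
    ("compute", "electricity"),
]

flagged = leftward_arrows(components, needs)
for consumer, dependency in flagged:
    # Each flagged edge is a question to ask, not a verdict.
    print(f"question this: {consumer} -> {dependency}")
```

In this example, only the edge from "order handling" to the "homegrown message broker" gets flagged. The sketch deliberately stops at flagging edges. Whether a leftward arrow is justified by a competitive advantage, or merely shows that the move to a product or utility was missed, is still a judgment the map cannot make for you.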
Especially with the rise of managed services, we can observe a strong move to the right regarding IT offerings and technologies on all levels, and we need to question our homegrown solutions anew: do they still create a competitive advantage? If they do not provide any advantage compared to a service that we can simply use as a utility, they are not only wasted money and opportunities. As written before, they also mean that we need to maintain, evolve and run them, resulting in lots of unneeded accidental complexity in our IT landscape.
You can combine that reasoning with the current wave of (DIY) technology explosion that we discussed in the previous post of this series: We see tons of new concepts and technologies pushing into the map at the bottom left corner. Why should you implement a relatively mature business offering positioned a lot further to the right with such a technology? This only makes sense if we can use it to explore completely new offerings that were not possible before due to the lack of exactly that technology.
Yet, we often see arrows moving from right to left, indicating that we either fell for a new hype or failed to move on at the right point in time, increasing the accidental complexity of the IT landscape.
In this post, we have discussed several relevant drivers on the architectural level that increase the accidental complexity of the IT landscape. There are certainly more drivers, but this post is already long enough with the ones discussed.
In the next post, I will discuss options to mitigate these problems, i.e., ways to reduce accidental complexity on the architecture level. Stay tuned …
I use the term “module” here in its most general sense, i.e., some kind of bundling of functionality and/or data (depending on the type of bundling) that is accessible from the outside via a defined interface. The term does not imply any specific implementation paradigm or technology. It also does not define if the modularization affects the system at implementation time (e.g., source code namespaces), build time (e.g., libraries) or runtime (e.g., microservices).
Strictly speaking, response time behavior is part of availability, as a violation of expected response time behavior counts as “not available”. Still, I like to mention it explicitly because, on the one hand, many people are not aware of this fact and, on the other hand, response time behavior is an essential quality in many of today's systems.
Some “Agile” and TDD (Test-driven development) advocates claim that explicit architectural and design work is not needed, that it would “emerge”. For a toy application or a coding kata, this might be true. But any system that embodies a certain amount of complexity needs some explicit design and structuring thoughts. Hoping that design emerging on a micro level due to TDD or similar practices will lead to an appropriate design on a macro level is – put politely – not very realistic. This does not mean that you should spend eternities poring over a “perfect” architecture (which you will not find anyway). But you will need to think about it explicitly.
To be clear: There are individuals who know how to create a good design. But in general, as an industry we lack this skill.
Or “architecture porn” if I am in a more snarky mood.
A side effect of the latest Cambrian explosion of technology that we are still stuck in is that we have a plethora of tool and technology vendors that all fight for their market share. Additionally, all the consulting companies want to sell their services. To keep demand high, they try to create new hypes and “must haves” all the time; otherwise demand might sink. On top of that, a big media and training industry profits from ever-new topics to write, stream and teach about. None of these parties has the slightest interest in a stable and settled IT landscape, as it would threaten their business models. I do not blame any of these companies for doing what they do (I also work for a consulting company). You should just keep this in mind when you hear about the next “must have” hype.
Many of us started with software development because we loved to solve intricate problems with software, finding algorithms and data structures that provided an elegant (i.e., suitable to represent the problem) solution and seeing all the different bits and pieces working together to create the desired result. This may sound a bit kitschy, but you will still find this motivation in many software engineers who decided to start a career in IT. It is this “ability to create worlds with software”. I think this is also what creates a strong motivation for DIY in most engineers.