Simplify! – Part 11
This post is about mitigation techniques regarding accidental complexity on the architectural level.
In the previous post, I discussed several relevant drivers on the architectural level that lead to increased accidental complexity of the IT landscape. As the post would have become too long otherwise, I left out the part about what we can do about it.
The need for simplicity
But before I dive into the countermeasures, I would like to reemphasize why accidental complexity has become such a huge problem in IT. We already discussed some relevant problems in the first and second post of this blog series:
- IT becoming indispensable
- Overstrained IT departments
- Overstrained people
- Not being able to move fast enough in a competitive market
But there is one quality that cannot be purchased in this way – and that is reliability. The price of reliability is the pursuit of the utmost simplicity.
Tony Hoare made this statement in his Turing Award lecture while discussing the design of PL/I, which he considered overly complex.
But his statement also applies on a more general level: Simplicity is the prerequisite for building reliable and robust solutions:
- The more complex a solution becomes, the harder it becomes to understand.
- The harder a solution becomes to understand, the more likely undetected bugs and other unexpected behavior become, which reduces reliability and robustness.
- Additionally, reduced understandability impedes maintainability, changeability and evolvability of software. As discussed in this post, systems need to be changed continuously to keep their value. The harder they are to understand, the more likely you are to introduce bugs or unwanted behavior, i.e., reduce reliability and robustness.
This means we must work hard to avoid accidental complexity in order to maximize reliability and robustness of the solutions, especially if we remind ourselves that IT has become indispensable in many parts of our business and private lives.
Still, most of the time we see the opposite happen: accidental complexity piling up all the time. Thus, once more:
We need to simplify!
- We need solutions we can manage
- We need solutions that make it easy to move fast
- We need solutions that are robust in operations
- We need solutions that keep the people involved healthy
Improving the situation
With this in mind, let us ask the question once more: How can we do better?
I will first discuss the drivers I described in the previous post and then finish with a few more general recommendations.
Unresolved design deficiencies
The first driver of accidental complexity on the architecture level was the unresolved design deficiencies: We still have not learned how best to modularize our solutions with respect to the given problem, its non-functional requirements (NFRs) and its most likely future evolution.
I think this is a huge topic, maybe the biggest question of software architecture. If we were able to complete the discussion we began more than 50 years ago and come up with good, generally accepted guidelines for modularization, it would be a giant step forward for our whole industry. Still, I doubt that we will solve this problem anytime soon if at all.
Part of my doubt is rooted in the fact that, from all I have seen and learned, good design is not only a craft (which is learnable) but also an art (which is a lot harder to learn) and additionally requires a bit of talent (which cannot be learned). I will write about this topic in a lot more detail in future posts.
Still, I would like to give three simple-looking but hard-to-implement recommendations:
- Dismiss reusability when pondering your modularization – Reusability has a huge value but in most situations where you tackle modularization on an architectural level it is a false friend. I discussed this in more detail here.
- Understand that remote communication changes everything – Modules that run in different process contexts (and maybe on different nodes) at runtime must be designed very differently from modules that are located inside a process. The nondeterministic behavior of remote communication makes distributed modules extremely vulnerable to a bad slicing of functionality. This is essential to understand in the context of (micro)service-oriented application design. I will discuss this topic in a lot more detail in future posts. If you do not want to wait until then, you may want to look at this slide deck for an introduction.
- Understand the importance of good abstractions and interfaces – The concepts you expose on an interface determine a lot of the coupling you create. If you want to minimize coupling, you need to go for minimal interfaces exposing the right abstractions. While this discussion started more than 50 years ago, it is still far from being done. I will also write a lot more about it in future posts. A good starting point is the book “A Philosophy of Software Design” by John Ousterhout.
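To make the third recommendation a bit more tangible, here is a minimal sketch in Python. All names in it (`CreditCheck`, `SimpleCreditCheck`, `place_order`, the limits table) are hypothetical and only serve as an illustration. The point is that the interface exposes a single concept, the credit decision, and nothing about how that decision is made.

```python
from dataclasses import dataclass
from typing import Protocol


class CreditCheck(Protocol):
    """Narrow interface: callers learn a decision, not how it is made."""

    def is_creditworthy(self, customer_id: str, amount_cents: int) -> bool: ...


@dataclass
class SimpleCreditCheck:
    """Hypothetical implementation; its internals stay hidden from callers."""

    limits_cents: dict[str, int]  # internal detail, not part of the interface

    def is_creditworthy(self, customer_id: str, amount_cents: int) -> bool:
        # Unknown customers are treated as having a limit of 0.
        return amount_cents <= self.limits_cents.get(customer_id, 0)


def place_order(check: CreditCheck, customer_id: str, amount_cents: int) -> str:
    # The caller depends only on the one concept the interface exposes.
    if check.is_creditworthy(customer_id, amount_cents):
        return "accepted"
    return "rejected"
```

If the interface exposed the limits table or a scoring model instead, every caller would be coupled to that representation, and changing the way creditworthiness is determined would ripple through all of them.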
Again, there is a lot more to be said about this topic and I will do so in future posts. But for now, I will leave it here and move on to the next driver.
Black holes and strange attractors
The countermeasure is as obvious as it is hard to implement in many situations: Do not let systems grow beyond a certain size. You do not need to go to the other extreme and opt for microservices instead. It is sufficient to aim for a still manageable size, e.g., a size that a small team can still manage without problems.
There are two counterforces to this approach. The first one is the additional effort you need to spend to keep systems from growing beyond a certain size. The effort to just extend the existing system usually is a lot smaller than the effort to split it up, which includes creating good interfaces between the parts, and so on. So, by letting the system just grow, you save money now and pay later (at a high interest rate).
As time and money are always scarce and people prefer to evade pressure in the present, even if they know that they will pay a high price for it later, the system usually just gets extended instead of being split up properly.
The second counterforce is the lack of knowledge of how to modularize systems properly, as we discussed before. If we split up a system that grew beyond a certain size without knowing how to split it up properly, chances are that the resulting two parts will be tightly coupled. In other words: We would not be better off than before. In the worst case, the situation with the two systems would give us more headaches than the situation with the single, big system.
If you plan to split up a big system, a good starting point is the functionality, not the data. Systems that are designed around a data model tend to end up with a lot higher cohesion (i.e., internal coupling) than systems that are designed around functionality. If you take into account that coupling is measured in terms of calls between functionalities, it should be obvious that the key to low coupling is looking at the functionality, not the data.
The price you need to pay for this is data replication. You will have clusters of functionality (modules, systems, services, …) that share data. Here you need to decide which system is leading for which parts of the data (to have a way to determine the “truth” in case of inconsistencies between copies) and set up appropriate replication and reconciliation mechanisms. If you allow eventual consistency, which is good enough in almost any situation [1], this approach will be a lot easier to manage than the ever-growing mega-system.
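The idea of a leading system per part of the data can be sketched in a few lines of Python. This is only an illustration under assumptions: the system names (`crm`, `orders`), the attributes and the `LEADING_SYSTEM` table are made up, and a real setup would typically replicate via events or batch jobs rather than in-memory dictionaries.

```python
from dataclasses import dataclass, field


@dataclass
class Store:
    """A system holding its own copy of customer data."""

    name: str
    records: dict[str, dict[str, str]] = field(default_factory=dict)


# Hypothetical ownership table: which system is leading for which attribute.
LEADING_SYSTEM = {"email": "crm", "shipping_address": "orders"}


def reconcile(stores: dict[str, Store], customer_id: str) -> list[str]:
    """Overwrite diverging copies with the value from the leading system.

    Returns short descriptions of the fixes that were applied.
    """
    fixes = []
    for attribute, leader_name in LEADING_SYSTEM.items():
        leader = stores[leader_name]
        truth = leader.records.get(customer_id, {}).get(attribute)
        if truth is None:
            continue  # the leading system has no value; nothing to propagate
        for store in stores.values():
            if store is leader:
                continue
            record = store.records.get(customer_id)
            # Only fix replicas that already hold a copy of this attribute.
            if record is not None and attribute in record and record[attribute] != truth:
                record[attribute] = truth
                fixes.append(f"{store.name}.{attribute} <- {leader_name}")
    return fixes
```

Running such a reconciliation periodically is one simple way to arrive at eventual consistency: copies may diverge for a while, but the leading system always decides what the "truth" is.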
I will write more about this design approach and its trade-offs in future posts.
Architecture narcissism is hard to overcome. On the one hand, the people who come up with the overly complex architectures either do not realize it or do it for the wrong reasons, e.g., to pad their CV, due to FOMO or just to show off. In either case, they will come up with a lot of diversionary reasons why it would be the right architecture, and the discussion often turns into a tiring kind of shadow boxing.
On the other hand, humans tend to have a complexity bias. As I already wrote in a prior post: In case of doubt, most humans tend to prefer complex solutions over simple ones. As a consequence, people advocating for an overly complex architecture usually find a lot more approval than those advocating against it. Overly complex architectures often awe people, and their creators are often considered very smart. So, fighting for simplicity very often is a thankless fight.
From my experience, the best way to avoid architecture narcissism is to apply Occam’s razor: Consider different options that are suitable to solve the task and go for the simplest solution you find in this process. And bring a lot of perseverance, as most people tend to fall for more complex solutions, i.e., tend to rate them “better”.
In theory, it is totally simple to overcome this complexity driver: Understand both the task and its context in the problem domain and the implementation options in the solution domain. Combine the needs of the task and the options and find the simplest solution option that solves the task.
Easy, isn’t it? Yet, in practice it is surprisingly hard for many people. Many IT people have a hard time engaging with the problem domain, above all if the domain is a non-technical one. Similarly, many non-IT people have a hard time engaging with the IT domain, above all if their home domain is a non-technical one.
Yet, you need to understand both the problem and the solution domain to come up with a good solution and reduce the risk of adding accidental complexity on the solution side [2]. My personal recommendation is: No matter where you come from, leave your safe home zone once in a while. If you are a software engineer, get in touch with the business departments, the ops departments, with marketing, and so on.
It takes a bit of courage of course. But you will be surprised that most of them will welcome you really warmly if you are genuinely interested in their work. And as a reward you often learn things that are way more interesting than you thought they would be, and you will create a lot better system designs. From my experience, it is absolutely worth it.
Another option is to tackle the problem by teaming up people from the problem domain with people from the solution domain. Actually, that is the least you should try to do. Yet again, in practice this often works surprisingly badly.
The key problem tends to be that people do not understand each other. One side lives in the business domain and does not understand the IT people. The other side lives in the IT domain and does not understand the business people. To make things worse, even people inside these domains tend to not understand each other, e.g., development and operations, both living in the IT domain.
To be able to communicate and collaborate effectively, you need to understand the domains of your team peers well enough to understand what they talk about and what their drivers, needs and pain points are. Without that, you simply talk at cross purposes.
That is why I recommend learning the other domains at least a bit. It is not so much about being able to do the whole design work alone. It is about building the prerequisites for effective team collaboration. Unfortunately, I do not see that often in practice.
DIY, OSS and missing the move to the right
As for most of the drivers discussed before, the countermeasures to overcome this driver are quite obvious: Once in a while, reassess the positioning of the building blocks in your Wardley maps (or whatever tool you prefer for checking whether your technology portfolio is still sensible). If you realize that for a building block there are good options further to the right, go there unless your custom-built solution gives you an actual competitive advantage.
The biggest impediment I observe is the reluctance of many software engineers to engage with any options that are to the right of “custom-built”. There are many, often ostensible reasons why engineers refuse to dive into product and commodity options. It is what Simon Wardley calls “inertia”.
The problem is: If you do not move in time, you create a competitive disadvantage. You become slower than your competitors because you still have to configure, deploy, maintain, run, back up and patch all the stuff yourself, which takes time and effort. Additionally, the blocked capacity is not available to push other topics that are needed to stay competitive. You incapacitate your company with such an attitude.
You may get along with such an attitude for a while, but eventually you will realize that you must move. But then it usually is too late as you lag so far behind your competition that you will not be able to catch up anymore.
As W. Edwards Deming nicely put it:
It is not necessary to change. Survival is not mandatory. [3]
Thus, try to understand the options right of “custom-built”. Especially try to understand what you can buy as managed services. Understand the misconceptions regarding OSS and how to evaluate the different options we have today.
Low-code and no-code solutions are also becoming more and more relevant for certain types of problems. I have seen that the engineering resistance is particularly high regarding such options, but then again: Think about such technologies as options to create a competitive advantage for the company, not as a means for self-realization. Pursuing self-realization is the worst you can do if you design solutions on an architectural level.
This does not mean that you will never build anything on your own anymore or that you have to abandon OSS. You will need to build custom solutions all the time to create a competitive advantage. But know when to move to the right with your custom-built solution, i.e., replace it by a product or a managed service to make room for new, more valuable custom-built solutions in a different, new area.
A last note regarding moving to the right: This is not limited to the solutions themselves. You should also apply this to everything else, e.g., to your CI/CD pipelines: I still see projects wasting weeks on setting up and maintaining a homegrown Jenkins-based pipeline, even though there are tons of great managed service options available by now. Moving to a managed service solution like CircleCI or Bitrise usually is the much more sensible approach these days.
I would like to add two general thoughts regarding architecture before wrapping up.
The first thought is: Architecture does not have an end state.
If you look at your IT landscape, how individual solutions embed in it and how they evolve, you will eventually realize that architecture will never reach an end state, neither from a landscape nor from a solution point of view. It will always move on. There will always be old and new technologies. I discuss this dilemma in more detail in a later post of this series.
Here I only wanted to mention the fact that we have that eternal side by side of old and new, of different concepts, tools and technologies. If you embrace this fact, you will start to design solutions differently, usually leading to less accidental complexity. E.g., it becomes obvious that architecture narcissism does not have a place in such a world.
The second thought is: Accept uncertainty and the limits of foresight.
One of the goals of architecture is to minimize overall costs of an application over its lifecycle (I will discuss this in detail in some future posts). To do so, you try to come up with an architecture that minimizes the cumulative efforts needed to implement all future changes and the costs required to run all different versions of the solution.
This creates a dilemma. To create an optimal architecture, you would need to know now all future change requests affecting the application. But you only have limited foresight. You can anticipate the most likely changes in the near future with a bit of diligence. But that is it. Additionally, the uncertainty of markets and technology evolution makes foresight even harder.
There are two typical response patterns: One is not to invest in architectural work at all. The typical reasoning is: “We do not know what is going to happen anyway. Thus, why waste time with planning for things we cannot anticipate? Let us respond to the needs when they emerge. Let the architecture emerge.”
The problem with this approach is that the responses tend to be very local and specific to the given requirements. The usual refactoring patterns that should then ensure that the architecture stays in shape are too low-level to help at a coarse-grained architectural level. Typically, over time this approach leads to a lot of accidental complexity.
The other response pattern is trying to prepare the system for all imaginable future changes. This typically leads to “generic” architectures, incorporating tons of accidental complexity, being prepared for changes that will never come while having missed the changes that actually come.
The approach I recommend instead is:
- Try to understand the problem domain. You do not need perfect knowledge, but you need to understand it well enough to know the use cases, the domain entities and how all those things interact. 90%+ of all change requests over the lifetime of an application are business requirements, not technical requirements. Thus, you need to understand the problem domain well enough to be able to create a system design where the future changes most likely can be kept local.
- Try to understand the most likely change requests in the nearer future. Interview domain experts. Watch market evolutions. Watch the competitors. Scan past change requests that were rejected. And so on. This helps you to refine the design a bit, to decide a bit better how to separate functionality into modules.
- Create the smallest solution possible that does the trick, that separates the functionalities in the desired way. This will not be the smallest architecture imaginable. But it will also not be a huge architecture. Aim for “just enough architecture”.
- Accept the remaining uncertainty. There will be some nasty change request that you did not anticipate and that will be hard to implement. You can mitigate that risk a bit by reassessing the architecture once in a while, but it will still hit you. This is okay. If most changes are quite simple to implement and just a few will hurt, you did a great job.
I could write a lot more about these ideas (and most likely I will do so in some future posts). But I will leave it here for now.
In this post, we first revisited the need for simplicity, as excess complexity tends to manifest especially in architecture. But complexity is the enemy of reliability and robustness, which have become indispensable. That is why we need to strive for simplicity in architecture.
Then we discussed several potential countermeasures to address the drivers of increased accidental complexity on the architectural level that we introduced in the previous post, plus a few general considerations.
There are definitely a lot more things that I would need to discuss regarding accidental complexity on the architectural level, but again the post is already too long. Thus, I will leave it here.
In the next post, I will discuss accidental complexity on the implementation level. Stay tuned …
[1] Always keep in mind: In the world outside IT, strong (ACID-like) consistency does not exist. In the best case you find eventual consistency. Usually, you do not find any consistency guarantees at all. I will discuss this in more detail in a future post. Please bear with me until then.
[2] To prevent a common misunderstanding: This does not mean that you need to be an expert on both sides. It is enough to be an expert on one of the sides, the problem domain or the solution domain. But you also need an understanding of the other side to design a good solution. Without understanding the needs and constraints of the problem domain as well as the options and constraints of the solution domain, you most likely end up with a suboptimal solution introducing accidental complexity.