The essence of architectural work - Part 3
The economic purpose of architectural work
In the previous post, we discussed several architectural anti-patterns. We saw that we are a lot more vulnerable to them if we do not understand why we do architectural work, i.e., if we do not understand its purpose. We also saw that without a clear understanding of why we do architectural work, its focus, and thus its value, is accidental.
Therefore, we will start looking at the purpose of architectural work in this post.
Over the years, I distilled three different facets regarding the purpose of architectural work 1:
- The economic purpose
- The cognitive purpose
- The humane purpose
All three facets are relevant. Each one provides a different focus, looking at architectural work from a different perspective. There may be more facets I have not yet discovered, but it is important to understand that additional facets do not invalidate the existing ones. They simply add more perspectives, complementing the purpose. They do not create a different purpose but a more complete one, refining the focus it provides us.
Depending on the context, the different facets may be weighted differently. Based on what I have seen over the course of my career, the first one is always relevant when discussing with decision-makers – at least in commercial software development. The second facet is also very important; however, it is often misunderstood and thus implemented in contorted ways. Personally, I find the third one very powerful, but too often this aspect is completely overlooked, to the detriment of us all.
Let us start with the first facet.
The economic purpose of architectural work
When asked why we need to do architectural work, I sometimes say with a wink:
We need architecture to address
- an optimization problem over time
- with changing constraints
If this creates a WTF feeling, it is perfectly fine. To be honest, I do not expect anyone to understand it. It is more of a teaser. I also would not understand without additional explanation what this highly cryptic sentence is intended to mean. Thus, let me explain what I mean by this sentence.
Quality attributes
Many architects (including me) talk a lot about quality attributes. We try to express the properties of an architecture using quality attributes. To be more precise: We first try to understand the quality properties a solution must fulfill. Then we try to design an architecture that ensures the solution implements the desired quality properties. Usually, figuring out the desired quality attributes is a very early activity in any architectural work.
Quality attributes are often called “*ilities” because most of them have names that end in “-ility”, for example:
- Accessibility
- Availability
- Effectiveness
- Efficiency
- Evolvability
- Fault Tolerance
- Functionality
- Installability
- Interoperability
- Learnability
- Maintainability
- Observability
- Operability
- Portability
- Recoverability
- Reliability
- Resource utilization
- Scalability
- Security
- Time behavior
- Understandability
- Usability
- Usefulness
- …
This is a subset I often use as a tailoring basis for my projects. I compiled it from various sources like the software engineering quality standards ISO/IEC 9126 and its successor ISO/IEC 25010 as well as other sources like the quality attribute description of the SEI or the list of system quality attributes that partially also applies to software.
The list is not exhaustive, and depending on your context, you may need a slightly different list as your tailoring basis. I wrote a whole blog series about quality in software systems. Therefore, I will not dive deeper into the topic here. If you are interested in diving deeper into the idea of quality and quality attributes, you may want to read that blog series.
I mentioned the quality attributes for a different reason. If you ponder them for a bit, you can make an interesting observation. All the attributes can basically be divided into two classes:
- Attributes that influence the appearance and behavior of the solution at development time (including build and test)
- Attributes that influence the appearance and behavior of the solution at runtime, which includes not only operations but also how users experience the solution
I have seen lots of discussions about software architecture going nowhere because one person had one type of attributes in mind and the other person the other type. Hence, let us take a closer look at both types.
Development time related quality attributes
Development time related quality attributes, e.g., understandability (of code), changeability, evolvability, or portability, describe the desired behavior of a system from a development perspective 2. They primarily influence how efficiently a system can be modified, i.e., they target the cost of change. The cost of change is important, as software normally needs to be modified regularly. This is a special property of software. However, it is poorly understood in general (see, e.g., my blog series “Software - It’s not what you think it is” for more details about this topic, especially the part about the value preservation dilemma).
But it is not only a lack of understanding of the properties and needs of software that makes development time related quality attributes hard to grasp. The much bigger problem is that the effectiveness of a measure taken to satisfy such an attribute can only be measured in hindsight.
To illustrate this issue, let us assume we have two options regarding an architecture decision we need to make: option A and option B, and we want to decide which option to pick regarding the cost of change. To make an accurate decision, we would need to do the following:
We need to identify all future changes to the software until its end of life first because only these changes are relevant regarding the cost of change. It is very important to understand that only the actual future changes to the software matter. We may have a feeling that an option might negatively affect the cost of change. However, as long as implementing the option does not affect any actual future change negatively, our feeling is irrelevant. It could be the ugliest hack imaginable. We might cringe just thinking of it. Still, if it does not affect any future change to the solution, it does not have any effect in terms of cost of change. 3
Then, we calculate the costs of all these changes after implementing option A and sum them up. Afterward, we do the same for option B. Finally, we compare the sums of the costs of future changes for option A and option B, and we pick the option with the lower sum, as this is the better option regarding the cost of change.
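Written down as a minimal sketch, the calculation looks trivial. The change names and person-day figures below are invented purely for illustration; the list of future changes is exactly the input we would need to know in advance.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class FutureChange:
    """One actual future change to the software (hypothetical data)."""
    name: str
    cost_with_a: float  # person-days if option A had been chosen
    cost_with_b: float  # person-days if option B had been chosen


def pick_by_cost_of_change(changes: List[FutureChange]) -> Tuple[str, float, float]:
    """Sum the cost of every future change per option and pick the cheaper one.

    Ties go to option A, which is an arbitrary choice made for this sketch.
    """
    total_a = sum(c.cost_with_a for c in changes)
    total_b = sum(c.cost_with_b for c in changes)
    winner = "A" if total_a <= total_b else "B"
    return winner, total_a, total_b


# Invented example data: in reality, nobody can provide this list.
changes = [
    FutureChange("add payment provider", cost_with_a=10, cost_with_b=6),
    FutureChange("new report", cost_with_a=3, cost_with_b=4),
    FutureChange("replace database", cost_with_a=20, cost_with_b=25),
]

print(pick_by_cost_of_change(changes))  # ('A', 33, 35)
```

The arithmetic is a few lines. Everything difficult hides in the `changes` list and in the two cost columns.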
When reading this prescription, it becomes immediately clear where the problem lies. To make this kind of calculation, we would need to be able to split our timeline into two timelines. In one timeline, we would implement option A. In the other timeline, we would implement option B. Then, we would need to be able to travel forward in time until the end of the life of the software to identify all future changes to the software and calculate their costs.
While time travel and multiverses may be possible in science fiction and fantasy, they are not in our current reality. We cannot create two timelines, and we have no means to know all future changes to our software until its end of life. Therefore, we cannot accurately evaluate our two options, A and B. Instead, we need to resort to probabilities and heuristics.
This is the reason why we see so much “arcane guesswork” and so many heated discussions regarding how to create maintainable and changeable architectures. We can neither prove nor disprove whether option A is better than option B, C or D regarding the cost of change. While there are a few general heuristics that have proved to be useful based on experience, everything else is more a matter of belief than of knowing. This is a place where opinions tend to prevail and debates start. Additionally, we have a hard time proving our needs regarding the cost of change to people outside of software development.
Still, when considering the cost of change, the optimal architecture would be one that allows us to implement a software solution including all changes over its lifetime at minimal effort and cost. But as we cannot identify such an optimal architecture in a defined, deterministic fashion, we need to resort to probabilities and heuristics, accepting that our decisions – while aiming towards an optimal solution – will never result in the optimal solution. Therefore, we will apply the heuristics we know and a few other tools that help to increase the probability of making good decisions regarding the cost of change 4. But it will always be an optimization process under uncertainty.
This is the optimization problem over (the life-)time (of the solution) I talked about.
Runtime-related quality attributes
Runtime-related quality attributes like, e.g., availability, scalability, security, performance, but also operability (usability from an operator’s point of view) or usability, accessibility and functional correctness (determining the usefulness of a solution from a user’s perspective) are something completely different. They describe the desired behavior of a system based on a usage and operations perspective, i.e., they target the correctness of a solution in terms of desired properties. 5
We do not optimize anything. Either the solution meets the required quality property, or it does not. This way, runtime-related quality attributes define invariants that constrain our solution space. They support us in evaluating different options for a given problem. If the option under evaluation would fail to meet a given runtime-related quality attribute, the option is not a suitable choice. This is more of an “in” or “out”.
E.g., if we have a quality requirement that the application responds within 500ms for 99% of all requests, a solution that responds within 1s on average is not a viable option. Or if we need to ensure 99.9% uptime (less than 9h downtime per year), we cannot pick an option that requires 2h maintenance downtime per month.
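The uptime example can be checked with a few lines of arithmetic. This is only a sketch: the function names are made up for illustration, and it considers planned maintenance only, ignoring any unplanned downtime that would further reduce the budget.

```python
HOURS_PER_YEAR = 365 * 24  # 8,760 hours, ignoring leap years


def downtime_budget_hours(uptime_target: float) -> float:
    """Maximum downtime per year that still satisfies the uptime target."""
    return (1.0 - uptime_target) * HOURS_PER_YEAR


def is_viable(uptime_target: float, maintenance_hours_per_month: float) -> bool:
    """An option is 'in' only if its planned downtime fits into the budget."""
    planned_per_year = maintenance_hours_per_month * 12
    return planned_per_year <= downtime_budget_hours(uptime_target)


print(round(downtime_budget_hours(0.999), 2))  # 8.76 hours per year
print(is_viable(0.999, 2))                     # False: 24h planned vs. 8.76h allowed
```

Note that the check returns a plain yes or no. There is nothing to optimize here; the option is either inside the constraint or outside of it.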
These constraints tend to change over time because the runtime-related needs and expectations tend to change over time. E.g., a new user group may be added to an existing solution, resulting in different availability, scalability, performance, or security needs. Or a new legal requirement may result in changed durability demands. Etcetera.
These are the changing constraints I mentioned.
Overall cost optimization
Looking only at quality attributes, we could leave it there. Optimizing for the cost of change while not violating the runtime-related quality attributes is hard enough already.
However, there is a big caveat: Cost of change ignores a lot of cost types that are relevant for an IT system over its lifetime. Even if we were able to find an optimal architecture regarding the cost of change, it might be a bad choice regarding the overall costs that would result from that architecture.
A very simple example: I once saw a team that intended to introduce Kafka to their solution because they expected lower implementation efforts from using Kafka. It was not about throughput, performance, or anything else you would expect when you hear “Kafka”. Those demands were negligibly low. This decision only targeted the expected cost of change. The team expected to save a few implementation days during their project, and maybe a few more later. The overall savings would have been around 50 person-days over the lifetime of the system at best, i.e., something along the lines of 20,000 to 50,000 EUR overall (depending on who implements the changes).
However, they completely ignored the license costs and the cost of operations. The company did not use Kafka at the time, and it had no reason to introduce it anytime soon. They would have needed to hire specialists to run Kafka, as their current operations team was not able to pick up the new technology. Additionally, they would have needed to pay the license cost (you do not want to run the OSS version in production as it lacks some features that are vital for running Kafka in production).
Overall, this would have added approximately EUR 500,000 per year, i.e., several million euros over the lifetime of the software. Even if the company had introduced Kafka only one year after the team’s decision anyway, it would have been a very bad decision in terms of overall costs.
Still, looking at the cost of change alone, the decision would have made sense.
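To make the proportions tangible, here is the same comparison as a small calculation. The savings and the yearly cost are the approximate figures from the example; the five-year lifetime is my assumption for illustration only, as the actual lifetime of the system was not known.

```python
def net_effect_eur(change_savings_eur: int, added_cost_per_year_eur: int, years: int) -> int:
    """Savings in cost of change minus the added license and operations costs."""
    return change_savings_eur - added_cost_per_year_eur * years


BEST_CASE_SAVINGS_EUR = 50_000      # upper end of the estimated savings
ADDED_COST_PER_YEAR_EUR = 500_000   # specialists plus licenses, approximate

# Even if Kafka had arrived anyway after one year, the decision loses money.
print(net_effect_eur(BEST_CASE_SAVINGS_EUR, ADDED_COST_PER_YEAR_EUR, years=1))  # -450000

# Assumed lifetime of five years (hypothetical).
print(net_effect_eur(BEST_CASE_SAVINGS_EUR, ADDED_COST_PER_YEAR_EUR, years=5))  # -2450000
```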
Looking at this little example, we immediately see that the cost of change is not sufficient as the sole optimization goal, as it neglects all other relevant types of costs. Instead, we need to optimize for the cumulative costs over all cost types and over the lifetime of the system. Besides the cost of change, this includes:
- all other development costs
- test costs
- deployment costs
- operation costs
- maintenance costs
- license costs
- infrastructure costs
- usage costs
- administration costs
- training costs
- lost revenue
- and more, depending on your context
Unfortunately, these other cost types are not covered by the quality attributes. We have to add them ourselves to complete the picture.
This makes the optimization aspect of architectural work even harder, as we have to cover more dimensions while the quality attributes, including the knowledge of how to implement them, are of little help here. We have to add this part ourselves to our architect’s toolbox.
Another issue is that most people solely focus on immediate development costs, i.e., they only focus on the costs software development creates now. They neglect future costs as well as all other cost types. Convincing such people to consider the cost of change alone can be hard enough. Discussing a holistic cost perspective over the lifetime of a system can quickly turn into an impossible task.
Summing up
With all that explanation, we can rewrite the initial, cryptic explanation of why we need architectural work in a more understandable way:
With architecture, we attempt
- to minimize the cumulative costs of a system over its lifetime
- while ensuring its correctness of behavior at runtime
This is the economic purpose of architecture. While it may appear obvious in hindsight, I have seen very little architectural work that worked towards this purpose. Based on my experience, it takes a lot of work and experience to get there. Most of the available literature is of little help as it only focuses on quality attributes and – sometimes – the cost of change. Thus, maybe this is a good point to pause for a moment and ponder this definition and its implications for architectural work.
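For readers who prefer code, the definition can be read as a constrained selection: first discard every option that violates a runtime-related constraint, then pick the option with the lowest cumulative lifetime costs among the rest. The following is a minimal sketch under that reading. The option names, cost types, and amounts are invented, and in practice the cost figures are uncertain estimates rather than known values.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Option:
    """An architectural option with estimated lifetime costs (hypothetical data)."""
    name: str
    meets_runtime_constraints: bool
    lifetime_costs_eur: Dict[str, float]  # cost type -> estimated lifetime cost

    def cumulative_cost(self) -> float:
        """Sum over all cost types, not only the cost of change."""
        return sum(self.lifetime_costs_eur.values())


def choose(options: List[Option]) -> Optional[Option]:
    """Filter by runtime constraints first, then minimize cumulative costs."""
    viable = [o for o in options if o.meets_runtime_constraints]
    if not viable:
        return None  # no option satisfies the constraints
    return min(viable, key=Option.cumulative_cost)


options = [
    Option("cheap-to-change", True, {"change": 100_000, "operations": 900_000, "licenses": 500_000}),
    Option("boring", True, {"change": 150_000, "operations": 300_000, "licenses": 0}),
    Option("cheapest", False, {"change": 50_000, "operations": 100_000}),
]

best = choose(options)
print(best.name, best.cumulative_cost())  # boring 450000
```

The option with the lowest cost of change does not win, and neither does the cheapest option overall, because it violates a runtime constraint.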
In the next post (link will follow), we will move on to the other two facets regarding the purpose of architectural work: the cognitive purpose and the humane purpose. Stay tuned …
1. I already wrote about the purpose of architectural work several years ago. If you compare the older blog post with this post and the next one, you will realize that my understanding of the purpose evolved over time. While the basic ideas remained stable, I discovered more facets. Personally, I find it interesting to see how my understanding evolved, as it reflects a lifelong learning process. It also adds a bit of humbleness: No matter how smart and sharp you think your reasoning is, eventually you will realize that there is always room for learning and improvement.
2. This is probably the reason why development time related quality attributes are so often ignored by everyone but the developers, and why the developers oftentimes solely focus on them. The people who do not write code do not know anything about the needs of software developers, and many software developers only know about them.
3. I know that every good software developer starts to wince at the thought of deliberately adding an ugly hack into the code, even if we would know for sure it would not affect the cost of change. I also do. However, regarding the actual cost of change (not the imagined one), the best option in such a situation would be to implement the cheapest option available, no matter how ugly the code would look. The good part: As we do not know upfront which parts of our code will affect future changes, the most reasonable choice based on our limited knowledge about the future is to avoid ugly hacks.
4. As I decided to leave out the How of architectural work, I will not discuss the tools available to improve the probability of making good decisions regarding the cost of change. Maybe I will discuss them in some future post, and if I should not forget it, I will add a link here.
5. It is important that we always precisely specify the expected system behavior. “The system is always performant” is not a specification. It is a vague wish. A specification would be something along the lines of “Up to 1,000 concurrent requests per second, the system responds in 300ms or less 99% of the time.” A specification is something we can use as input for an SLA. If we fail to specify the expected system behavior, our customers and users will implicitly make assumptions – which tend to be along the lines “The system always (100% of the time) responds instantly (within 0ms).” Therefore, we need to discuss and specify the system behavior upfront because otherwise we will need to fight an uphill battle against the customers’ and users’ implicit expectations.
