In search of resilience
Why we need resilience more than ever
AI is eating IT. At least this is how it looks at the moment. AI everywhere, in all stages of software development as well as part of the solutions, including SaaS and COTS. The promise and widespread AI application pattern:
- Make developers hyper-productive using AI to fix existing issues in IT, speed it up and reduce costs
- Automate business processes using AI to fix existing issues in business (including demographic change), speed them up and reduce costs
- As all issues and bottlenecks will be fixed thanks to AI, business can be scaled further
In short: AI will help to produce more, faster, and cheaper.
While this may sound like the wet dream of a shareholder, investor, or stock trader, satisfying the capitalistic dogma of “growth at any price”, the associated price tag is massive.
AI as an amplifier of the status quo
The problem is that, in the end, AI serves as an amplifier of a company’s status quo, amplifying its good practices as much as its bad practices. This is very crucial to understand:
AI is not a magical fix-all. It is an amplifier.
If you are organized as a healthy, post-industrial company, working in a flexible, value-oriented way, AI can be a huge productivity and value booster. However, if you are organized as an industrial company, trapped in a 20th century company culture, focused solely on output and scale, and believe in magical AI silver bullets, then AI will reinforce this attitude including the outdated company culture (see my post “The evolution of markets” for an explanation of industrial and post-industrial markets, and my post series “The different flavors of IT” for an introduction to the different properties IT organizations need to adapt to the different market types).
While AI in an industrial company (which most companies still are once they are past their infancy – old habits die hard, even if you are not an old company) may increase output and maybe even profits in the short run, it does not prepare the company for anything outside its current status quo, let alone adverse surprises of any kind. In other words: Such an attitude very likely compromises the mid-term and long-term viability of the company.
Uncertainty everywhere
Why is it this way? Why is it dangerous to focus solely on growth, assuming mostly unchanging conditions? The short (and a bit pithy) answer:
Output-oriented companies assume certainty, but certainty is an illusion.
We are surrounded by uncertainty. Thus, the only viable approach is to navigate uncertainty.
The longer answer: Most markets have become post-industrial, i.e. highly dynamic and thus uncertain from the perspective of a company offering goods or services. Customers do not buy anymore just because a company offers a product at a reasonably low price at scale. Customers only buy if their individual needs and demands are satisfied, and as supply exceeds demand by far, customers have lots of options to choose from. 1
This means quick and flexible adaptation to changing market demands is needed for long-term viability.
Significant geopolitical and geo-economic shifts create an additional layer of uncertainty. Political and economic allies change. Long-time partners vanish, new players enter the playing field, and it is more uncertain than ever how things will further evolve. Trusting that the networks will remain as they are now is not a sensible bet anymore.
This means quick and flexible adaptation to changing geopolitical and geo-economic alliances (including leaving markets and entering new ones with different needs and demands) is also needed for long-term viability.
At the same time, IT has become indispensable due to the effects of the digital transformation (see, e.g., my post “The three waves of digital transformation” for an introduction to digital transformation and my post series “Responsible IT” that explains the consequences of digital transformation in more detail and puts them in perspective with some other relevant topics). It is no longer possible to do anything relevant in a company without using IT. As a consequence, it is also no longer possible to change anything in a company without touching IT.
This means IT has become an essential ingredient regarding a company’s viability. It must run reliably, or the company is in deep trouble. It also needs to be changeable quickly and flexibly, or the company cannot respond to the increasing uncertainty regarding the further evolution of the markets and the world.
Fostering viability
If I only focus on the more-faster-cheaper narrative that dominates industrial thinking, I am not able to navigate uncertainty confidently. Instead, I drive efficiency which fosters rigidity and impedes the ability to adapt (I discussed this tension in more detail in the 6th part of my “The long and winding road towards resilience” blog series). This is okay if I play a finite game where I optimize for a given end-of-game situation. If I plan to close my company, e.g., at the end of the year, it is fine to optimize for maximum profit within this limited time period.
However, as we know from gaming, there is a big difference between finite games with a defined end condition and endless games 2. A very important part of the endless game is that we balance short-term success and long-term viability. We do not try to maximize only our current profit. Instead, we also look at the needs of tomorrow.
In the world of today, with its increasing uncertainty, this especially means investing in resilience because our long-term success and viability are defined by our ability to handle adverse surprises well. It does not help to grow as quickly and cheaply as possible if the next unexpected event – which will happen sooner rather than later – capsizes us. This is what happens if we were so focused on growth and cost-efficiency that we do not know how to handle the changed situation, and our organization is so over-optimized for the previous status quo that it neither knows how to adapt quickly enough nor has the capability to do so. 3
A lack of resilience
Before talking about resilience, we first must clarify what the word actually means. The problem is that most people in IT think that resilience is solely about (cyber-)security. I do not know if the reason for this widespread misinterpretation is that the vendors misused the term long enough or that none of the people misusing the term can imagine any kind of threat besides cyberattacks. But no matter what the reason for the misinterpretation is, it poses a problem because resilience is so much more than just (cyber-)security.
In my blog series “The long and winding road towards resilience”, I distilled the following definition of resilience from more than two dozen other definitions across many domains (see the post for more details):
Resilience is the ability to cope successfully with adverse events and situations, including
- handling expected adverse events and situations (robustness)
- handling unexpected adverse events and situations (coping with surprise)
- improving due to adverse events and situations (anti-fragility)
Even if we ignore everything that follows the comma in the first line, it becomes clear that resilience is a lot more than just being able to handle cyberattacks. It is about all types of threats: business-related, technology-related, and human-related. If your strongest competitor launches a superior product, or if your top talent leaves your company, it may be an even bigger threat than a cyberattack.
The point is that, especially in a highly uncertain environment where we do not know where the next blow will come from, we need to foster the ability to handle any kind of expected and unexpected adverse event to preserve our future viability. This has many more dimensions than most people think about. The three most relevant dimensions are:
- Threat types – the kind of blow, e.g., an adverse political development, a supplier going out of business, a disrupted supply chain, a cyberattack, an involuntary data exposure, a competitor launching a highly attractive offering, the system slowing down and annoying customers, an uncaught cascading failure in the system, a human operator inadvertently setting a wrong value that brings the system down, overloaded key players who burn out, a critical software bug that slips into production, a new feature release that annoys the customers, a shift in the domain ecosystem, a hostile takeover attempt, et cetera. In short: It can be anything that poses a threat to the company.
- Threat target – the target of the threat, at the highest level
  - Technology – the IT system landscape
  - Organization – everyone who and everything that is affected, including the processes
  - Humans – very important and very powerful when it comes to threat prevention and mitigation, yet too often neglected
- Threat response – the response to the threat, one of the following (see “The long and winding road towards resilience - Part 8” for more details)
  - Withstand – Having enough capacity and resources to resist the threat
  - Recover – Being knocked over by the threat but having the ability to recover very quickly
  - Adapt – Continuously adapting threat responses to a changing threat surface
  - Transform – Changing radically. Only applied if adaptation is not enough to remain in a viable area
If we look at these three dimensions, we quickly realize that most companies are not very good when it comes to resilience. Most threat types are either poorly covered or not covered at all. Frantic activity only starts after the problem has occurred. Nobody is prepared for it. Finger-pointing starts. Nobody knows what to do. People try to remain invisible as long as the blame game is ongoing, and the best way to do so is to lower their heads and stick to the processes and rules.
Hence, everyone applies their default activity patterns because that is what the people know, what the organization and processes allow, and what keeps them invisible for a while. However, following these very patterns is what allowed the blow to land with full force, with nobody prepared. Thus, the usual activity patterns will not help. Additionally, as nobody is prepared for the situation, ad hoc measures are applied, more often than not causing a lot of collateral damage.
Looking at the threat targets, we realize that usually only a small fraction of them are covered. There are some measures regarding (cyber-)security focused on part of the IT systems and a few processes. In some more regulated domains, we also see a bit of supply chain risk management, which in the worst case boils down to some checklist fuss. But more often than not, this is it. The majority of the IT system landscape is only poorly covered or not covered at all. The organization is driven by politics instead of resilience. The processes are driven by compliance, legal and controlling instead of resilience 4. And the humans are forgotten. I mean, what can you expect from a company that calls its employees “resources” as in “human resources”?
If we look at the response types, the only response type that is usually considered is withstanding. However, it is prohibitively expensive to build up the capacity and resources needed to withstand all kinds of expected and unexpected events, and therefore even withstanding is implemented only in a few places. The other response types are usually not considered at all.
Overall, in most companies, resilience and thus long-term viability are poorly covered.
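To make the coverage gap tangible, the three dimensions can be read as a grid: every combination of threat type, threat target, and threat response is a cell that either has a measure in place or does not. The following is only a minimal sketch of that idea in Python. All names (`Target`, `Response`, `coverage_gaps`) as well as the example threats and the single example measure are hypothetical illustrations, not an established tool or method.

```python
from enum import Enum
from itertools import product


class Target(Enum):
    TECHNOLOGY = "technology"
    ORGANIZATION = "organization"
    HUMANS = "humans"


class Response(Enum):
    WITHSTAND = "withstand"
    RECOVER = "recover"
    ADAPT = "adapt"
    TRANSFORM = "transform"


def coverage_gaps(threats, measures):
    """Return every (threat, target, response) cell without a measure."""
    covered = set(measures)
    return [
        cell
        for cell in product(threats, Target, Response)
        if cell not in covered
    ]


# Hypothetical example: two threat types, one measure in place.
threats = ["cyberattack", "top talent leaves"]
measures = {("cyberattack", Target.TECHNOLOGY, Response.WITHSTAND)}

gaps = coverage_gaps(threats, measures)
print(f"{len(gaps)} of {len(threats) * len(Target) * len(Response)} cells uncovered")
# prints "23 of 24 cells uncovered"
```

Not every cell needs a measure, and a real assessment would weigh cells by impact rather than just count them. Still, even a crude grid like this shows how little a single withstand-the-cyberattack measure actually covers.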
Detrimental factors
The question left is why companies are so poor regarding resilience. As I mentioned before, efficiency and resilience are sort of antagonists, at least if we take them to the extreme. You cannot optimize efficiency infinitely without compromising resilience, and vice versa. If we want resilience – which is needed to deal successfully with the surprises of tomorrow – we need to balance resilience and efficiency to a certain degree. This does not mean we need to give up efficiency completely. However, we cannot optimize it infinitely.
This is where the catch comes in. Shareholders and investors only incentivize growth and short-term profits, which in turn incentivizes cutting costs, i.e., focusing on cost-efficiency. Everything else is outside the perception of most shareholders and investors. 5
Erik Markowitz put it nicely in his post “It was never about AI (We are not our tools)”:
“We built systems that prize speed above all else, and in doing so we lost the most fundamental lesson that nature teaches: speed of growth makes you fragile. The tree that shoots up fastest is the first to fall in a storm.”
He continues:
“There is not a single venture capitalist on Earth who would fund a redwood or a sequoia. Too slow. Not scalable. We have severed ourselves from the wisdom of nature, and we built a financialized economy in its absence. It’s an economy that optimizes for the quarter, not the century. An economy that treats speed as virtue and patience as weakness. An economy that looks at a forest and sees lumber, not a blueprint for resilience.”
I particularly like the last sentence: Seeing lumber instead of a blueprint for resilience when looking at a forest. If I only care about exploitation in the present, there will not be anything left in the future. Especially, I will lack the capacity needed to deal with unexpected (and often even expected) adverse situations. As I have optimized for growing as quickly as possible at the least cost possible, any light wind will knock me over.
AI works as a fire accelerant in this setting. Shareholders and investors push for even more growth at even lower costs “because AI”. Resilience becomes even worse. The companies are expected to produce more, faster, and cheaper with AI. Get rid of people. Get rid of safeguards. Just more, faster, cheaper.
As if this were not bad enough, AI additionally has a very dangerous effect on the people who use it a lot. As Steve Yegge, one of the AI poster boys, recently pointed out in his highly recommended blog post “The AI vampire”:
“I first started noticing a concerning new phenomenon a month ago, just after the new year, where people were overworking due to AI.
This week I’m suddenly seeing a bunch of articles about it.
I’ve collected a number of data points, and I have a theory. My belief is that this all has a very simple explanation: AI is starting to kill us all, Colin Robinson style.
If you’ll recall from What We Do In The Shadows (worth a watch, yo), Colin Robinson was an Energy Vampire. Being in the same room with him would drain people.
That’s … pretty much what’s happening. Being in the same room with AI is draining people.”
The renowned Harvard Business Review made basically the same observation in its article “AI doesn’t reduce work – it intensifies it” (behind a paywall, sorry, but the title already provides a nice summary, and you may find other places on the web discussing the article).
Without going into the details of the post and the article (please read them for further details, both are highly recommended), we can observe that AI – especially agentic AI – also (unintentionally) attacks the well-being of the humans who use it. Humans and their ability to come up with (genuinely) creative solutions in the face of unexpected situations are basically the only reason why many companies still exist, even if they only optimize for growth and cost-efficiency. If we exhaust exactly this last line of defense against unexpected adverse events, we create a bigger threat to a company’s future viability than we may imagine.
The need for resilience
To sum up: We need more resilience. Steve Yegge points out a few ideas in his aforementioned post on how we may deal with human exhaustion (as I wrote: read it). While this is a good start, it is by far not enough.
I just attended a big C-level strategy event, and I was baffled again by how naturally most people in the building said they expected agentic AI to improve their growth, cost efficiency, flexibility and resilience at the same time, without realizing that the two property pairs are antagonists. My conclusion was:
We are in dire need of more resilience.
I will not talk about concrete resilience measures and implementation details in this post. Rather, this post is meant to be a wake-up call that you may point your boss to if you like. I have already written quite a bit about resilience on this blog (just search for the term “resilience”) and talked a lot about it at conferences (see my slide decks for more information). And I will probably write quite a bit more about resilience on this blog. 6
But for now, let me close with a call to action:
Let us talk more about (actual) resilience. We need it more urgently than ever in the age of agentic AI.
1. Some companies still claim that they know the demands of their customers and are able to anticipate them very well. This may be true in quite small niches that those companies dominate, or if the company works in a highly regulated market that limits competition and thus the choices of the customers. In most other places, I think this is more of a self-reassurance while incrementally losing ground to faster competitors that adapt more quickly to changing customer needs and demands.
2. Writing an in-depth blog post about the difference between finite games, which have a defined end condition, and endless games is still on my to-do list. This is a very important topic because running a company is basically an endless game, i.e., you want to optimize the well-being of the company over a long time, not just the next 12 or 24 months. However, especially in the last decades, companies have very often been run as if they were games with regularly updated end conditions (quarterly numbers, fiscal year, etc. when the analysts and the investors expect to be pleased), which is one reason why many companies in general and IT departments in particular are in such bad shape.
3. Side note: Funnily enough, IT leaders often talk about growth, cost-efficiency, flexibility, and resilience in the same sentence, without realizing (or while ignoring) the mutual tension between these properties.
4. I know that in theory, compliance, legal, and controlling could improve resilience. However, in practice, they rarely do. More often, they do the opposite by making flexible responses to unexpected events harder or even impossible.
5. The game of shareholders and investors tends to be a game with a defined end condition – at least if they are institutional parties. The shareholders’ game is: “Let me make as much money as possible with this stock now and get out as soon as something else is more profitable.” The investors’ game is: “Let me get as high a return on investment as possible by either selling the company to another (bigger) company that can afford to pay the money or by taking it public in as short a time as possible.” Both describe a finite game, a game with a clear end condition – from their perspective. Additionally, it is only about return on investment, not about what the company does. Therefore, they neither know what would prepare a company for the endless game nor do they care.
6. If you want me to help make your company more resilient, let us talk. However, while I am happy to support people and companies in becoming more resilient, self-marketing is not the primary goal of this post. This is why I put it into a footnote … ;)
