Recently, I had two experiences within a few days that made me think about system dependability. In both situations, the systems acted detached from their surrounding reality and thus became confusing or even annoying – even though it would have been easy for them to detect that detachment.
About a decade ago, Jeffrey Dean and Luiz André Barroso published their IMO great article “The tail at scale” in the Communications of the ACM. The article dives into the topic of latency tail-tolerance.
We discussed the business case for resilient software design in my previous post. Let us assume you have a budget and you know which of your business processes/capabilities/interactions (whatever term suits your needs best) are the most critical ones, i.e., the ones you need to secure by making them more resilient.