The cloud-ready fallacy
Recently, I had a discussion with an on-premises data center manager who claimed that public cloud does not offer any cost advantages over on-premises computing. According to him, his numbers even proved that public cloud is more expensive than the on-premises operations they offer.
His conclusion was that there is no need to use public cloud offerings at all. It would only make things more complicated, risky and expensive due to:
- Vendor lock-in
- Compliance issues
- Figuring out and setting up all the tooling, processes and regulations anew that are well-known and proven for on-premises operations
- Expensive training for all the staff in development and operations
- Unpredictable usage costs
He concluded by pointing out that they already provide OpenShift and Kafka in their on-premises data center. So, everything needed for cloud computing was already in place in their private cloud – without all the uncertainties, risks and financial disadvantages of the public cloud.
It was not the first discussion of this type I have had. I do not care much about the motivations that particular person might have had for his claims. But I was rather surprised that such discussions still take place in 2021:
- Why do we still hear such arguments?
- Why are holders of such positions often still successful in their companies?
- Where is the fundamental flaw that allows such reasoning to be successful?
That made me think.
A bit later it struck me: The problem lies in the idea of being “cloud-ready”.
The company that person was working for had a “cloud-ready” strategy regarding cloud computing. This strategy can be seen in a lot of companies that start to adopt cloud computing.
The idea basically is to build applications in such a way that, while running on-premises, they could also be deployed to a public cloud environment. The reasoning behind that approach is that it would allow a smooth, low-risk transition to using public cloud services.
Additionally, it would leave a way back to safe ground in case some unexpected problem occurs in the public cloud. Finally, it would mean remaining independent of the actual public cloud provider and thus avoiding the risks of vendor lock-in, i.e., if a provider played the lock-in card, they could easily move to another provider.
On a technical level, this strategy means that all applications are limited to the infrastructure services that are also available in the on-premises data center:
- Applications typically must be able to run in virtual machines (VM) or containers.
- A few storage technologies are provided, typically a relational database and a file system, sometimes also a NoSQL database, seldom object storage or the like.
- A few communication types are supported, e.g., REST plus events or messaging supported by a tool like Kafka or RabbitMQ.
- Typically, the infrastructure services are not accessible via API, and self-service is not available either. Instead, existing deployment procedures must be followed.
That is the usual implementation of “cloud-ready” – give or take a bit.
“Cloud native” as an accelerant
Note that this is almost the same definition the Cloud Native Computing Foundation gives for “cloud native”:
Cloud native technologies empower organizations to build and run scalable applications in modern, dynamic environments such as public, private, and hybrid clouds. Containers, service meshes, microservices, immutable infrastructure, and declarative APIs exemplify this approach. 1
This definition focuses on containers instead of VMs, which more or less naturally leads to the mentioned immutable infrastructure. And it favors microservices over monolithic applications, which more or less naturally leads to the declarative APIs of their definition. 2
But apart from those details, it basically describes the same concept as “cloud-ready” and thus is often used by “cloud-ready” advocates to “prove” that they are right. Their approach is “100% cloud native” after all, isn’t it? 3
Fear of “vendor lock-in”
As mentioned before, you also find very similar definitions whenever companies are afraid of cloud vendor lock-in – which surprisingly many companies are: To avoid any possible lock-in, you are only allowed to use technologies that are provided by all cloud vendors and the on-premises data center.
This typically leaves you with compute via VMs and containers, storage via RDBMS and file systems (plus an occasional NoSQL database) and network that is basically predefined by operations.
Sometimes this even means that you cannot use the Kubernetes or database services that all bigger public cloud providers offer. Instead, you need to install them on your own on top of cloud VMs to be “truly vendor-neutral”.
Wasting time, money and competitive advantage
But no matter whether you go with “cloud-ready”, with the droopy “cloud native” definition from the CNCF, or with fear of cloud vendor lock-in: none of these variants makes any economic sense because you deprive yourself of basically all economic advantages of (public) cloud computing, e.g.:
- Managed services – You cannot leverage the power of the multitude of available managed services, ranging from infrastructure components up to full business suites, ML/AI and more that help you to reduce the vertical integration depth massively. This means that you have to implement, test, install, configure, harden, operate, update, etc. the services that your competitors simply consume. It also means that development teams often need to take the time to implement non-differentiating functionality instead of focusing on differentiating functionality. In a competitive market, this can make the difference between winner and loser.
- Elastic scaling – You cannot dynamically scale up and down on demand. You always need to provide the resources for peak utilization – even in the public cloud, because you are not allowed to use vendor-specific services for elastic up- and downscaling as they are not available in the on-premises data center.
- FaaS and pay-per-use – You cannot leverage the cost advantages of cloud functions for applications that have sparse access patterns. Take, e.g., a small new offering that is accessed only about 100 times a day. With functions you can often limit such a use case to a few seconds of runtime per day, costing a few cents per month. With VMs or containers, at least one VM or container needs to be up and running 24 hours a day, resulting in costs that are orders of magnitude higher.
- Spot instances – You cannot leverage the economic advantages of spot instances. E.g., if you design your number-crunching job in a way that you can stop any processing node within a few seconds and resume processing on another node as soon as capacity is available again, you can reduce your costs by up to 80%-90% using spot instances. Some of the big data frameworks of the cloud providers have this option already built in (e.g., the MR frameworks).
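To make the pay-per-use point above more tangible, here is a minimal back-of-the-envelope sketch in Python. It only takes the 100 requests per day from the example above. All prices, the runtime per request and the memory size are illustrative assumptions, not the actual rates of any provider.

```python
# Back-of-the-envelope comparison: cloud functions vs. one always-on VM.
# Every price below is a hypothetical placeholder, not a real provider rate.

REQUESTS_PER_DAY = 100          # from the example in the text
SECONDS_PER_REQUEST = 0.05      # assumption: 5 seconds of runtime per day in total
FUNCTION_MEMORY_GB = 0.5        # assumption: memory allocated to the function
PRICE_PER_GB_SECOND = 0.00002   # hypothetical pay-per-use compute price (USD)
PRICE_PER_REQUEST = 0.0000002   # hypothetical price per invocation (USD)
VM_PRICE_PER_HOUR = 0.02        # hypothetical price of one small VM (USD)
DAYS_PER_MONTH = 30


def monthly_function_cost() -> float:
    """Cost if you pay only for the seconds the function actually runs."""
    requests = REQUESTS_PER_DAY * DAYS_PER_MONTH
    compute_seconds = requests * SECONDS_PER_REQUEST
    compute_cost = compute_seconds * FUNCTION_MEMORY_GB * PRICE_PER_GB_SECOND
    return compute_cost + requests * PRICE_PER_REQUEST


def monthly_vm_cost() -> float:
    """Cost if one VM has to be up 24 hours a day, used or not."""
    return VM_PRICE_PER_HOUR * 24 * DAYS_PER_MONTH


if __name__ == "__main__":
    functions = monthly_function_cost()
    vm = monthly_vm_cost()
    print(f"functions: {functions:.4f} USD/month")
    print(f"always-on VM: {vm:.2f} USD/month")
    print(f"ratio: {vm / functions:.0f}x")
```

The absolute numbers depend entirely on the assumed prices. What the sketch shows is that with a sparse access pattern the always-on option is dominated by idle time, which is where the orders-of-magnitude difference comes from.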
There are a lot of economic advantages in terms of immediate costs and cost of delay that you can leverage only if you make use of the capabilities the public cloud providers offer you. If you are not willing to use these capabilities, if you limit yourself to “cloud-ready” or the like, you will gain nothing. In the worst case, moving to the cloud will just be more expensive and less reliable than sticking to on-premises computing.
“Cloud-ready” is not an economically viable strategy.
Or from a different perspective:
“Cloud-ready” means emulating your on-premises data center (and your operations processes) wherever you are.
Doing it better
This does not mean that you should move all your workloads immediately into the cloud. There might be good reasons for not putting everything in the cloud, e.g.:
- Running your existing monolithic (standard) software in the cloud does not give you any economic advantage.
- Some of your workloads cannot leverage any of the economic opportunities of the cloud.
- You do not have the required skills to build and run cloud applications in your IT department.
- You are still shirking the required compliance work.
This is okay.
But do not settle for a half-hearted “cloud-ready” strategy.
Instead, go for a clearly differentiating strategy:
- Assess and decide which applications you want to run on-premises and which you want to run in the cloud.
- Then go all-in for the different parts to leverage the respective platform advantages – be it cloud or on-premises.
But decide for each application explicitly which platform it should run on. Do not try to implement “cloud-ready” applications. Such an approach leaves you stuck in the middle, making your on-premises operations more complex than needed while not being able to leverage the economic advantages of the public cloud. From an economic perspective this is probably the worst place to be.
Before wrapping up, let me briefly discuss two widespread fallacies that are often used by the “cloud-ready” advocates:
- The “vendor lock-in” fallacy
- The “container equals cloud” fallacy
The lock-in fallacy
A common argument of the “cloud-ready” advocates is the lock-in argument: If we use all the offerings from a single cloud vendor, we will have a vendor lock-in.
Leaving aside that it is rather hilarious to hear that argument mostly from companies that still happily throw big parts of their IT budgets at the IBMs, Oracles and SAPs of this world for quite flimsy economic reasons (if there are any besides the usual “You do not get fired for buying …”), the argument itself is also insufficient.
I already discussed the lock-in fallacy in detail in the context of Open Source Software. For a detailed discussion please read the referenced post.
Here I will only repeat a few core statements:
From an economic perspective, lock-in usually is something useful. The product 4 you decide on relieves you of doing something yourself. You can use the freed capacity for more valuable work. This way you can create an economic advantage.
Of course you need to calculate whether the expected benefit actually exceeds the costs. But I assume that you always do that at least roughly before you make a make vs. buy vs. lease decision. I also assume that you know where the product in question is located on a Wardley map and whether it still offers any competitive advantage to build and/or operate it on your own. 5
The only critical point comes when you need to stop using the product for any reason. Then you have to put effort into switching to a different product. Additionally, you are at risk if the time you have to switch is shorter than the time you need to do so.
So, a simple economic calculation would be 6:
- Create a baseline by determining time, money and risks of building, running and maintaining the solution in your own data center (keep in mind that OSS does not come for free either).
- Calculate time and money it takes to implement and run the solution leveraging a suitable public cloud infrastructure.
- Calculate time and money an exit would cost, i.e., moving away from the cloud provider to a different one. Also determine potential particular exit risks involved.
- Determine the duration you expect to be able to leverage the cloud provider’s offerings (how long can you harvest the economic advantage of the high level cloud provider offerings before you need to trigger the exit strategy).
- Calculate the on-premises costs over the duration.
- Calculate the cloud solution costs over the duration. Add the exit costs. Potentially add a premium for specific exit risks.
- Compare on-premises and cloud costs (including cloud exit costs) over the duration.
If the cumulative cloud costs are below the cumulative on-premises costs, go for it. Otherwise, think again. Either your use case is not suitable for public cloud or your solution architecture does not leverage the economic options of public cloud computing. But now you have a basis to make an educated decision.
Unfortunately, this calculation is almost never done. Most of the time, gut feeling and fragmentary, superficial knowledge are used as the basis for decision making.
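The calculation steps above can be sketched in a few lines of Python. This is a minimal sketch that reduces everything to money. The `Option` class, its field names and all figures are hypothetical and only serve as illustration. Time and risk would need to be priced in separately, e.g., via the risk premium.

```python
from dataclasses import dataclass


@dataclass
class Option:
    """One way to deliver the solution (hypothetical model, money only)."""
    setup_cost: float                # one-time cost to build or implement
    yearly_run_cost: float           # recurring cost to run and maintain
    exit_cost: float = 0.0           # cost of moving away (0 for the baseline)
    exit_risk_premium: float = 0.0   # optional surcharge for specific exit risks

    def cumulative_cost(self, years: float) -> float:
        """Total cost over the expected duration, including the exit."""
        return (self.setup_cost
                + self.yearly_run_cost * years
                + self.exit_cost
                + self.exit_risk_premium)


def prefer_cloud(on_prem: Option, cloud: Option, years: float) -> bool:
    """True if the cloud option is cheaper over the given duration."""
    return cloud.cumulative_cost(years) < on_prem.cumulative_cost(years)


# Illustrative figures only, not taken from any real project.
on_prem = Option(setup_cost=400_000, yearly_run_cost=250_000)
cloud = Option(setup_cost=150_000, yearly_run_cost=120_000,
               exit_cost=400_000, exit_risk_premium=100_000)

if __name__ == "__main__":
    for years in (1, 5):
        print(years, "year(s):",
              "on-premises", on_prem.cumulative_cost(years),
              "cloud", cloud.cumulative_cost(years),
              "-> cloud" if prefer_cloud(on_prem, cloud, years) else "-> on-premises")
```

With these made-up figures, the exit costs dominate if you expect to leave after one year, while over five years the lower running costs outweigh them. This is why the expected duration is part of the calculation.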
Summing up, “vendor lock-in” is neither good nor bad. Every decision you make means a “lock-in”, even decisions for OSS solutions. In the end, “lock-in” is nothing but another economic option you have, exhibiting advantages as well as disadvantages. If you know how to assess your options properly, this can give you a significant economic and competitive advantage.
In a cloud context, this means calculating the economic advantage per year if you leverage all the possibilities the cloud vendors offer you. Depending on the use case, costs can be 90% or more below those of running the same use case on-premises.
In terms of cost of delay, the difference can be even bigger, up to deciding whether you win or lose a market.
But in the case of cloud computing, you need to consistently leverage the options the respective public cloud provider offers you. Otherwise, you will not be able to leverage any economic advantage.
The “container equals cloud” fallacy
Another common argument of the “cloud-ready” advocates is that they would offer container schedulers like Kubernetes in their on-premises data centers and thus would already enable perfect cloud computing.
To bust that myth: Cloud native has nothing to do with containers, even if the dispirited definition of the CNCF might suggest so.
Cloud computing and containers are two completely orthogonal concepts. By installing a container scheduler in your on-premises data center you do not get elastic scaling. You do not get pay per use. Often you do not even get self-service. In other words: You do not get cloud computing (as the aforementioned properties are the defining properties of cloud computing).
This does not mean that containers are irrelevant. Quite the opposite. If you still have not adopted containers and are still stuck on VMs, you are missing one of the most important evolutions of IT in the last decade. I will probably discuss in a future blog post why containers are so relevant for computing.
But still, while you definitely should adopt containers, this has nothing to do with cloud computing.
“Cloud-ready” and its variants are not a viable strategy from an economic perspective. They make on-premises development and operations more complex while not being able to leverage the economic advantages a cloud platform offers.
Thus, instead of going for a generic “cloud-ready” strategy, decide the target platform for each application explicitly based on clear economic considerations and then consistently use the options of the respective platform – be it cloud or on-premises.
As always, I hope this post gave you some ideas to ponder, especially how to make better technology decisions from an economic perspective.
See https://github.com/cncf/toc/blob/master/DEFINITION.md (retrieved January 2021) ↩︎
I am not sure why the CNCF came up with such a droopy definition that deliberately excludes all actual advantages of (public) cloud computing. My personal guess is that the definition was the result of arduous discussions in such a big consortium consisting of several hundred companies, ending up with the least common denominator of the interests of all the parties involved. And as some of the CNCF platinum members make most of their money with selling “private cloud” products … ↩︎
Here, “product” is meant in its most general definition. It can describe a programming language, a library or framework, an application, an infrastructure component, etc. ↩︎
Surprisingly often, you find neither the economic nor the strategic homework done. Instead, you only find preconceived opinions, defended obstinately. ↩︎