AI and the ironies of automation - Part 1
Some (well-known) consequences of automating work
In 1983, the cognitive psychologist Lisanne Bainbridge wrote the widely noticed paper “The ironies of automation”, in which she discussed some counter-intuitive effects of automation. She called those effects ironies and paradoxes, providing precise definitions for both terms:
Irony: combination of circumstances, the result of which is the direct opposite of what might be expected.
Paradox: seemingly absurd though perhaps really well-founded statement.
Back in 1983, she discussed the effects of automation in the context of industrial processes, which were being massively automated at the time. The paper became quite famous for pointing out quite a few unsolved questions that were being ignored in the rush towards automation back then. Today, we see another massive push towards automation, this time using agentic AI based on LLMs, and it is in a similar state as the automation of industrial processes in 1983, with many relevant questions still unanswered.
Therefore, I thought it would be interesting to revisit the paper and see what Lisanne Bainbridge’s observations mean for the current agentic AI automation rush: the omnipresent push towards automating white-collar work with agentic AI, usually with AI agents doing the work and some kind of human operator who is meant to monitor it and intervene if anything goes wrong.
While the core of the paper is a bit less than four pages (it is explicitly labeled a “brief paper” at the beginning), it is very dense. Also, the insights are not presented as a bullet-point list or in highlighted sections that would let you skim the text and grasp its most important messages. The paper wants to be read from end to end to reveal its insights. But if you do, you will be richly rewarded, because it is full of insights that have not lost any of their relevance in the more than four decades since its publication. Most of them apply to the current AI-based automation ideas as well as to their original context.
As the paper is so rich and dense, I will split its discussion into two parts. In this blog post, we will look at the most important observations she made regarding the effects of automation on the humans still “in the loop”. In the second blog post of this two-part series (link will follow), we will look at some of the recommendations she made and what they mean for the current developments regarding agentic AI.
Setting the stage
The abstract of the paper sets the stage for its remainder:
This paper discusses the ways in which automation of industrial processes may expand rather than eliminate problems with the human operator. Some comments will be made on methods of alleviating these problems within the ‘classic’ approach of leaving the operator with responsibility for abnormal conditions, and on the potential for continued use of the human operator for on-line decision-making within human-computer collaboration.
The abstract contains a very important constraint: the observations made in the paper relate to automation scenarios where tasks are not 100% automated but where a “human in the loop” is still needed to check the results and intervene if the automation does not work as expected.
As this is the normal setting for contemporary LLM-based automation approaches, the findings of this paper apply to them, too. Current LLMs are prone to generating wrong results from time to time, up to completely made-up results usually referred to as “hallucinations” [1]. Therefore, the strong recommendation is to always have a human in place who checks the results of the LLM-based automation and takes corrective measures if needed.
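To make this setting concrete: in an agentic workflow, “human in the loop” usually means that the agent’s output is not acted on automatically but is routed through an explicit review step. The following minimal Python sketch illustrates the pattern; run_agent, human_review and the other names are hypothetical stand-ins of mine, not the API of any real agent framework.

```python
from dataclasses import dataclass


@dataclass
class ReviewDecision:
    approved: bool
    result: str  # the result to actually use downstream


def run_agent(task: str) -> str:
    """Hypothetical stand-in for an agentic AI solution producing a result."""
    return f"Draft result for: {task}"


def human_review(task: str, result: str) -> ReviewDecision:
    """The human operator checks the result and may approve, correct or reject it."""
    print(f"Task:   {task}")
    print(f"Result: {result}")
    answer = input("Approve result? [y/N] ").strip().lower()
    if answer == "y":
        return ReviewDecision(approved=True, result=result)
    corrected = input("Corrected result (leave empty to discard): ").strip()
    return ReviewDecision(approved=bool(corrected), result=corrected)


def run_with_human_in_the_loop(task: str) -> str | None:
    """Only results that passed human review are used; everything else is discarded."""
    decision = human_review(task, run_agent(task))
    return decision.result if decision.approved else None


if __name__ == "__main__":
    print("Used result:", run_with_human_in_the_loop("summarize the Q3 report"))
```

The important property is that nothing the agent produces reaches any downstream system without a human decision – which is exactly the setting Bainbridge’s observations apply to.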
The unlearning dilemma
Bainbridge continues with some observations regarding the skill development of a “human in the loop”:
Several studies have shown the difference between inexperienced and experienced process operators [the experienced operators being much more efficient and effective than the inexperienced operators]. Unfortunately, physical skills deteriorate when they are not used […]. This means that a formerly experienced operator who has been monitoring an automated process may now be an inexperienced one.
This observation is about the well-known fact that you need to apply your skills regularly to keep them sharp – no matter whether they are physical or mental skills. If you only apply them once in a while, they deteriorate over time. We have all had that experience ourselves:
- Once we were proficient in something. We did it regularly. It felt easy. It felt smooth.
- Then we needed to do it only once in a while, with long time gaps between doing it.
- It did not feel easy and smooth anymore. It felt increasingly hard and cumbersome.
- Eventually, we did not feel like complete beginners, but we realized we had lost much of our proficiency.
For example, I do not find the time to code (with or without AI) as often as I would like, because there are always many things on my plate that beg for my attention. When I do find the time to do some coding, I find myself looking up a lot of things I simply knew by heart in the past. I remember that I once knew them, but I no longer remember exactly how to do them. While my overall coding experience is still of great help, it takes me a lot longer to get a given coding task done today than back when I was coding on a daily basis.
The same problem applies to someone who was an expert in a given domain but is then reduced to monitoring some kind of AI solution that is meant to do their former work. The experience atrophies. While the background knowledge is still there, it takes them longer and longer to get actual work done – to the point where, depending on the task, they basically start from scratch.
At the moment, this skill deterioration is not yet visible. Everyone using agentic AI solutions started doing so only a few months ago, and usually those people do not work exclusively with the AI-based solutions but still do a lot of work on their own. However, their skills will deteriorate if they leave the actual work to the agentic AI solutions and move into a pure overseer role most of the time. Eventually, the former experts will have become beginners who once were experts.
The recall dilemma
Lisanne Bainbridge dives deeper into this issue. Her next observation is:
[…] efficient retrieval of knowledge from long-term memory depends on frequency of use (consider any subject which you passed an examination in at school and have not thought about since).
This adds to the prior observation: not only do the skills themselves deteriorate, it also takes longer to retrieve any kind of information from long-term memory if it is used only rarely.
It takes regular practice to keep skills sharp
Bainbridge then continues:
[…] this type of knowledge develops only through use and feedback about its effectiveness. People given this knowledge in theoretical classroom instruction without appropriate practical exercises will probably not understand much of it, as it will not be within a framework which makes it meaningful, and they will not remember much of it as it will not be associated with retrieval strategies which are integrated with the rest of the task.
This means that just sending someone to a (theoretical) training course before – in our context – putting them in charge of controlling an agentic AI solution does not help much, because the relevant knowledge and expertise are only built by applying them regularly in the live context. But in the live context, they cannot apply their knowledge and sharpen their expertise, because the AI agents are doing the work.
The next-generation dilemma
This immediately leads to the next statement:
There is some concern that the present generation of automated systems, which are monitored by former manual operators, are riding on their skills, which later generations of operators cannot be expected to have.
I find this statement particularly interesting because it nicely sums up one of the big dilemmas of the current, often short-sighted move towards “everything AI”:
The people currently being turned into human operators for AI solutions usually built the knowledge required to monitor those solutions and to intervene if needed during their prior working life. Even if their expertise and their access to that knowledge deteriorate over time, they can do the job at least for a while (until their skills deteriorate to the point where they can no longer fulfill it).
But future operators, who will not have built the required knowledge by doing the work themselves, will neither have the knowledge and expertise to do the work nor the opportunity to build it.
This way, the knowledge and expertise needed to monitor the LLM-based solutions and to intervene if needed will vanish over time, and nobody will be left who can do the job.
Of course, we know that in such situations solutions will emerge. However, very often they are knee-jerk reactions, less effective and also less ethical than a solution considered and designed properly from the beginning.
One possible solution would be that the quality of the AI solutions improves so much that a human in the loop is no longer required. This is what almost every AI investor and AI solution provider will tell you: that AI solutions will improve so much over the next few years that humans will no longer be needed to oversee them. However, even if some of those people are really smart, they tend to focus that smartness primarily on making as much money as possible – and you ignoring the issues that Lisanne Bainbridge pointed out is their way of making as much money as possible. Thus, to be frank, I do not put too much stock in what these people say; there is too much self-interest involved.
Additionally, due to their functioning principle, it is very unlikely that LLM-based solutions will ever work error-free. Thus, unsupervised agentic AI based on LLMs will be limited to situations where a certain error rate is acceptable. While this may be okay for, e.g., market research, it most likely will not be okay for, e.g., software development, because the created software is expected to run reliably in production. [2]
Another possible solution would be that a new job profile of “AI fixer” emerges, i.e., people who build and hone their skills by doing, on their own, work that is usually done by an AI solution, and who are then brought in when an AI solution has failed and is not able to fix the error it made. We already see the first “AI fixers” springing into existence.
Or a major AI breakthrough will be made in the next few years that leads to powerful and reliable AI solutions which replace LLMs as the powerhouses of AI. Again, AI investors and AI solution providers, often calling themselves “techno-optimists”, will very likely tell you that this is going to happen. But then again: too much self-interest involved, and thus not credible.
The actual AI experts will tell you that they do not know when we will discover the next leap forward in AI, or what it will be. Thus, believing that the next major AI breakthrough will happen and become widely applicable in industry before the humans in the loop are no longer capable of doing their job is, at the moment, nothing but wishful thinking.
But no matter what the solution will look like, it is already clear that the currently recommended naïve approach of simply turning subject matter experts into human operators who monitor the AI solution and intervene if something goes wrong is not a sustainable one.
Monitoring fatigue
The next observation Lisanne Bainbridge makes is also particularly interesting:
We know from many ‘vigilance’ studies (Mackworth, 1950) that it is impossible for even a highly motivated human being to maintain effective visual attention towards a source of information on which very little happens, for more than about half an hour. This means that it is humanly impossible to carry out the basic function of monitoring for unlikely abnormalities, which therefore has to be done by an automatic alarm system connected to sound signals.
Humans are not able to stay vigilant if little happens at the target of their vigilance – which includes the case where everything works as expected most of the time. Side note: if humans were able to stay vigilant for long periods, humanity almost certainly would not exist anymore, because this kind of “built-in inattentiveness” towards targets where little happens is a trait that ensured our survival as a species in the past.
Most AI-based solutions work correctly most of the time – at least if they have seen enough training data with respect to the task at hand and the task given to an AI agent is not too big and is accurately defined. Sometimes they produce a minor error, sometimes a bigger one – usually disguised in an overly self-confident presentation that makes the errors harder to spot, i.e., the impression that everything works nicely may persist even when it does not.
Over time, AI-based solutions can be expected to be refined, i.e., to produce fewer errors. However, due to the functioning principle of LLMs, the error rate is unlikely to ever drop to zero: errors will still be produced, just less often.
If the task of a human operator is to spot errors and intervene if one happens, a system that rarely produces errors is a system on which very little happens from the perspective of the human operator. This means the human operator is not able to stay vigilant. Even if they have the task of detecting errors, some of the errors will evade them – because they are human.
The usual countermeasures against monitoring fatigue do not work
Trying to “motivate” the human operators by punishing them for errors that evaded them is punishing them for being human. It will not change anything, except that the human operator will either suffer burnout or resign – whichever happens first.
Other, less inhumane ways of trying to deal with monitoring fatigue, such as adding an automated alarm system, are also doomed to fail, as Lisanne Bainbridge noted:
This raises the question of who notices when the alarm system is not working properly. Again, the operator will not monitor the automatics effectively if they have been operating acceptably for a long period.
While adding an automated error detection and alarm system may further reduce the error rate to a certain degree, it is almost certain that a malfunction of the error detection system itself will go unnoticed and thus the underlying errors will slip through.
A classic method of enforcing operator attention to a steady-state system is to require him to make a log. Unfortunately people can write down numbers without noticing what they are.
Other methods of keeping the human operator vigilant do not work either. If little happens with respect to the task given, our attention will drop after a short period. It is better to accept this fact of human nature, because anything else is unrealistic and will certainly lead to problems – in the worst case, catastrophic ones.
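To make the alarm-system point from above concrete: in an agentic pipeline, an automated error check is just another piece of software sitting between the agent and the human operator, and nothing in the pipeline notices when that check itself silently stops working. The following minimal Python sketch is an illustration of mine (run_agent, automated_check and the placeholder heuristic are all hypothetical), not anything taken from the paper.

```python
def run_agent(task: str) -> str:
    """Hypothetical stand-in for an agentic AI solution producing a result."""
    return f"Draft result for: {task}"


def automated_check(result: str) -> bool:
    """Hypothetical automated error detection (e.g. schema or plausibility checks).
    If this function has a bug and always returns True, no alarm is ever raised -
    and nothing downstream notices that the check has effectively stopped working."""
    return len(result) > 0 and "TODO" not in result  # simplistic placeholder heuristic


def raise_alarm(task: str, result: str) -> None:
    """Alert the human operator; only alarmed cases get their full attention."""
    print(f"ALARM - please review: task={task!r}, result={result!r}")


def pipeline(task: str) -> str | None:
    result = run_agent(task)
    if not automated_check(result):
        raise_alarm(task, result)
        return None
    return result  # passes silently most of the time - little happens for the operator


if __name__ == "__main__":
    print(pipeline("summarize the Q3 report"))
```

The check reduces how many cases the operator has to look at, but it also becomes one more quietly running component whose own failure modes nobody watches – which is exactly the problem Bainbridge describes.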
The status issue
Another rarely considered, yet in practice highly relevant aspect is:
The level of skill that a worker has is also a major aspect of his status, both within and outside the working community. If the job is ‘deskilled’ by being reduced to monitoring, this is difficult for the individuals involved to come to terms with.
If people are reduced from being subject matter experts to mere chaperones for an AI solution, they lose status – in their own perception as well as in the perception of their colleagues.
The paper then states that the people affected by this deskilling respond to the situation in various, often seemingly paradoxical ways. It also makes some additional interesting observations, which I will leave out here because they are a bit more subtle and intricate, and it would take too long to present them without quoting almost the whole paper. Thus, I can only recommend reading the whole paper, including the observations left out, because most of them are also relevant for the current agentic AI automation situation.
The expert as observer paradox
Lisanne Bainbridge concludes her observations with the following summary before moving on to a set of ideas and recommendations for how to approach the problems described:
One might state these problems as a paradox, that by automating the process the human operator is given a task which is only possible for someone who is in on-line control.
I think this sentence nicely sums up the core issue: you can only monitor an AI solution properly and intervene in case of an error if you do the exact work the AI solution is doing, all day long – which you no longer can, because the AI solution is now doing the work and you are only expected to supervise it.
At the moment, the problem is not apparent yet because people are only now in the process of being degraded to AI chaperones, i.e., until now they did the job on their own. The deskilling issue only becomes apparent after some time. The problem with this delayed deskilling is that by the time it becomes apparent, it might be too late to take effective countermeasures.
Interlude
Up to this point, we are only 1.5 pages into the paper, covering its abstract and introduction. As I wrote at the beginning: this paper is dense! Because this post is long enough already, I will leave it here and give you some time to let Lisanne Bainbridge’s observations sink in and to ponder what they mean for the current push towards automation based on agentic AI solutions.
In the second post of this little series (link will follow), we will then look at the recommendations Lisanne Bainbridge made in her paper and what they mean for the current AI developments. Stay tuned …
[1] Personally, I dislike the term “hallucination” and all other terms that anthropomorphize AI solutions based on LLMs. LLMs are not human or human-like. Thus, terms like “hallucinate”, “reason” and the like do not apply to them. While their capabilities are sometimes impressive, their intelligence – if they have one at all – is very different from the intelligence of humans. Applying terms of human behavior to LLMs obscures an important distinction, one that is needed to use LLMs correctly.
[2] Sure, human software developers also make errors, and some of them get shipped to production. Nevertheless, you would introduce a hard-to-quantify risk if you automated such an approach without knowing whether the agentic AI solution would be able to eliminate the errors it failed to detect before the software went to production. Again, LLMs are not “smart” in the way a human is, and we still have no real idea what they are able to do reliably and where the limits are that they cannot cross due to their functioning principle.
