Reading minds
How the increased role technology plays in our lives throws a spotlight on how the mind works
That humans have endless capabilities for understanding each other in nuanced ways is well understood. We are all aware of the subtle eye roll or glance away that can communicate so much. The smallest of differences in our gestures carries a myriad of meanings that we can understand in a moment.
The ability we have to attribute mental states to others is called Theory of Mind (ToM). Researcher Bertram Malle asks us to imagine the following:
“You observe two people’s movements, one behind a large wooden object, the other reaching behind him and then holding a thin object in front of the other.”
We need ToM to understand that this simply describes a customer pulling out their credit card with the intention of paying the cashier. Without ToM, we would not understand the sequence of movements or be able to predict either person’s likely responses in the way we take for granted. But with the capacity to understand physical movements through mental states, we can interpret this complex scene as the intentional acts of offering and trading.
We understand how to navigate a wide range of situations and, in a way, ‘read each other’s minds’. But how do we manage this when one of the participants is a robot? In a world where we may be cared for in our old age by a robot, or drive alongside robot-controlled cars, this is very much a live question.
Reading the mind of a robot
Whether robots can be programmed to have a Theory of Mind about humans is addressed by AI researcher Richard Sutton in his essay “The Bitter Lesson”. He argues that the history of AI suggests attempts to build human understanding into computers have not worked to date; what progress there has been has come from applying ever more brute computational force to problems.
The “bitter lesson” is that “the actual contents of [human] minds are tremendously, irredeemably complex…They are not what should be built in [to machines]”. There will inevitably continue to be speculation about the possibilities of a more generalised form of AI, but for all intents and purposes we are not currently close and may well never be.
We should therefore turn to the corollary of this: what happens when we try to understand a robot as if it were another person? It is well understood that we use anthropomorphism, the human tendency to attribute human characteristics, either physical or psychological, to non-human agents. Many studies have shown that humans perceive robots, such as the Roomba (a semi-autonomous vacuum cleaner), as having human qualities. Others have found that we tend to assign greater mental abilities to robots that have a human appearance. It seems, at first glance at least, that ToM is active not only in our relationships with other humans but also in our interactions with robots.
It is, surely, something of a surprise that we apply ToM principles to technology: we do not do this for a range of other tools in quite the same way. But there are a number of reasons why we might be forgiven for doing so.
First, we could argue that the world of ‘objects’ (as computers clearly are) is not as separate from us as we might think. The boundary between object and subject is in fact much more porous than we sometimes consider. We use tools to extend our capabilities: a scribbled shopping list extends our memory, just as a hammer extends our physical capabilities. However, seeing tools as an extension of ourselves is not the same as attributing human-like qualities to them, as we seem to do with digital technology.
But given the way in which the tech industry tends to imbue machines with human-like qualities, perhaps it is of little surprise that we do this ourselves. Technology companies certainly have a huge interest in reducing the gulf between humans and machines, often through appearance. SoftBank Robotics’ Pepper, for example, is a “pleasant and likeable” robot built in a humanoid form and designed to serve as a human companion. In addition to the appearance of robots, there is the human voice of digital assistants, or indeed of the navigation aids on our mobile phones.
The trouble is that many of these tasks are not actually powered by AI but by humans operating behind the scenes. There can be a ‘fake it until you make it’ approach, with start-ups telling investors and users they have developed a scalable AI technology while quietly relying on human intelligence.
Technology writer Astra Taylor calls this ‘fauxtomation’:
“AI pushes the human labour out of sight: if we are in a restaurant, we are very aware of the activity ‘behind the scenes’, the clatter of the kitchen, the business of the waiting staff. Order food for home delivery via an online app and none of this is visible”.
Another example that has received a lot of coverage is the role of human moderators of content on social media, clearing it of upsetting and damaging content in a way that technology cannot yet do automatically.
It is therefore perhaps of little surprise that we assume technology has human-like qualities; this is very much what we are encouraged to think, even if the reality falls short of the appearance. But if we are imbuing technology with human characteristics, what are the results in terms of what we can learn about ourselves? In what ways is our relationship with technology offering us a mirror in which to better understand ourselves? To explore this, we can turn to one of the topics that animates many: self-driving vehicles.
Thinking with others: driving
There has been huge excitement at the prospect of autonomous vehicles (AVs), with a great deal of investment pouring into the industry. This is of interest because the very human nature of driving gives us a close-up view of how humans engage with technology in an especially dynamic manner. As most of us know from our own driving experience, predictions about human behaviour must be made in real time: we pre-emptively change speed and direction to deal with another driver’s decision to abruptly pull in front of us on a motorway, and an AV must do the same.
The Bitter Lesson seems to be in action here. Work using deep neural networks has shown how an autonomous vehicle can efficiently identify human actions in streaming video as motion patterns, but such systems cannot take into account that humans can rapidly change their minds based upon their own thoughts, motivations, and the things they see around them. No predictive system functioning purely on the past observed motion of an errant car could be accurate enough in an environment as complex as a busy, high-speed motorway without being able to take into account the context and nature of the other agents involved.
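To make “prediction purely from past observed motion” concrete, here is a minimal, hypothetical sketch in Python of the kind of constant-velocity extrapolation such a system might perform. The function names, sampling rate and data are illustrative assumptions, not drawn from any real autonomous-vehicle system.

```python
# A minimal, hypothetical sketch of prediction "purely from past observed
# motion": extrapolate another car's last observed velocity forward in time.
# The constant-velocity assumption and all values are illustrative only.

def predict_position(track, horizon_s, dt=0.1):
    """track: recent (x, y) positions sampled every dt seconds."""
    (x0, y0), (x1, y1) = track[-2], track[-1]
    vx, vy = (x1 - x0) / dt, (y1 - y0) / dt      # last observed velocity
    return (x1 + vx * horizon_s, y1 + vy * horizon_s)

# A car cruising straight ahead at ~30 m/s: the extrapolation looks sensible.
track = [(0.0, 0.0), (3.0, 0.0), (6.0, 0.0)]
print(predict_position(track, horizon_s=2.0))     # -> (66.0, 0.0)

# But if the driver decides, for reasons invisible in the trajectory, to
# swerve towards an exit or brake hard, nothing in the past motion encodes
# that change of mind – the prediction is simply wrong.
```

The point of the sketch is only that the past trajectory contains no trace of the driver’s intentions; whatever the sophistication of the model fitted to it, a change of mind arrives as a surprise.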
Human beings, on the other hand, can forecast others’ likely future behaviour just by quickly assessing the ‘type’ of person involved (perhaps from subtle cues in the car’s driving behaviour, as well as its make and model) and the scene around them (such as a forthcoming junction giving us an indication that the car may be likely to change lane).
There are many other examples of the way that driving requires us to understand each other’s minds. As Malle would point out, driving necessarily involves interpreting the intentions, beliefs, emotions and desires of other drivers. We are constantly making rapid evaluations of other drivers based on their speed, position in the road, proximity to our vehicle, the make and model of their car and so on. If someone is driving fast and erratically in a beaten-up old high-performance car, we can likely conclude that it is best to steer clear, compared with someone driving predictably within the speed limit in a family saloon. This is not to say these conclusions will always be right, but nor are they unreasonable.
It is no surprise, therefore, that accidents involving AVs do happen. For example, a Google car was driving in autonomous mode in the right-hand lane when it met sandbags blocking the street. To navigate the sandbags, the car tried to move into the central lane. The self-driving car, along with the test driver, assumed an approaching bus would let them in; the bus driver thought the car would wait. “Unfortunately, all these assumptions led us to the same spot in the lane at the same time. This type of misunderstanding happens between human drivers on the road every day,” Google explained.
What this report illustrates is that when we are driving, we are constantly working to correctly interpret each other’s intentions. Subtle social interaction takes place, such as eye contact, with drivers able to quickly confirm each other’s actions and intentions with a mere glance. The lack of these social cues from AVs creates difficulties: we assume the usual rules apply but then find, to our cost, that they do not. Why is that?
Shared intentionality
When we are driving on a busy road, we are experiencing the same event as other drivers. We are all travelling with the shared intention (overall) of getting to our final destinations safely and in a timely fashion. Psychologists Steven Sloman and Philip Fernbach point out that when humans interact with others, we don’t simply experience the same event; we all know we are experiencing the same event. This is a subtle but important point that changes much. The awareness of sharing our experience not only changes the nature of the experience but also what we do and what we are able to accomplish. When we are driving on a busy road this is exactly what is happening: we are interacting and, to a large degree, working together to make sure we do not crash. We work together to slow down at appropriate moments, know when to speed up and so on. Of course, this does not always work – crashes still happen – but these principles are at work.
Through this sharing of attention, we can share common ground as we know things that others also know. We all can sense what other people are doing and what they intend, so as Sloman and Fernbach write: “Once knowledge is shared in this way, we share intentionality – we can jointly pursue a common goal”.
This chimes with philosopher Mary Midgley’s account of human behaviour:
“When we want to understand a real person's action, we always start by looking for the motivational context. We try to spot the particular reason for the act and then to place it on our general map of motivation, a map that we must all use as we try to find our way through everyday life. We ask, was that clumsy remark just a misplaced effort to be helpful? Did it express resentment? Was it even part of a spiteful scheme to make trouble? Or perhaps a bit of all three? …. It is the only way we can start to make sense of the life that goes on around us. Of course, it's fallible, but on the whole it works and its success is one of the things that science needs to investigate”.
We are socially embedded creatures living in a world that only makes sense if we locate ‘social facts’, the sets of information that we share about each other. Sloman and Fernbach point out that this means humans are capable of complex behaviour because, when multiple cognitive systems work together, a shared intelligence can emerge that goes well beyond what any one individual is capable of:
“Humans are the most complex and powerful species ever, not just because of what happens in individual brains, but because of how communities of brains work together”.
ToM is an important means by which we navigate this network of shared understandings to make sense of the world. It is not only something that allows us to connect with other humans more easily, oiling the wheels of everyday relationships, but an essential part of how the world works.
Computers are simply not part of this – there is a mass of meaning that we carry around with us that is not only subtle and nuanced but is also constantly evolving and changing in unpredictable and non-linear ways. The ‘Bitter Lesson’ tells us that a computer cannot keep up with this: the shared meaning, so necessary to the activity of driving (and indeed many human activities), is not possible when one of the parties is a machine.
The premium of ToM
This helps us to understand that there is a premium of ease that we receive because of the way ToM operates: we do not have to explain ourselves or seek explanations from others. Of course, we are taking a risk, as we may be getting things wrong in the way we infer each other’s intentions, beliefs, emotions, and desires. It is not a fool-proof exercise, as we can all find out to our cost. But the more confident we can be in our ability to read each other’s intentions, the more we can trust each other.
A well-placed act of trust pays significant dividends: for businesses and governments, a level of trust with consumers means that contractual arrangements (to check the validity of the other’s claims) can be reduced, thereby avoiding higher transaction costs. Economists Stephen Knack and Philip Keefer even found a direct relationship between increases in trust (as measured in survey responses) and increases in national economic growth. It is therefore in everyone’s interests that trust flourishes – without it, society as we know it could not exist.
So how does this link to technology? On the one hand, there is a desire to encourage trust by making technology more humanlike, so that we are tempted to apply our human-to-human ToM mechanisms in this context. While this might be a good enough representation for some things, as we saw with autonomous vehicles it is a problem when there is an appearance of ToM not backed up by reality.
I remember vividly conducting a research project on human experiences of AI and watching as a young couple tried to use a digital assistant to set an alarm for them to cook some pasta. They kept attempting to set the alarm, adjusting their tone of voice and choice of words, and in the end gave up. They had assumed an easy human relationship and were surprised by the rigid way they had to deliver instructions – having to adapt themselves to the machine rather than, as might be the case in a human interaction, being able to find a middle ground.
If we stop trusting someone, we ask them to account for their actions. This links to the discussion around ‘explainable AI’, which acts as the alternative to Theory of Mind. Elizabeth Holm captured this well when she cited writer Douglas Adams, who imagined Deep Thought, a computer programmed to answer the Great Question of Life, the Universe, and Everything. After 7.5 million years of processing, Deep Thought gave its answer: Forty-two.
She suggested this encapsulates the issue we face with technology: what good is knowing the answer when it is unclear why it is the answer? What good is a black box? As she puts it:
“Both an engineer and an AI system may learn to predict whether a bridge will collapse. But only the engineer can explain that decision in terms of physical models that can be communicated to and evaluated by others. Whose bridge would you rather cross?”
This is all very well, but we do not want to have to know the provenance of the bridge designer; we simply want to live in a world where we do not have to think about it. If we have to seek explanations from every piece of technology we use, we face an endless task of understanding and checking, an impossible job for any individual.
The UK exam fiasco
I am not targeted with advertising for payday lenders or stopped by police patrolling neighbourhoods that an algorithm directed them to. There is little immediate downside to the way technology applies ToM principles to me (as far as is visible in my day-to-day life).
But it is increasingly understood that this is not the case for many segments of the population, as set out by Cathy O'Neil in her book Weapons of Math Destruction. She chronicles the way that algorithms (the ‘Weapons of Math Destruction’, as she calls them) that claim to quantify important traits such as teaching quality, reoffending risk and creditworthiness can often have problematic outcomes: reinforcing inequality, encoding racism, enabling predatory advertising to target vulnerable people, and even contributing to a global financial crisis (in the case she uses, sub-prime mortgages).
To some extent, their use to determine outcomes for marginalised segments of the population has meant that the impacts have not always been fully assimilated by others. To examine this, let’s look at the way in which the results of the 2020 A-Levels (the UK school-leaving exams used for university entrance) were allocated by an algorithm for students who never sat their exams due to the pandemic. Almost 40% of students received grades lower than they had anticipated, resulting in widespread protests by students across the UK, who adopted the chant of “Fuck the algorithm”.
It is fair to say that these students felt their trust in the algorithm was misplaced: within days, their protests led officials to reverse course and throw out the grades the algorithm had generated. Many students were stunned that their individual results were so out of line with what they had, not unreasonably, expected to achieve. It seems there was no ‘sense checking’ of the findings – schools were not able to review the results to see if they were appropriate for each student.
Part of the problem here is that machines will always operate in accordance with a set of rules and principles: the algorithm for the exams was operating at the population level, not at the level of the individual. This meant that while the overall exam results were in line with the national results that might be expected, the results for any individual were not necessarily so.
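To see how a population-level rule can override an individual, here is a deliberately simplified toy sketch in Python. It is not the actual Ofqual model; the names, grade profile and quota logic are invented for illustration. Students are taken in their teachers’ rank order and grades are reassigned so that the cohort matches the school’s historical distribution.

```python
# A deliberately simplified toy sketch – NOT the actual Ofqual model.
# Hypothetical data: students are listed in their teachers' rank order,
# and grades are reassigned so that the school's overall results match
# its historical grade distribution.

historical_share = {"A": 0.2, "B": 0.4, "C": 0.4}   # invented school profile

# (name, teacher-predicted grade), best-ranked student first
students = [
    ("Amira", "A"), ("Ben", "A"), ("Chloe", "A"),
    ("Dev", "B"), ("Ella", "B"),
]

def standardise(students, historical_share):
    """Assign grades down the ranking so the cohort matches history."""
    n = len(students)
    awarded, i = [], 0
    for grade, share in historical_share.items():
        for _ in range(round(share * n)):
            if i < n:
                name, predicted = students[i]
                awarded.append((name, predicted, grade))
                i += 1
    # Anyone left over by rounding gets the lowest grade in the profile.
    lowest = list(historical_share)[-1]
    while i < n:
        name, predicted = students[i]
        awarded.append((name, predicted, lowest))
        i += 1
    return awarded

for name, predicted, final in standardise(students, historical_share):
    print(f"{name}: predicted {predicted}, awarded {final}")
# Ben and Chloe were predicted an A but are awarded a B: the cohort-level
# distribution is preserved; their individual performance is not.
```

The cohort’s grade profile comes out exactly as history would suggest, yet individual students are marked down simply because of where the quota boundary falls, which is the pattern so many students experienced in 2020.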
When a student receives a piece of paper with their exam grades on it, it is not accompanied by a detailed explanation of the processes that were in place to grade their work. There is necessarily a trust in the overall system by which these grades are arrived at, one that does not require us to become experts on education policy and practice.
Endowing an algorithm with human-like properties is not as irrational as it may first appear. We are not necessarily imbuing the algorithm with human capabilities; we are using ToM as a shorthand for the context of human values and considerations within which we expect the technology to be operating. And it is only when it fails to do so that we seek to dissect the way in which it operates and demand an explanation.
Our use of ToM can be seen as a kind of shorthand, allowing us to suspend the need to analyse the details of how or why we know something. Operating in a manner where everything has to be explained is not only more effortful but also simply not needed much of the time.
Context is everything
As science historian Lorraine Daston put it:
“…no universal ever fits the particulars. Never in the history of human rulemaking have we created a rule or a law that did not stub its toe against unanticipated particulars”.
She makes the point that the human labour involved in avoiding toe-stubbing is often erased from history despite the way that “effective functioning has been dependent on high-level judgment, often performed by those on the low end of the hierarchical division of labor”.
In a way, the work being done here is making the equipment fit for purpose: manually buttressing it with the codes and cues needed to work with other equipment (e.g., road layouts, exam setting), but also with all the assumed knowledge and shared intentionality that we have.
Matthew Crawford makes this point nicely when he writes:
“… chopsticks are part of a practice of dining that includes, for example, the use of bowls rather than plates, and the preparation of sticky rice rather than, say, loose peas…Chopsticks belong to a different equipmental whole than forks and knives….Rather, in using things like chopsticks, or fork and knife, we involve ourselves in norms: it is just understood that one does things a certain way. These norms are for the most part inarticulate; they are tacit in social practices and in the equipment we use. This is one way in which other people condition the way the world presents itself to us, even when we do not interact with them.”
At one level, anthropomorphising technology looks as if we are framing our relationship with technology as a matter between the individual and a specific device or app, and in doing so failing to consider the wider social, moral, and infrastructural relations. But in fact, what we are doing is using this as a shorthand for what we expect these wider relationships to be, what we think they should look like, and what roles and norms they should adhere to.
Unpicking this allows us to understand that our thoughts and ways of living do not exist on their own, in some sort of abstract space, waiting for individuals to see them by the light of pure reason. Instead, we make sense of the world and each other through the context of our shared meanings.
As sociologist Zeynep Tufekci notes, suppose we are standing on top of a high building and I say to someone: "They jumped”. The literal meaning of this seems obvious: someone has jumped. However, the context adds meaning, so that it makes sense to construe the claim "they jumped" as "someone has jumped off the building”. If it then becomes apparent that I am reporting that my friend has simply jumped up and down once, there would be grounds to complain that I was being misleading in implying that someone had jumped off the building.
In just the same way, the cues used by technology set a context of shared meanings: if we are driving, we expect the shared intentionality of navigating our busy road alongside other drivers, with all the norms, customs, and practices that this brings. Similarly, if we receive exam results, we expect our individual performance to have been considered above other influences.
The way in which technology is described, the way it is designed, and the applications it is put to are never neutral: we can see design features as ‘social affordances’, setting expectations for the way the technology in question works with our shared meanings, intentionalities, and our wider lives generally. But at the same time, what we have seen in this exploration is that this wider setting is not always crystallised and understood by people; we use the shorthand of anthropomorphism to express the expectations that we have.
On being human
ToM means people are transformed from ‘objects’ into ‘subjects’ who can act intentionally and who have desires, beliefs, attitudes and so on that direct their actions. We need to understand other minds in order to engage in the complexity of the interactions that life in communities requires of us.
We explain intentional behaviours by using a sophisticated framework of interpretation. Intentional behaviour must involve a desire for an outcome, beliefs about how a particular deed will result in that outcome, and the intention to carry out that deed; if the person then performs the deed, we understand it as an intentional action.
As such, the proliferation of technology has shown us more clearly that, as humans, we operate in a teleological manner: we behave purposefully. That sets us on a very different trajectory from many accounts of human behaviour which suggest we are merely ‘reactive’ to the world. Inherent in this explanation is the importance of our inner selves, operating proactively in the world.
But even more than this, we can also see the way in which humans are extending the language and meaning of our active inner worlds to use as a means of explanations of the external world. Using this language as shorthand to explain the apparently non-human makes sense as, of course, the algorithms of the digital world in which we increasingly live are shaped by the human.
This leads us down a route where we start to see how this shared language and these shared meanings of intentionality reference something deep and important. We are starting to get a sense that human consciousness is perhaps not something that resides only in our heads. Instead, it is dawning on us that this assumption, dominant in Western thinking for centuries, needs to be amended: our purposefulness, our intentionality, our self-awareness very much sits between us and does not simply reside in our heads.