Today, neural networks can readily identify emotions in text, photos and video. The next step is modelling them, an essential component of full-fledged intelligence in people and machines alike.
In humans, intelligence is closely bound up with emotion. Disorders of emotional development, such as sociopathy or autism, are conspicuous departures from the norm. Psychologists have also long spoken of emotional and social intelligence. Artificial intelligence (AI) must be able to understand our experiences if it is to avoid developing sociopathic or autistic traits and to act effectively and comfortably in the social world. The field concerned with recognising, interpreting and modelling human emotions is called affective computing.
The term appeared in the mid-1990s, after Rosalind Picard, a professor at the Massachusetts Institute of Technology (MIT), published an article with that title. Even then, long before the advent of modern deep learning, she wrote: ‘Emotions play a critical role in rational decision-making, in perception, in human interaction and in human intelligence. If computers will ever interact naturally and intelligently with humans, then they need the ability to at least recognize and express affect.’
Developers are well aware of these ideas, and today affective computing technologies are actively applied in practice. For example, Cogito supplies call-centre systems that can detect customers’ emotions and simulate them in chatbots.
Affectiva, another startup, uses a similar approach to analyse the effectiveness of advertising campaigns. The innovative Toyota Concept-i car has an integrated system for monitoring the condition of the driver and passengers, including their emotional state.
Microsoft Azure, one of the leading cloud service providers, offers a facial recognition tool that identifies emotions in photos. There is also CompanionMx, a mobile application that analyses the sound of a user’s voice to track signs of anxiety and stress. Experiments are also underway with ‘emotional systems’ that help patients with autism spectrum disorders: machines are already helping people learn to recognise and express emotions.
People themselves are usually experts at broadcasting their own emotions and recognising those of others. Our experiences show in the position and movements of the body, but above all in the voice and in facial expressions. The face’s 40-plus muscles can produce about 10,000 different expressions. Yet this whole subtle spectrum can be reduced to a few basic emotions, for example, to six: fear, anger, disgust, sadness, surprise and joy. Some studies distinguish 11 or even 50.
Because emotions can be classified, we can collect large databases of portraits and label them, preparing training data for neural networks. Such systems first find a face in an image, then locate its key points (eyes, eyebrows, the tip of the nose and chin, the corners of the lips, etc.) so as to account for the position and rotation of the head. Finally, information about the facial expression is extracted and analysed.
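The pipeline described above (find key points, cancel out head position and rotation, then classify the expression) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the landmark names and coordinates are made up, the face and landmark detector itself is not shown, and a simple nearest-centroid classifier stands in for the neural network mentioned in the text.

```python
import math
from typing import Dict, List, Tuple

Point = Tuple[float, float]
Landmarks = Dict[str, Point]  # hypothetical landmark names -> (x, y) in image pixels

def normalise(landmarks: Landmarks) -> Landmarks:
    """Cancel head position, in-plane rotation and scale, using the two eye landmarks."""
    lx, ly = landmarks["left_eye"]
    rx, ry = landmarks["right_eye"]
    cx, cy = (lx + rx) / 2, (ly + ry) / 2      # midpoint between the eyes becomes the origin
    angle = math.atan2(ry - ly, rx - lx)       # head roll (tilt in the image plane)
    scale = math.hypot(rx - lx, ry - ly)       # distance between the eyes
    cos_a, sin_a = math.cos(-angle), math.sin(-angle)
    out: Landmarks = {}
    for name, (x, y) in landmarks.items():
        dx, dy = x - cx, y - cy
        # rotate so the eye line is horizontal, then divide by the eye distance
        out[name] = ((dx * cos_a - dy * sin_a) / scale,
                     (dx * sin_a + dy * cos_a) / scale)
    return out

def features(landmarks: Landmarks) -> List[float]:
    """Flatten normalised landmarks into a fixed-order feature vector."""
    norm = normalise(landmarks)
    return [coord for name in sorted(norm) for coord in norm[name]]

def train_centroids(samples: List[Tuple[Landmarks, str]]) -> Dict[str, List[float]]:
    """Average the feature vectors of all labelled examples of each emotion."""
    sums: Dict[str, List[float]] = {}
    counts: Dict[str, int] = {}
    for landmarks, label in samples:
        vec = features(landmarks)
        acc = sums.setdefault(label, [0.0] * len(vec))
        for i, value in enumerate(vec):
            acc[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: [v / counts[label] for v in acc] for label, acc in sums.items()}

def classify(landmarks: Landmarks, centroids: Dict[str, List[float]]) -> str:
    """Return the emotion whose average feature vector is closest (Euclidean distance)."""
    vec = features(landmarks)
    return min(centroids, key=lambda label: math.dist(vec, centroids[label]))

# Toy labelled "portraits" (image coordinates, y grows downwards).
JOY = {"left_eye": (30, 40), "right_eye": (70, 40),
       "mouth_left": (35, 75), "mouth_right": (65, 75), "mouth_centre": (50, 80)}
SADNESS = {"left_eye": (30, 40), "right_eye": (70, 40),
           "mouth_left": (35, 85), "mouth_right": (65, 85), "mouth_centre": (50, 80)}

centroids = train_centroids([(JOY, "joy"), (SADNESS, "sadness")])

# The same smiling face, twice as large and shifted to the right.
moved_smile = {name: (2 * x + 100, 2 * y) for name, (x, y) in JOY.items()}
print(classify(moved_smile, centroids))  # joy
```

Because the eye landmarks fix the origin, orientation and scale, a face that is shifted, enlarged or tilted yields the same feature vector, so only the expression itself drives the classification.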
Neural networks can also draw on systems that psychologists have already developed for encoding facial movements, such as the Facial Action Coding System (FACS) developed by the renowned researcher Paul Ekman (his work is frequently referred to in the TV series Lie to Me). Such systems link each emotion to characteristic combinations of facial muscle movements, called action units. For example, joy may look like ‘6 + 12’, that is, the cheek raised (code 6) at the same time as the corners of the lips are pulled up (code 12). Having recognised such changes in the position of the cheeks and lips, a neural network can interpret them using FACS codes.
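The interpretation step can be illustrated with a small lookup, assuming a detector (hypothetical, not shown) that outputs a score between 0 and 1 for each action unit. Only the joy prototype, 6 + 12, comes from the text above; the other prototypes are commonly cited combinations included for illustration and should be checked against the FACS literature before any real use.

```python
from typing import Dict, FrozenSet, Mapping, Optional

# Action units used below, by their FACS numbers:
# 1 inner brow raiser, 2 outer brow raiser, 4 brow lowerer, 5 upper lid raiser,
# 6 cheek raiser, 9 nose wrinkler, 12 lip corner puller, 15 lip corner depressor,
# 16 lower lip depressor, 26 jaw drop.
PROTOTYPES: Dict[str, FrozenSet[int]] = {
    "joy": frozenset({6, 12}),             # the '6 + 12' example from the text
    "sadness": frozenset({1, 4, 15}),      # assumed, commonly cited prototype
    "surprise": frozenset({1, 2, 5, 26}),  # assumed, commonly cited prototype
    "disgust": frozenset({9, 15, 16}),     # assumed, commonly cited prototype
}

def active_units(scores: Mapping[int, float], threshold: float = 0.5) -> FrozenSet[int]:
    """Keep the action units whose detector score reaches the threshold."""
    return frozenset(au for au, score in scores.items() if score >= threshold)

def interpret(scores: Mapping[int, float], threshold: float = 0.5) -> Optional[str]:
    """Map per-unit scores (from a hypothetical AU detector) to an emotion label, or None."""
    active = active_units(scores, threshold)
    # A prototype matches only if every one of its action units is active.
    matches = [(len(units), emotion) for emotion, units in PROTOTYPES.items()
               if units <= active]
    # Prefer the most specific match (the prototype with the most units).
    return max(matches)[1] if matches else None

print(interpret({6: 0.9, 12: 0.8, 4: 0.1}))  # joy
```

Requiring every unit of a prototype to be active means that lip corners pulled up without a cheek raise (12 alone) are not labelled joy; such a smile is often described as a polite, non-Duchenne smile.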
Voice recordings are processed in a similar way. Approaches based on interpreting the words themselves do not work very well: so far, computers cannot reliably recognise irony or even fairly transparent hints. Non-verbal characteristics are therefore used more often, such as the timbre of the voice, the volume and tempo of speech, the length of pauses and changes in pitch range. These solutions are used in some chatbots and voice assistants.
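To make the non-verbal features concrete, here is a sketch that computes some of them from raw audio samples: loudness (RMS per frame), pauses (runs of quiet frames) and pitch range (from a crude autocorrelation pitch estimate). The frame length, the silence threshold and the 75–400 Hz pitch search range are assumptions chosen for illustration; timbre and tempo are not covered, and production systems use far more robust methods.

```python
import math
from typing import Dict, List

def split_frames(signal: List[float], size: int) -> List[List[float]]:
    """Cut the signal into consecutive, non-overlapping frames of `size` samples."""
    return [signal[i:i + size] for i in range(0, len(signal) - size + 1, size)]

def rms(frame: List[float]) -> float:
    """Root-mean-square amplitude: a simple measure of loudness."""
    return math.sqrt(sum(x * x for x in frame) / len(frame))

def pitch_hz(frame: List[float], sample_rate: int,
             fmin: float = 75.0, fmax: float = 400.0) -> float:
    """Crude pitch estimate: the lag (within fmin..fmax) with the largest autocorrelation."""
    lags = range(int(sample_rate / fmax), int(sample_rate / fmin) + 1)
    best_lag = max(lags, key=lambda lag: sum(frame[i] * frame[i + lag]
                                             for i in range(len(frame) - lag)))
    return sample_rate / best_lag

def prosody(signal: List[float], sample_rate: int,
            frame_ms: int = 50, silence_rms: float = 0.02) -> Dict[str, float]:
    """Summarise volume, pauses and pitch of a mono recording (samples in -1..1)."""
    size = int(sample_rate * frame_ms / 1000)
    chunks = split_frames(signal, size)
    if not chunks:
        raise ValueError("signal is shorter than one frame")
    levels = [rms(f) for f in chunks]
    silent = [level < silence_rms for level in levels]
    # Estimate pitch only on frames loud enough to contain voice.
    pitches = [pitch_hz(f, sample_rate) for f, quiet in zip(chunks, silent) if not quiet]
    longest = run = 0
    for quiet in silent:          # longest run of consecutive silent frames
        run = run + 1 if quiet else 0
        longest = max(longest, run)
    return {
        "mean_volume_rms": sum(levels) / len(levels),
        "pause_fraction": sum(silent) / len(silent),
        "longest_pause_s": longest * frame_ms / 1000,
        "mean_pitch_hz": sum(pitches) / len(pitches) if pitches else 0.0,
        "pitch_range_hz": max(pitches) - min(pitches) if pitches else 0.0,
    }

# Synthetic test signal: 1 s at 200 Hz, 0.5 s of silence, 1 s at 250 Hz.
SR = 8000

def tone(freq: float, seconds: float, amp: float = 0.5) -> List[float]:
    """Pure sine tone, a stand-in for a voiced stretch of speech."""
    return [amp * math.sin(2 * math.pi * freq * i / SR) for i in range(int(SR * seconds))]

summary = prosody(tone(200, 1.0) + [0.0] * (SR // 2) + tone(250, 1.0), SR)
print(summary)
```

A classifier trained on labelled recordings would then take vectors like `summary` as input; a raised, agitated voice, for instance, would tend to show higher volume and a wider pitch range than calm speech.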
So far, the one form of emotion recognition that is not in use relies on posture and body movement. Such information is harder not only to analyse but even to obtain: to begin with, the computer has to reconstruct a three-dimensional model of the body from an image. Attempts to build such programs therefore remain purely experimental and are usually limited to analysing hand gestures. However, the biggest challenge is reproducing emotions in a machine.
Some chatbots can simulate displeasure or make an avatar raise an eyebrow in surprise. In most cases, however, they do so not through deep learning but through pre-programmed reproduction of the expressions typical of a particular emotion. One notable exception is a recent project in which a neural network was trained on recordings of professional actors; as a result, the network learned to speak very realistically in a raised voice.
There is no doubt that AI will soon be able to simulate human experiences quite realistically. But the question goes beyond simple imitation. Antonio Damasio, a professor at the University of Southern California, notes that emotions are a kind of ‘theatre’: we act them out and display them, presenting an outward interface for experiences that may be true or false. What matters most are our internal feelings, which are tied to a series of states in our own bodies.
In a 2021 talk at HSE University in Moscow, Professor Damasio said: ‘So something quite astonishing has happened in evolution, and people hardly have a pause to think about it. And that is that when feelings first came into being in living organisms with nervous systems, they came endowed with consciousness. And because they were endowed with consciousness, they had a huge impact in what’s going on in the lives of those organisms. So up to the birth of feelings, organisms were regulated autonomously. They were regulated by the autonomic nervous system.’
‘Feeling is a producer of something that is knowledge, and by knowledge I mean, of course, conscious knowledge. When you feel, you know,’ the scientist says. His reflections are especially relevant at a time when large language models such as LaMDA or ChatGPT are credited with human abilities and sometimes even with consciousness.
The idea that emotions are central to intelligence goes back to Marvin Minsky, one of the founders of artificial intelligence, who wrote about the ‘emotion machine’. It is not surprising that both Professor Damasio and many programmers believe the road to strong AI (artificial general intelligence, AGI) runs precisely through emotion modelling. Such a system could be a machine that is sensitive to the states of its own structure, able to tell more ‘favourable’ states from less favourable ones, and capable of acting and learning by analysing them. Systems that not only recognise other people’s emotions but also experience their own may become our first full-fledged electronic companions.
Text authors: Roman Fishman, Daniil Kuznetsov