Intentional Inference of Emotions

Date: May 2017

Jin Joo Lee. A Bayesian Theory of Mind Approach to Nonverbal Communication for Human-Robot Interactions. Doctoral Thesis, Massachusetts Institute of Technology, 2017.

How can you tell if your friend is happy, sad, frustrated, or engaged in a conversation? Their facial expressions, body language, and tone of voice are all nonverbal behaviors that offer clues. But reading people’s emotions isn’t straightforward, and people don’t normally wear their emotions on their sleeve. So to gain a better sense, we also ask our friends subtle but probing questions like “how is your day going so far?” or change our behavior, like being goofy, to see if we get a reaction. It takes a few tries to reach their true underlying feelings. Emotion understanding is a process!

As shown in the video above, today’s emotion detection and recognition technologies try to classify a person’s emotions from surface observations alone. But social agents, like Apple’s Siri, Amazon’s Alexa, or personal robots, operate on a social medium of dialog and so have the opportunity to engage users in this emotion understanding process.

Emotion Recognition Models
State estimators like hidden Markov models (HMMs) are common machine learning algorithms for emotion recognition. My work instead redefines emotion recognition as an interactive and intentional process that unfolds within social interactions. In storytelling activities, for example, storytellers actively try to understand the emotional state of their partner using social cues, which are nonverbal “actions” they can take. I capture this emotion inference process, in other words their strategy or policy, as a POMDP model.
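To make the contrast concrete, here is a minimal sketch of the passive, state-estimation baseline: a forward filter for a two-state HMM over a listener's attention. The states, observations, and all probability values are illustrative assumptions, not the models learned in the thesis.

```python
# Forward filtering for a 2-state HMM (illustrative parameters, not thesis values).
# Hidden states: 0 = attentive, 1 = inattentive.
# Observations:  0 = gaze toward the speaker, 1 = gaze away.

TRANS = [[0.9, 0.1],   # P(next state | current state)
         [0.2, 0.8]]
EMIT  = [[0.8, 0.2],   # P(observation | state)
         [0.3, 0.7]]
PRIOR = [0.5, 0.5]

def forward_filter(observations):
    """Return the filtered state distribution after each observation."""
    belief = PRIOR[:]
    history = []
    for obs in observations:
        # Predict: propagate the belief through the transition model.
        predicted = [sum(belief[i] * TRANS[i][j] for i in range(2))
                     for j in range(2)]
        # Update: weight by the emission likelihood and renormalize.
        unnorm = [predicted[j] * EMIT[j][obs] for j in range(2)]
        z = sum(unnorm)
        belief = [u / z for u in unnorm]
        history.append(belief)
    return history

# A run of "gaze away" cues shifts the belief toward inattentive.
beliefs = forward_filter([0, 0, 1, 1, 1])
```

The key limitation this illustrates: the filter only consumes whatever observations happen to arrive; it never acts to elicit more informative ones, which is exactly what the POMDP formulation adds.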

In my work, I computationally model emotion understanding as a goal-oriented, or intentional, inference process directed by a social agent. The social agent has a goal of wanting to form a correct inference about their partner’s emotions and tries out different behaviors to close gaps in their certainty.

In a storytelling application, I modeled how storytellers guess whether a listener is attentive or inattentive to their story. I model their strategy for making this decision as a partially observable Markov decision process (a POMDP model). By representing emotion recognition as an active process from the perspective of storytellers, my POMDP model more accurately infers a listener’s true attentive state.
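The active-inference idea above can be sketched as a belief update whose observation model depends on the action the storyteller takes. Everything here is a hypothetical toy: the actions ("continue" vs. "probe"), the cue probabilities, and the uncertainty threshold are stand-ins for the policies and models the thesis learns from storytelling data.

```python
# Active, goal-directed inference over a listener's attention (toy numbers).
# Hidden state: 0 = attentive, 1 = inattentive.
# Actions: "continue" telling the story (weakly diagnostic cues) vs.
#          "probe" (e.g., pause or ask a question, eliciting sharper reactions).

OBS_MODEL = {
    "continue": [[0.6, 0.4], [0.45, 0.55]],  # P(cue | state): mildly informative
    "probe":    [[0.9, 0.1], [0.2, 0.8]],    # probing yields stronger evidence
}

def update(belief, action, obs):
    """Bayes update of P(state) given the action taken and cue seen (0 = positive, 1 = negative)."""
    unnorm = [belief[s] * OBS_MODEL[action][s][obs] for s in range(2)]
    z = sum(unnorm)
    return [u / z for u in unnorm]

def choose_action(belief, margin=0.3):
    """Probe only while the belief is uncertain; otherwise keep telling the story."""
    return "probe" if abs(belief[0] - belief[1]) < margin else "continue"

belief = [0.5, 0.5]
a = choose_action(belief)        # maximally uncertain, so the agent probes
belief = update(belief, a, 1)    # the probe is met with a negative cue
```

The point of the sketch is the feedback loop: the agent's own uncertainty drives which social action it takes next, and that action in turn shapes how diagnostic the next observation is.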

Social agents should not be passive observers when trying to understand the emotions of their human partners. They can engage with them interactively, and through this back-and-forth gain a more accurate understanding of how a person is feeling.

ROC curve
In ROC space, the upper-left corner represents perfect classification, and the closer a curve comes to that corner, the better the predictor. When emotion understanding is represented as a goal-directed POMDP process, we achieve greater accuracy in detecting inattentive listeners compared to more traditional state-estimation models.
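For readers unfamiliar with how such a curve is built, here is a small self-contained sketch that sweeps a decision threshold over classifier scores and accumulates (false-positive rate, true-positive rate) points, then integrates the area under the curve. The scores and labels are made-up illustrative data, not results from the thesis experiments.

```python
# Building a ROC curve from classifier scores (illustrative data only).

def roc_points(scores, labels):
    """Return (FPR, TPR) pairs, sweeping the threshold from high to low scores."""
    pairs = sorted(zip(scores, labels), reverse=True)
    pos = sum(labels)
    neg = len(labels) - pos
    tp = fp = 0
    points = [(0.0, 0.0)]
    for _score, label in pairs:
        if label == 1:
            tp += 1   # another true positive crosses the threshold
        else:
            fp += 1   # another false positive crosses the threshold
        points.append((fp / neg, tp / pos))
    return points

def auc(points):
    """Trapezoidal area under the ROC curve; 1.0 is perfect, 0.5 is chance."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area

# Label 1 = inattentive (the class we want to detect); higher score = more inattentive.
points = roc_points([0.9, 0.8, 0.6, 0.4, 0.3], [1, 1, 0, 1, 0])
```

A curve hugging the upper-left corner yields an area close to 1.0, which is the single-number summary behind the comparison above.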