Artificial Listener with Social Intelligence

Date: May 2017

Jin Joo Lee. A Bayesian Theory of Mind Approach to Nonverbal Communication for Human-Robot Interactions. Doctoral Thesis, Massachusetts Institute of Technology, 2017.

In storytelling interactions, storytellers employ a particular, subtle social cueing strategy when they believe a listener is paying attention to their story. If they think the listener is not paying attention, the strategy changes: the storyteller switches to stronger cueing signals in the hope of getting the listener to pay attention again. I discovered these different strategies using machine learning models that find such patterns in real-world human interaction data (see this project).

Bayesian Theory of Mind (BToM) Approach to Nonverbal Communication
The BToM model consists of two processes: 1) the DBN belief estimator, which tracks what a storyteller believes about the robot’s attentive state through plan inversion, and 2) the myopic policy, which selects listening behaviors to move those beliefs toward a goal inference.

Using a Bayesian theory of mind (BToM) approach, my AI algorithm, represented as a dynamic Bayesian network (DBN), tracks which strategy a storyteller is exhibiting to get a sense of what he or she currently thinks about a listening robot. Does the storyteller think the robot is being attentive or inattentive to the story? Based on this estimate, the robot changes its body language to raise or even tone down its level of expressivity, because its goal is to communicate an appropriate level of attentiveness. The social robot is in tune with what you think about it and modifies its behavior to alter your perception.
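To make the two processes concrete, here is a minimal sketch of a belief filter paired with a one-step (myopic) action choice. It is not the thesis implementation: the binary belief state, the three action names, the helper functions, and every probability value are assumptions invented for illustration, and a two-state filter stands in for the full DBN.

```python
# Illustrative sketch only. All names and probabilities are hypothetical,
# not taken from the thesis.
# Hidden state: does the storyteller believe the robot is attentive?
# `belief` is P(storyteller believes robot is attentive).

ACTIONS = ("tone_down", "hold", "express_more")

# P(believes attentive at t+1 | believes attentive at t, robot action)
P_STAY_ATTENTIVE = {"tone_down": 0.6, "hold": 0.8, "express_more": 0.95}
# P(believes attentive at t+1 | believes inattentive at t, robot action)
P_BECOME_ATTENTIVE = {"tone_down": 0.05, "hold": 0.2, "express_more": 0.6}

# P(observed cueing strategy | storyteller's belief); True means "attentive"
P_CUE = {
    "subtle": {True: 0.8, False: 0.3},
    "strong": {True: 0.2, False: 0.7},
}


def predict(belief, action):
    """Propagate the belief one step forward under a robot action."""
    return (belief * P_STAY_ATTENTIVE[action]
            + (1.0 - belief) * P_BECOME_ATTENTIVE[action])


def update(prior, cue):
    """Bayes update of the belief given the storyteller's observed cue."""
    numerator = P_CUE[cue][True] * prior
    evidence = numerator + P_CUE[cue][False] * (1.0 - prior)
    return numerator / evidence


def myopic_action(belief, target):
    """Pick the action whose one-step predicted belief is closest to target."""
    return min(ACTIONS, key=lambda a: abs(predict(belief, a) - target))


def step(belief, last_action, cue, target=0.8):
    """One filter-then-act cycle: predict, update on the cue, choose action."""
    posterior = update(predict(belief, last_action), cue)
    return posterior, myopic_action(posterior, target)
```

With these made-up numbers, a strong cue lowers the estimated belief and the policy tends to pick `express_more`, while a belief well above the target leads it to `tone_down`, mirroring the "improve or even tone down" behavior described above.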

I integrated my AI algorithm into a realtime perception-to-behavior generation pipeline that controls a social robot. My software architecture detects prosody- and gaze-based social cues from a child storyteller, decides how the social robot should respond based on its AI policy, and controls the set of expressive animations the robot exhibits.

Realtime Perception-to-Behavior Generation Pipeline
The perception-cognition-action architecture of an artificial listener. Message passing between modules uses ROS, the open-source Robot Operating System.
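The message flow between the perception, cognition, and action modules can be sketched as a small publish/subscribe loop. The sketch below deliberately does not use ROS: the `Bus` class is a hypothetical in-process stand-in for ROS topics, and the topic names, message fields, and placeholder decision rule are assumptions made for illustration.

```python
from collections import defaultdict


class Bus:
    """Hypothetical in-process stand-in for topic-based message passing."""

    def __init__(self):
        self._subscribers = defaultdict(list)

    def subscribe(self, topic, callback):
        self._subscribers[topic].append(callback)

    def publish(self, topic, message):
        # Deliver the message synchronously to every subscriber of the topic.
        for callback in self._subscribers[topic]:
            callback(message)


def build_pipeline(bus, played):
    """Wire cognition and action modules onto the bus (topic names assumed)."""

    def cognition(cue):
        # Placeholder rule standing in for the AI policy: respond to strong
        # cueing with more expressive listening behavior.
        behavior = "express_more" if cue["strategy"] == "strong" else "hold"
        bus.publish("/listener/behavior", behavior)

    def action(behavior):
        # A real system would trigger a robot animation here; the sketch
        # just records which animation would be played.
        played.append(f"animation:{behavior}")

    bus.subscribe("/storyteller/cues", cognition)
    bus.subscribe("/listener/behavior", action)


bus = Bus()
played = []
build_pipeline(bus, played)

# The perception module would publish detected cues; here they are simulated.
bus.publish("/storyteller/cues", {"strategy": "strong"})
bus.publish("/storyteller/cues", {"strategy": "subtle"})
```

Keeping each module behind a topic means the placeholder rule in `cognition` could be swapped for a belief-tracking policy without touching perception or animation control.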

People do this kind of “mind reading” to understand what others think about them, and it's often considered common sense or even street smarts. A major factor in the success of robots will be their ability to communicate effectively with us as socially intelligent agents.