Realtime Detection of Social Cues

Date: March 2017

HW Park, M Gelsomini, JJ Lee, and C Breazeal (2017). Telling Stories to Robots: The Effect of Backchanneling On A Child’s Storytelling. In Proceedings of the International Conference on Human-Robot Interaction (HRI).

HW Park, M Gelsomini, JJ Lee, and C Breazeal (2017). Backchannel Opportunity Prediction for Social Robot Listeners. In Proceedings of the International Conference on Robotics and Automation (ICRA).

In everyday conversation, people use what are known as backchannels to signal to someone that they are still listening, paying attention, and engaged. As listeners, we smile, nod, and say “uh-huh” to convey attentiveness, and we do this naturally with little thought. We give this feedback not randomly but at certain moments in the conversation because speakers give off social cues that signal upcoming backchanneling opportunities.

Speaker Cue Detection Model
Four rule-based models that detect prosody-based speaker cues.

A robot listener needs to detect these social cues in order to time its responses carefully. We developed a realtime rule-based model that detects them from the prosody of the speaker’s voice. From low-level speech features, the model detects significant changes in pitch, energy shifts, long pauses, and long utterances. Its parameters were trained and tested on a dataset of children’s voices. We then used this model to trigger contingent behaviors in a listening robot, and children were highly engaged with the robots as they told them stories about their day.
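
To make the idea concrete, here is a minimal sketch of how four prosodic rules of this kind could be applied to a stream of low-level speech frames. It is an illustration under stated assumptions, not the model described in the papers: the `Frame` and `Thresholds` types, the `detect_cues` function, the frame-to-frame comparison, and every threshold value are hypothetical placeholders rather than the trained parameters.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Frame:
    pitch_hz: Optional[float]  # None when no pitch estimate is available
    energy: float              # e.g. RMS energy of the frame


@dataclass(frozen=True)
class Thresholds:
    # Every value here is an illustrative placeholder, not a trained parameter.
    frame_s: float = 0.01         # duration of one frame, in seconds
    pitch_ratio: float = 1.3      # relative pitch jump that counts as a change
    energy_ratio: float = 2.0     # relative energy jump that counts as a shift
    pause_s: float = 0.7          # silence longer than this is a long pause
    utterance_s: float = 5.0      # speech longer than this is a long utterance
    silence_energy: float = 0.01  # frames below this energy count as silence


def detect_cues(frames: List[Frame],
                th: Optional[Thresholds] = None) -> List[Tuple[int, str]]:
    """Return (frame_index, cue_name) pairs for the four prosodic rules."""
    th = th or Thresholds()
    pause_frames = max(1, round(th.pause_s / th.frame_s))
    utterance_frames = max(1, round(th.utterance_s / th.frame_s))

    cues: List[Tuple[int, str]] = []
    silent_run = 0   # consecutive silent frames seen so far
    speech_run = 0   # speech frames since the last long pause
    prev: Optional[Frame] = None  # previous speech frame, if any

    for i, frame in enumerate(frames):
        if frame.energy >= th.silence_energy:
            silent_run = 0
            speech_run += 1
            if prev is not None:
                # Rule 1: significant pitch change between adjacent speech frames.
                if prev.pitch_hz is not None and frame.pitch_hz is not None:
                    lo, hi = sorted((prev.pitch_hz, frame.pitch_hz))
                    if hi >= th.pitch_ratio * lo:
                        cues.append((i, "pitch_change"))
                # Rule 2: energy shift between adjacent speech frames.
                lo, hi = sorted((prev.energy, frame.energy))
                if hi >= th.energy_ratio * lo:
                    cues.append((i, "energy_shift"))
            # Rule 3: long utterance, reported once when the limit is reached.
            if speech_run == utterance_frames:
                cues.append((i, "long_utterance"))
            prev = frame
        else:
            silent_run += 1
            prev = None  # do not compare prosody across a silence
            # Rule 4: long pause, reported once; it also ends the utterance.
            if silent_run == pause_frames:
                cues.append((i, "long_pause"))
                speech_run = 0
    return cues
```

In a realtime setting, each detected cue would serve as a candidate moment for the robot to nod, smile, or say “uh-huh”. A practical detector would likely smooth pitch and energy over a window rather than compare adjacent frames, which is a simplification made here for brevity.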