Researchers make strong showing in emotion recognition challenge


Katie Carr, Coordinated Science Laboratory

ADSC researchers have been celebrating successes in emotion recognition for several years, from ranking among the top three in facial expression recognition competitions at ACM ICMI 2015 and 2016 to incorporating a spin-off, Opsis, that has been commercializing the technology in the domains of human resources, marketing, customer service and others.

Recently, ADSC researchers Songyou Peng, Zhang Le, and Stefan Winkler placed second overall in an emotion recognition challenge held in conjunction with the 2018 IEEE World Congress on Computational Intelligence and the International Joint Conference on Neural Networks.

The One-Minute Gradual-Emotion Behavior Challenge (OMG-Emotion) had teams evaluate one-minute video clips and identify the speakers’ emotions in the videos. The participants were provided with video, audio and text, and results were evaluated on their arousal and valence predictions. The ADSC researchers placed second overall in valence prediction accuracy and first in vision-only arousal prediction.

The researchers’ approach differed from other entries in that it used a deep network pre-trained for face verification on large-scale datasets. Additionally, they employed a randomized sparse sampling strategy when selecting video frames to avoid overfitting, and they used Long Short-Term Memory (LSTM) networks to aggregate temporal information from each video.
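The randomized sparse sampling idea can be sketched as follows: instead of feeding every frame of a one-minute clip to the network, the video is split into a few equal segments and one frame is drawn at random from each. This is a minimal illustration only; the segment-based split, function name, and parameters are assumptions, not the team's published implementation.

```python
import random

def sparse_sample(num_frames, num_segments, seed=None):
    """Draw one random frame index from each of num_segments
    equal-length segments of a video.

    A sketch of randomized sparse sampling: the random draw varies
    the frames seen across training epochs (helping against
    overfitting), while the segment structure keeps coverage of the
    whole clip. The segment scheme here is an illustrative assumption.
    """
    rng = random.Random(seed)
    seg_len = num_frames / num_segments
    indices = []
    for s in range(num_segments):
        lo = int(s * seg_len)                      # first frame of segment s
        hi = max(lo, int((s + 1) * seg_len) - 1)   # last frame of segment s
        indices.append(rng.randint(lo, hi))
    return indices
```

The sampled frames would then be passed through the pre-trained face network, with the resulting per-frame features fed to an LSTM in temporal order.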

The methodology submitted for the OMG-Emotion challenge was developed with support from Digital Emotions, a project of A*STAR’s Science and Engineering Research Council (SERC).

“Emotions can be characterized by the two dimensions of valence and arousal,” Winkler said. “Valence is positive versus negative emotions and arousal measures passive versus active emotions. We measure these on a scale of -1 to 1, which allows us to distinguish small changes in expressions and recognize subtle emotions.”
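Winkler's two-dimensional description can be pictured as a plane where valence is one axis and arousal the other, each ranging from -1 to 1. The toy function below maps a (valence, arousal) pair to a coarse quadrant; the quadrant labels are illustrative assumptions, not categories from the challenge.

```python
def va_quadrant(valence, arousal):
    """Map a (valence, arousal) pair, each in [-1, 1], to a coarse
    emotion quadrant. Labels are illustrative only; fine-grained
    prediction works with the continuous values themselves."""
    if not (-1 <= valence <= 1 and -1 <= arousal <= 1):
        raise ValueError("valence and arousal must lie in [-1, 1]")
    if valence >= 0 and arousal >= 0:
        return "positive/active"    # e.g. excitement
    if valence >= 0:
        return "positive/passive"   # e.g. calm contentment
    if arousal >= 0:
        return "negative/active"    # e.g. anger
    return "negative/passive"       # e.g. sadness
```

Because the underlying values are continuous, small shifts in expression move the point slightly within the plane, which is what lets the system register subtle emotions rather than only broad categories.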

The dataset provided was a collection of annotated YouTube videos around one minute in length. Each video was selected for the emotional behaviors it displays, such as facial expressions and language context.

“Previous work in emotion recognition has mostly focused on predicting a limited number of emotion categories, rather than more complex emotion labels, such as arousal and valence, which are much more fine-grained,” Zhang said.

“Our method for the challenge was an application of our already developed research methods,” Songyou said. “We feel this research is exciting from a machine intelligence perspective. Once you have audio or video, you can teach a machine to make automatic predictions on emotion attributes and there are lots of applications for that.”

About Project Digital Emotions

Digital Emotions is a project that aims to develop technological and scientific capabilities in performing multi-modal, multi-lingual emotion analysis of naturally occurring human expressions. Accounting for Asia’s sociocultural context, the project will build a next-generation integrative system capable of recognising emotions from visual and non-visual cues, such as audio and language, for a broad range of potential applications in media, consumer, healthcare, finance, transport, hospitality, and professional services. The research team comprises scientists and engineers from A*STAR’s Institute of High Performance Computing (IHPC), Institute for Infocomm Research (I2R), and Illinois at Singapore Pte Ltd’s Advanced Digital Sciences Center (ADSC). Digital Emotions is a project spearheaded by A*STAR’s Science and Engineering Research Council.