“Audiovisual Automatic Speech Recognition and Related Bimodal Speech Technologies:
A Review of the State-of-the-Art and Open Problems”
Dr. Gerasimos Potamianos, National Center for Scientific Research
Sponsored by the Dallas Chapter of the IEEE Signal Processing Society
The presentation will provide an overview of the main research achievements and the state of the art in audiovisual speech processing, focusing mainly on audiovisual automatic speech recognition. The topic has been of interest to the speech research community because the visual modality holds the potential for increased robustness to acoustic noise. Nevertheless, significant challenges have hindered practical applications of the technology. Most notable are the difficulties of extracting visual speech information and of designing audiovisual fusion algorithms that remain robust to the environment variability inherent in practical, unconstrained interaction scenarios and audiovisual data sources, for example multi-party interaction in smart spaces and broadcast news. These challenges are shared by a number of interesting audiovisual speech technologies beyond the core speech recognition problem, where the visual modality has the potential to resolve ambiguity inherent in the audio signal alone, for example speech enhancement, speech activity detection, and speaker recognition, among others.
Gerasimos Potamianos received his PhD from Johns Hopkins University in 1994. From 1994 to 1996 he was a postdoctoral fellow with the Center for Language and Speech Processing. From 1996 to 1999 he was a Senior Member of Technical Staff at what was then the Speech and Image Processing Services Laboratory at AT&T Labs. In 1999 he joined the Human Language Technologies Department (currently Multilingual Analytics and User Technologies) at the IBM Thomas J. Watson Research Center, where he eventually became manager of the Multimodal Conversational Solutions Department. In 2008 he joined the Telecommunications and Networks Laboratory of the Institute of Computer Science at FORTH as an associate researcher. There, within the institute’s Ambient Intelligence Program, he continued his research in multimodal and multisensory processing of speech, with an emphasis on ambient intelligence environments. In 2009 he was named a research director at the Software and Knowledge Engineering Laboratory of the Institute of Informatics and Telecommunications at the National Center for Scientific Research.