“Measuring, Modeling and Using Speech Production Information”
Dr. Shrikanth Narayanan, University of Southern California
The human speech signal results from a complex orchestration of cognitive, biological, physical and social processes, and it carries crucial information not only about communication intent but also about underlying affect and emotions. It co-occurs with gestures of the face, head, hands and other parts of the body. Automatically processing and decoding speech and spoken language is hence a vastly challenging and inherently interdisciplinary endeavor. One line of work in this realm aims to use direct information about human speech and gesture articulation to inform technology development. The engineering problems faced here are two-fold: obtaining accurate speech production data, and finding ways to model and use such data. Both will be considered in this talk.
One longstanding challenge in speech production research has been examining real-time changes in the shaping of the vocal tract; this has recently been addressed using techniques such as ultrasound, movement tracking and magnetic resonance imaging. The spatial and temporal resolution afforded by these techniques, however, has limited the scope of the investigations that could be carried out. In this talk we will focus on recent advances that allow near real-time investigation of the dynamics of vocal tract shaping during speech. We will use examples from recent and ongoing research at USC to highlight some of the methods and outcomes of processing such data, especially toward facilitating speech analysis and modeling. This work is supported by NIH, ONR and NSF.
Shrikanth (Shri) Narayanan is the Andrew J. Viterbi Professor of Engineering at USC, where he holds appointments as professor of electrical engineering, computer science, linguistics and psychology. He was previously with AT&T Bell Labs and AT&T Research. His research focuses on human-centered information processing and communication technologies. He is an editor of the journal Computer Speech and Language and an associate editor of the IEEE Transactions on Multimedia, the IEEE Transactions on Affective Computing and the Journal of the Acoustical Society of America. He is a fellow of the Acoustical Society of America, the IEEE and the American Association for the Advancement of Science. He received best paper awards from the IEEE Signal Processing Society in 2005 (with Alex Potamianos) and 2009 (with Chul Min Lee).