Combining Acoustic and Articulatory Information for Robust Speech Recognition

Katrin Kirchhoff, Gernot A. Fink and Gerhard Sagerer
Speech Communication, 37(3-4), pages 303-319, 2002.

Abstract

The idea of using articulatory representations for automatic speech recognition continues to attract much attention in the speech community. Representations grouped under the label "articulatory" include articulatory parameters derived by means of acoustic-articulatory transformations (inverse filtering), direct physical measurements, and classification scores for pseudo-articulatory features. In this study we revisit the use of features belonging to the third category. In particular, we concentrate on the potential benefits of pseudo-articulatory features in adverse acoustic environments and on their combination with standard acoustic features. Systems based on articulatory features only and combined acoustic-articulatory systems are tested on two different recognition tasks: telephone-speech continuous numbers recognition and conversational speech recognition. We show that articulatory feature systems are capable of achieving superior performance at high noise levels and that the combination of acoustic and articulatory features consistently leads to a significant reduction in word error rate across all acoustic conditions.