An Efficient Method for Making Un-Supervised Adaptation of HMM-based Speech Recognition Systems Robust Against Out-Of-Domain Data

T. Plötz and G. A. Fink
Proc. 4th Int. Workshop on Natural Language Processing and Cognitive Science, 2007.

Funchal, Portugal


Abstract

Major aspects of cognitive science are based on natural language processing that uses automatic speech recognition (ASR) systems in human-computer interaction scenarios. To improve the accuracy of such HMM-based ASR systems, efficient approaches for unsupervised adaptation are the methodology of choice. The recognition accuracy of speaker-specific recognition systems derived by online acoustic adaptation depends directly on the quality of the adaptation data actually used. It drops significantly if sample data lying outside the scope (lexicon, acoustic conditions) of the original recognizer, which generates the required annotation, are used without further analysis. In this paper we present an approach for fast and robust MLLR adaptation based on a rejection model that rapidly evaluates so-called log-odd scores, an alternative to existing confidence measures. These scores are computed as the ratio of the scores obtained by evaluating the acoustic model to those produced by a suitable background model. Using log-odd scores, improper adaptation samples, i.e., out-of-domain data, are detected and rejected by thresholding. Experimental evaluations on two challenging tasks demonstrate the effectiveness of the proposed approach.
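The thresholding idea described in the abstract can be illustrated with a minimal sketch. It assumes that each candidate adaptation utterance already has a total log-likelihood under the recognizer's acoustic model and under some background model. Because the log of a ratio is a difference of logs, the log-odd score becomes `am_loglik - bg_loglik`. Normalizing by the number of frames, the `Sample` container, the function names, the threshold, and the toy numbers are all hypothetical and are not taken from the paper. This is not the authors' implementation.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class Sample:
    """One candidate adaptation utterance (hypothetical container)."""
    utt_id: str
    am_loglik: float  # total log-likelihood under the recognizer's acoustic model
    bg_loglik: float  # total log-likelihood under the background model
    num_frames: int   # number of feature frames in the utterance


def log_odd_score(sample: Sample) -> float:
    """Log of the likelihood ratio acoustic model / background model.

    log(p_am / p_bg) = log p_am - log p_bg. Dividing by the frame count is
    an assumption made here so that utterances of different lengths compare.
    """
    if sample.num_frames <= 0:
        raise ValueError("num_frames must be positive")
    return (sample.am_loglik - sample.bg_loglik) / sample.num_frames


def select_adaptation_data(samples: Sequence[Sample], threshold: float) -> List[Sample]:
    """Keep samples whose log-odd score reaches the threshold; reject the rest."""
    return [s for s in samples if log_odd_score(s) >= threshold]


if __name__ == "__main__":
    # Toy values for illustration only, not results from the paper.
    candidates = [
        Sample("utt1", am_loglik=-5200.0, bg_loglik=-5400.0, num_frames=400),
        Sample("utt2", am_loglik=-6100.0, bg_loglik=-5900.0, num_frames=400),
        Sample("utt3", am_loglik=-3000.0, bg_loglik=-3020.0, num_frames=200),
    ]
    kept = select_adaptation_data(candidates, threshold=0.0)
    print([s.utt_id for s in kept])  # prints ['utt1', 'utt3']
```

With a threshold of 0, an utterance is kept only if the acoustic model explains it better than the background model does. In `utt2` the background model wins, so that utterance is treated as likely out-of-domain and is excluded from MLLR adaptation. In practice the threshold would need to be tuned, for example on held-out data.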