ACCENT - Pronunciation Modeling

Project date: 
1 January 2001 - 31 December 2004

This research aimed at modeling pronunciation variation for Automatic Speech Recognition (ASR) at the level of the lexicon, as opposed to modeling it at the level of the acoustic models. We developed a data-driven technique for upgrading a lexicon of reference pronunciations to one that contains multiple pronunciation variants per entry. The approach is based on the following basic principles:

  • Pronunciation variants can be obtained by applying stochastic pronunciation rules to each entry of the reference lexicon.
  • The pronunciation rules required to do so can be learned fully automatically from an orthographically transcribed corpus.

Because the pronunciation rules are stochastic, the generated pronunciation variants have probabilities attached to them. These probabilities proved vital for keeping lexical confusability under control: introducing variants can decrease the 'distance' between similar words such as comment and command, so that a variant of one word may become almost identical to a variant of the other.
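The confusability effect can be made concrete with a minimal sketch. The phone strings and probabilities below are invented for illustration only, and the `confusability` score (the probability mass of variant pairs within a given phone edit distance) is a hypothetical measure, not the one used in the project. It shows how adding variants can shrink the minimum distance between two words, while the attached probabilities reveal that the near-identical pair carries little weight.

```python
def edit_distance(a, b):
    """Levenshtein distance between two phone sequences."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        cur = [i]
        for j, y in enumerate(b, 1):
            cur.append(min(prev[j] + 1,               # deletion
                           cur[j - 1] + 1,            # insertion
                           prev[j - 1] + (x != y)))   # substitution or match
        prev = cur
    return prev[-1]


def min_distance(lex_a, lex_b):
    """Smallest edit distance between any variant of word A and any of word B."""
    return min(edit_distance(va, vb) for va in lex_a for vb in lex_b)


def confusability(lex_a, lex_b, max_dist=1):
    """Hypothetical score: probability mass of variant pairs within max_dist."""
    return sum(pa * pb
               for va, pa in lex_a.items()
               for vb, pb in lex_b.items()
               if edit_distance(va, vb) <= max_dist)


# Invented variants with invented probabilities (reference pronunciation first).
comment = {("k", "o", "m", "@", "n", "t"): 0.9, ("k", "@", "m", "@", "n", "t"): 0.1}
command = {("k", "@", "m", "a", "n", "d"): 0.8, ("k", "@", "m", "@", "n", "d"): 0.2}

print(edit_distance(("k", "o", "m", "@", "n", "t"), ("k", "@", "m", "a", "n", "d")))
print(min_distance(comment, command))
print(confusability(comment, command, max_dist=1))
```

Here the two reference pronunciations differ by three phones, but the least likely variants differ by only one. Because those two variants have probabilities 0.1 and 0.2, the close pair accounts for only about 0.02 of the joint probability mass, which is the kind of information that lets a recognizer avoid being misled by rare variants.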

Results: 

The approach was tested on different speech databases with different types of speech recognizers (based on stochastic segment models, context-independent HMMs and context-dependent HMMs as acoustic models). With context-independent acoustic models, the lexicon with variants can reduce the word error rate (WER) significantly, by up to 20%. If the recognizer uses context-dependent acoustic models, however, the improvement is only moderate (a WER reduction of 5%).

Partners: