Some Recent Advances in Gaussian Mixture Modeling for Speech Recognition
State-of-the-art Hidden Markov Model (HMM) based speech recognition systems typically use Gaussian Mixture Models (GMMs) to model the acoustic features associated with each HMM state. For reasons of computational cost, storage, and robust estimation, the covariance matrices of the Gaussians in these GMMs are typically diagonal. In this talk I will describe several new techniques that model the acoustic features associated with an HMM state more accurately, including subspace constrained GMMs (SCGMMs) and non-linear, volume-preserving transformations of the acoustic feature space. Even with better models, one must deal with mismatches between the training and test conditions. This problem can be addressed by adapting either the acoustic features or the acoustic models to reduce the mismatch. I will present several approaches to adaptation, including FMAPLR (a variant of FMLLR that works well with very little adaptation data), adaptation of the front-end parameters, and adaptation of SCGMMs. While these ideas are explored and evaluated in the context of speech recognition, the talk should appeal to anyone with an interest in statistical modeling.
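To illustrate why diagonal covariances are the common choice, the sketch below (my own minimal example, not code from the talk) evaluates the log-likelihood of a feature vector under a diagonal-covariance GMM. Because each component density factorizes over dimensions, the cost is O(K·D) for K components in D dimensions, versus O(K·D²) for full covariances, and each component needs only D variance parameters rather than D(D+1)/2.

```python
import numpy as np

def gmm_diag_log_likelihood(x, weights, means, variances):
    """Log-likelihood of feature vector x under a diagonal-covariance GMM.

    x:         (D,)   feature vector
    weights:   (K,)   mixture weights, summing to 1
    means:     (K, D) component means
    variances: (K, D) per-dimension variances (the covariance diagonals)
    """
    x = np.asarray(x, dtype=float)
    d = x.shape[0]
    diff = x - means                                     # (K, D)
    log_det = np.sum(np.log(variances), axis=1)          # log |diag(var_k)|
    mahalanobis = np.sum(diff ** 2 / variances, axis=1)  # (x-mu)^T Sigma^-1 (x-mu)
    log_comp = -0.5 * (d * np.log(2 * np.pi) + log_det + mahalanobis)
    # Log-sum-exp over components for numerical stability
    a = np.log(weights) + log_comp
    m = np.max(a)
    return m + np.log(np.sum(np.exp(a - m)))
```

In an HMM-based recognizer, a function like this would be called once per state per frame, which is why the per-component cost matters so much in practice.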