Recent Efforts Towards Efficient And Scalable Neural Waveform Coding
Acoustic signal compression techniques, which convert floating-point waveforms into bitstream representations, serve as a cornerstone of current data storage and telecommunication infrastructure. The rise of data-driven approaches to acoustic coding brings not only opportunities but also challenges, among which model complexity is a major concern: on the one hand, this general-purpose computational paradigm offers superior performance; on the other hand, most codecs are deployed on low-power devices that can barely afford the overwhelming computational overhead. In this talk, I will introduce several of our recent efforts toward a better trade-off between performance and efficiency in neural speech/audio coding. I will present cascaded cross-module residual learning, which conducts multistage quantization with deep learning techniques; in addition, I will discuss a collaborative quantization scheme that simultaneously binarizes linear predictive coefficients and the corresponding residuals. If time permits, a novel perceptually salient objective function with psychoacoustic calibration will also be discussed.
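The multistage idea behind cascaded residual learning can be illustrated with a plain, non-neural residual quantizer: each stage codes only the error left by the previous stages, so decoding more stages yields a progressively finer reconstruction. The sketch below is a minimal illustration using hypothetical uniform scalar quantizers with hand-picked step sizes; it is not the neural codec described in the talk.

```python
import numpy as np

def uniform_quantize(x, step):
    """Uniform scalar quantizer: snap each sample to the nearest multiple of `step`."""
    return np.round(x / step) * step

def multistage_encode(signal, steps):
    """Quantize in successive stages, coarse to fine: each stage codes the
    residual left behind by the previous stages."""
    residual = np.asarray(signal, dtype=float).copy()
    stages = []
    for step in steps:
        q = uniform_quantize(residual, step)
        stages.append(q)
        residual = residual - q  # pass what this stage missed on to the next
    return stages

def multistage_decode(stages):
    """Reconstruction is the sum of all stage outputs; dropping later stages
    still decodes, just at lower fidelity -- the scalability property."""
    return np.sum(stages, axis=0)

rng = np.random.default_rng(0)
x = rng.standard_normal(16)
stages = multistage_encode(x, steps=[0.5, 0.1, 0.02])

err_full = np.max(np.abs(x - multistage_decode(stages)))   # all three stages
err_coarse = np.max(np.abs(x - stages[0]))                 # first stage only
print(err_full, err_coarse)
```

Each uniform stage with step size `s` leaves a residual of at most `s/2` per sample, so the full three-stage decode here is within 0.01 of the input, while the single-stage decode can be off by up to 0.25. In the talk's neural setting, learned autoencoder modules play the role of these quantizers, each trained on the residual of its predecessors.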
Learn more about this and other talks at Microsoft Research: https://www.microsoft.com/en-us/research/video/recent-efforts-towards-efficient-and-scalable-neural-waveform-coding/