Scattering Invariants for Audio Classification

Subscribers:
342,000
Published on ● Video Link: https://www.youtube.com/watch?v=W_Wbnp_uw-o



Duration: 1:02:27
2,116 views
36


To obtain efficient feature representations for audio classification, it is desirable to have invariance to time-shift and stability to time-warping. Mel-frequency cepstral coefficients (MFCCs) satisfy these criteria, but are unsuitable for modeling large-scale temporal structure. The scattering transform extends this representation through a convolutional network of wavelet transforms and modulus operators, capturing structures at larger time scales. Additional invariance to frequency transposition with stability to frequency-warping is obtained by applying a second scattering transform along the log-frequency axis. Using these representations, we obtain state-of-the-art results on tasks such as phone segment classification and musical genre classification on the TIMIT and GTZAN datasets, respectively.







Tags:
microsoft research