Speaker Diarization: Optimal Clustering and Learning Speaker Embeddings

Subscribers: 344,000
Video Link: https://www.youtube.com/watch?v=vcyB8xb1-ys
Duration: 1:06:53
Views: 4,240

Speaker diarization consists of automatically partitioning an input audio stream into homogeneous segments (segmentation) and grouping the segments that belong to the same speaker (speaker clustering). This process can enhance readability by structuring an audio document, or provide the speaker's true identity when used in conjunction with a speaker recognition system. In this seminar I will talk about two new methods: ILP clustering and speaker embeddings. In speaker clustering, a major problem with greedy hierarchical agglomerative clustering (HAC) is that it does not guarantee an optimal solution. I propose a new clustering model (called ILP clustering) that redefines the clustering problem as a linear program, i.e., an objective function subject to linear equality and/or inequality constraints. An Integer Linear Programming (ILP) solver can then search for the optimal solution over the whole problem. In the second part, I propose to learn a set of high-level feature representations through deep learning, referred to as speaker embeddings. Speaker embedding features are taken from the hidden-layer neuron activations of a Deep Neural Network (DNN) trained as a classifier to recognize a thousand speaker identities in a training set. Although learned through identification, the speaker embeddings are shown to be effective for speaker verification, in particular for recognizing speakers unseen in the training set. The experiments were conducted on the ETAPE corpus of French broadcast news, where these new methods based on ILP clustering and speaker embeddings decrease the DER by 4.79 points over the baseline diarization system based on HAC/GMM.
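To make the ILP reformulation concrete, here is a minimal sketch of one common way such a clustering problem can be posed: binary variables decide which segments act as cluster centers and which center each segment joins, and the objective trades off the number of clusters against the within-cluster distances. The distance matrix, threshold, and objective weighting below are illustrative assumptions, not the exact formulation from the talk; a real system would hand the program to an ILP solver, while this toy version enumerates the assignments exactly.

```python
from itertools import product

# Hypothetical pairwise distances between 4 speech segments:
# segments 0/1 are close to each other, as are segments 2/3.
D = [
    [0.0, 0.3, 2.0, 2.2],
    [0.3, 0.0, 2.1, 1.9],
    [2.0, 2.1, 0.0, 0.4],
    [2.2, 1.9, 0.4, 0.0],
]
DELTA = 1.0  # distance threshold: a segment may only join a center within DELTA

def ilp_cluster(D, delta):
    """Exact search over the binary assignment variables of the ILP:
        minimize  (#cluster centers) + (1/delta) * sum of assigned distances
        s.t.      each segment is assigned to exactly one center,
                  a segment can only join an 'open' center (a self-assigned one),
                  every assigned distance is at most delta.
    Brute force stands in for an ILP solver at this toy size."""
    n = len(D)
    best_cost, best_assign = float("inf"), None
    # assign[j] = k means segment j belongs to the cluster centered on segment k
    for assign in product(range(n), repeat=n):
        centers = set(assign)
        # a center must be assigned to itself
        if any(assign[k] != k for k in centers):
            continue
        # distance constraint
        if any(D[assign[j]][j] > delta for j in range(n)):
            continue
        cost = len(centers) + sum(D[assign[j]][j] for j in range(n)) / delta
        if cost < best_cost:
            best_cost, best_assign = cost, assign
    return best_assign

print(ilp_cluster(D, DELTA))  # segments {0, 1} and {2, 3} form two clusters
```

Because the whole feasible space is searched, the result is globally optimal for this objective, which is exactly the guarantee that greedy HAC lacks: HAC commits to each merge and can never undo an early mistake.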
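The speaker-embedding idea can likewise be sketched in a few lines: train a DNN to classify speaker identities, then discard the softmax output layer at test time and use the hidden-layer activations, pooled over a segment, as a fixed-size speaker representation. The network below uses random weights and illustrative layer sizes purely to show the extraction mechanics, assuming a single ReLU hidden layer.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative shapes: 20-dim acoustic features, 8 hidden units,
# 1000 speaker identities in the (hypothetical) training set.
N_FEAT, N_HID, N_SPK = 20, 8, 1000
W1, b1 = rng.standard_normal((N_HID, N_FEAT)), rng.standard_normal(N_HID)
W2, b2 = rng.standard_normal((N_SPK, N_HID)), rng.standard_normal(N_SPK)  # softmax layer, training only

def speaker_embedding(frames):
    """Segment-level embedding: hidden-layer activations of the speaker-ID
    classifier, averaged over the segment's frames. The output layer
    (W2, b2) is only needed to train the classifier; at test time the
    hidden representation itself is the embedding."""
    hidden = np.maximum(0.0, frames @ W1.T + b1)  # ReLU hidden activations, one row per frame
    return hidden.mean(axis=0)                    # pool over frames -> fixed-size vector

segment = rng.standard_normal((50, N_FEAT))  # 50 frames of acoustic features
emb = speaker_embedding(segment)
print(emb.shape)  # (8,): one vector per segment, regardless of segment length
```

Because the embedding does not depend on the identity labels at test time, two segments from a speaker never seen in training can still be compared (e.g. by cosine distance between their embeddings), which is what makes the representation usable for verification and for clustering in diarization.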




Other Videos By Microsoft Research


2016-06-13 What are the prospects for automatic theorem proving?
2016-06-13 Towards Understandable Neural Networks for High Level AI Tasks - Part 3
2016-06-13 Artist in Residence (formerly Studio99) Presents: Michael Gough and "Drawing as Literacy."
2016-06-13 Towards Cross-fertilization Between Propositional Satisfiability and Data Mining
2016-06-13 Making Objects Count: A Shape Analysis Framework for Proving Polynomial Time Termination
2016-06-13 Human factors of software updates
2016-06-13 Machine-Checked Correctness and Complexity of a Union-Find Implementation
2016-06-13 Applications of 3-Dimensional Spherical Transforms to Acoustics and Personalization of Head-related
2016-06-13 Network Protocols: Myths, Missteps, and Mysteries
2016-06-13 Optimal and Adaptive Online Learning
2016-06-13 Speaker Diarization: Optimal Clustering and Learning Speaker Embeddings
2016-06-13 Multi-rate neural networks for efficient acoustic modeling
2016-06-13 Unsupervised Latent Faults Detection in Data Centers
2016-06-13 System and Toolchain Support for Reliable Intermittent Computing
2016-06-13 Gates Foundation Presents: Crucial Areas of Fintech Innovation for the Bottom of the Pyramid
2016-06-13 Social Computing Symposium 2016: Harassment, Threats, Trolling Online, Diversity in Gaming is Vital
2016-06-13 Bringing Harmony Through AI and Economics
2016-06-13 Approximating Integer Programming Problems by Partial Resampling
2016-06-13 A Lasserre-Based (1+epsilon)-Approximation for Makespan Scheduling with Precedence Constraints
2016-06-13 Towards Understandable Neural Networks for High Level AI Tasks - Part 7
2016-06-13 Verasco, a formally verified C static analyzer



Tags:
microsoft research
speaker diarization
speaker clustering
ilp clustering
deep neural networks
natural language processing and speech