On the issue of calibration in DNN-based speaker recognition systems
This article is concerned with the issue of calibration in the context of Deep Neural Network (DNN) based approaches to speaker recognition. DNNs have provided a new standard in technology when used in place of the traditional universal background model (UBM) for feature alignment, or to augment tra...
Main authors: | McLaren, M.; Castan, D.; Ferrer, L.; Lawson, A.; Morgan, N.; Georgiou, P.; Narayanan, S.; Metze, F. |
---|---|
Format: | CONF |
Subjects: | |
Online access: | http://hdl.handle.net/20.500.12110/paper_2308457X_v08-12-September-2016_n_p1825_McLaren |
Contributed by: |
id | todo:paper_2308457X_v08-12-September-2016_n_p1825_McLaren |
---|---|
record_format | dspace |
spelling | todo:paper_2308457X_v08-12-September-2016_n_p1825_McLaren 2023-10-03T16:40:51Z CONF info:eu-repo/semantics/openAccess http://creativecommons.org/licenses/by/2.5/ar http://hdl.handle.net/20.500.12110/paper_2308457X_v08-12-September-2016_n_p1825_McLaren |
institution | Universidad de Buenos Aires |
institution_str | I-28 |
repository_str | R-134 |
collection | Biblioteca Digital - Facultad de Ciencias Exactas y Naturales (UBA) |
topic | Bottleneck features; Calibration; Deep neural network; Mismatch; Speaker recognition; Alignment; Speech communication; Speech processing; Computationally efficient; Deep neural networks; Discriminative power; Speaker recognition system; Universal background model; Speech recognition |
description | This article is concerned with the issue of calibration in the context of Deep Neural Network (DNN) based approaches to speaker recognition. DNNs have provided a new standard in technology when used in place of the traditional universal background model (UBM) for feature alignment, or to augment traditional features with those extracted from a bottleneck layer of the DNN. These techniques provide extremely good performance for constrained trial conditions that are well matched to development conditions. However, when applied to unseen conditions or a wide variety of conditions, some DNN-based techniques offer poor calibration performance. Through analysis on both PRISM and the recently released Speakers in the Wild (SITW) corpora, we illustrate that bottleneck features hinder calibration if used in the calculation of first-order Baum-Welch statistics during i-vector extraction. We propose a hybrid alignment framework, which stems from our previous work in DNN senone alignment, that uses the bottleneck features only for the alignment of features during statistics calculation. This framework not only addresses the issue of calibration, but provides a more computationally efficient system based on bottleneck features with improved discriminative power. Copyright © 2016 ISCA. |
format | CONF |
author | McLaren, M.; Castan, D.; Ferrer, L.; Lawson, A.; Morgan, N.; Georgiou, P.; Narayanan, S.; Metze, F.; Amazon Alexa; Apple; eBay; et al.; Google; Microsoft |
author_sort | McLaren, M. |
title | On the issue of calibration in DNN-based speaker recognition systems |
url | http://hdl.handle.net/20.500.12110/paper_2308457X_v08-12-September-2016_n_p1825_McLaren |
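
The hybrid alignment idea summarised in the abstract (bottleneck features used only to align frames, with first-order Baum-Welch statistics accumulated from other features) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the diagonal-covariance GMM used as the aligner, and the random stand-in data are all assumptions made for illustration only.

```python
import numpy as np


def gmm_posteriors(align_feats, weights, means, variances):
    """Per-frame component posteriors of a diagonal-covariance GMM.

    align_feats: (T, Da) features used ONLY for alignment
                 (assumed here to be bottleneck features).
    weights: (C,), means: (C, Da), variances: (C, Da).
    """
    diff = align_feats[:, None, :] - means[None, :, :]           # (T, C, Da)
    log_gauss = -0.5 * (
        np.sum(diff ** 2 / variances, axis=2)                    # (T, C)
        + np.sum(np.log(2.0 * np.pi * variances), axis=1)        # (C,)
    )
    log_joint = np.log(weights) + log_gauss                      # (T, C)
    log_joint -= log_joint.max(axis=1, keepdims=True)            # numerical stability
    post = np.exp(log_joint)
    return post / post.sum(axis=1, keepdims=True)


def hybrid_baum_welch_stats(align_feats, stat_feats, weights, means, variances):
    """Zeroth- and first-order statistics with separate feature streams.

    Posteriors come from align_feats; first-order statistics are
    accumulated over stat_feats (assumed here to be e.g. MFCCs).
    """
    post = gmm_posteriors(align_feats, weights, means, variances)  # (T, C)
    zeroth = post.sum(axis=0)                                      # (C,)
    first = post.T @ stat_feats                                    # (C, Ds)
    return zeroth, first


# Hypothetical toy data: 50 frames, 8-dim alignment features,
# 5-dim statistics features, 4 mixture components.
rng = np.random.default_rng(0)
T, Da, Ds, C = 50, 8, 5, 4
bn = rng.normal(size=(T, Da))
mfcc = rng.normal(size=(T, Ds))
w = np.full(C, 1.0 / C)
mu = rng.normal(size=(C, Da))
var = np.ones((C, Da))
N, F = hybrid_baum_welch_stats(bn, mfcc, w, mu, var)
```

Because each frame's posteriors sum to one, the zeroth-order statistics sum to the number of frames, and the first-order statistics summed over components equal the plain sum of the statistics features. The two feature streams may have different dimensionality, since the alignment features never enter the first-order accumulation.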