On the issue of calibration in DNN-based speaker recognition systems

Bibliographic Details
Main authors:	McLaren, M.; Castan, D.; Ferrer, L.; Lawson, A.
Other names in record:	Morgan, N.; Georgiou, P.; Narayanan, S.; Metze, F.; Amazon Alexa; Apple; eBay; et al.; Google; Microsoft
Format:	Conference paper (CONF)
Subjects:	Bottleneck features; Calibration; Deep neural network; Mismatch; Speaker recognition; Alignment; Speech communication; Speech processing; Computationally efficient; Deep neural networks; Discriminative power; Speaker recognition system; Universal background model; Speech recognition
Online access:	http://hdl.handle.net/20.500.12110/paper_2308457X_v08-12-September-2016_n_p1825_McLaren
Contributed by:	Biblioteca Digital - Facultad de Ciencias Exactas y Naturales (UBA), Universidad de Buenos Aires

License:	Open access; Creative Commons Attribution 2.5 Argentina (http://creativecommons.org/licenses/by/2.5/ar)

Abstract:	This article is concerned with the issue of calibration in the context of Deep Neural Network (DNN) based approaches to speaker recognition. DNNs have provided a new standard in technology when used in place of the traditional universal background model (UBM) for feature alignment, or to augment traditional features with those extracted from a bottleneck layer of the DNN. These techniques provide extremely good performance for constrained trial conditions that are well matched to development conditions. However, when applied to unseen conditions or a wide variety of conditions, some DNN-based techniques offer poor calibration performance. Through analysis on both PRISM and the recently released Speakers in the Wild (SITW) corpora, we illustrate that bottleneck features hinder calibration if used in the calculation of first-order Baum-Welch statistics during i-vector extraction. We propose a hybrid alignment framework, which stems from our previous work in DNN senone alignment, that uses the bottleneck features only for the alignment of features during statistics calculation. This framework not only addresses the issue of calibration, but also provides a more computationally efficient system based on bottleneck features with improved discriminative power. Copyright © 2016 ISCA.
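
The hybrid alignment framework in the abstract separates two roles that a standard UBM-based i-vector front end merges: bottleneck features decide the frame-to-component alignment, while the first-order Baum-Welch statistics are accumulated over other features. The following Python/NumPy sketch illustrates only that separation, under stated assumptions; it is not the authors' implementation. It assumes the alignment model is a diagonal-covariance GMM trained on bottleneck features (the authors' previous work aligned with DNN senone posteriors instead), treats the statistics features as generic stand-ins for conventional features such as MFCCs, and uses hypothetical function names (`diag_gmm_posteriors`, `estimate_stat_means`, `hybrid_baum_welch_stats`). Estimating the statistics-feature means as posterior-weighted averages over background data is likewise an assumption.

```python
import numpy as np


def diag_gmm_posteriors(feats, weights, means, variances):
    """Per-frame component posteriors of a diagonal-covariance GMM.

    feats: (T, D); weights: (C,); means, variances: (C, D).
    Returns gamma with shape (T, C); each row sums to 1.
    """
    diff = feats[:, None, :] - means[None, :, :]                       # (T, C, D)
    log_norm = -0.5 * np.sum(np.log(2.0 * np.pi * variances), axis=1)  # (C,)
    log_lik = log_norm - 0.5 * np.sum(diff ** 2 / variances, axis=2)   # (T, C)
    log_post = np.log(weights) + log_lik
    # Normalize in the log domain to avoid underflow.
    log_post -= log_post.max(axis=1, keepdims=True)
    post = np.exp(log_post)
    return post / post.sum(axis=1, keepdims=True)


def estimate_stat_means(align_feats, stat_feats, align_gmm):
    """Hypothetical helper: per-component means of the statistics features,
    computed as posterior-weighted averages over background data."""
    gamma = diag_gmm_posteriors(align_feats, *align_gmm)
    n = gamma.sum(axis=0)
    return (gamma.T @ stat_feats) / np.maximum(n, 1e-10)[:, None]


def hybrid_baum_welch_stats(align_feats, stat_feats, align_gmm, stat_means):
    """Zeroth-order and centered first-order statistics where the alignment
    comes from one feature stream (assumed: bottleneck features) and the
    accumulated features come from another (assumed: e.g. MFCCs)."""
    gamma = diag_gmm_posteriors(align_feats, *align_gmm)  # used for alignment only
    n = gamma.sum(axis=0)                                   # N_c, shape (C,)
    f = gamma.T @ stat_feats                                # F_c, shape (C, Dx)
    return n, f - n[:, None] * stat_means                   # centered F_c


# Toy usage with random data: 2 components, 4-dim "bottleneck", 3-dim "MFCC".
rng = np.random.default_rng(0)
align_feats = rng.normal(size=(200, 4))
stat_feats = rng.normal(size=(200, 3))
align_gmm = (np.array([0.5, 0.5]),
             np.array([[-1.0] * 4, [1.0] * 4]),
             np.ones((2, 4)))
stat_means = estimate_stat_means(align_feats, stat_feats, align_gmm)
n, f_centered = hybrid_baum_welch_stats(align_feats, stat_feats, align_gmm, stat_means)
print(n.shape, f_centered.shape)
```

The alignment stream enters only through the posteriors `gamma`, so the accumulated statistics keep the dimensionality and distribution of the statistics features. Because the means in this toy run are estimated on the same data that is then accumulated, the centered first-order statistics come out numerically zero; with a separate background set and test utterance they would not.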
