On the issue of calibration in DNN-based speaker recognition systems

This article is concerned with the issue of calibration in the context of Deep Neural Network (DNN) based approaches to speaker recognition. DNNs have provided a new standard in technology when used in place of the traditional universal background model (UBM) for feature alignment, or to augment tra...

Bibliographic Details
Main authors: McLaren, M., Castan, D., Ferrer, L., Lawson, A., Morgan N., Georgiou P., Narayanan S., Metze F., Amazon Alexa; Apple; eBay; et al.; Google; Microsoft
Format: CONF
Subjects:
Online access: http://hdl.handle.net/20.500.12110/paper_2308457X_v08-12-September-2016_n_p1825_McLaren
Contributed by:
id todo:paper_2308457X_v08-12-September-2016_n_p1825_McLaren
record_format dspace
spelling todo:paper_2308457X_v08-12-September-2016_n_p1825_McLaren 2023-10-03T16:40:51Z CONF info:eu-repo/semantics/openAccess http://creativecommons.org/licenses/by/2.5/ar http://hdl.handle.net/20.500.12110/paper_2308457X_v08-12-September-2016_n_p1825_McLaren
institution Universidad de Buenos Aires
institution_str I-28
repository_str R-134
collection Biblioteca Digital - Facultad de Ciencias Exactas y Naturales (UBA)
topic Bottleneck features
Calibration
Deep neural network
Mismatch
Speaker recognition
Alignment
Speech communication
Speech processing
Computationally efficient
Deep neural networks
Discriminative power
Speaker recognition system
Universal background model
Speech recognition
description This article is concerned with the issue of calibration in the context of Deep Neural Network (DNN) based approaches to speaker recognition. DNNs have provided a new standard in technology when used in place of the traditional universal background model (UBM) for feature alignment, or to augment traditional features with those extracted from a bottleneck layer of the DNN. These techniques provide extremely good performance for constrained trial conditions that are well matched to development conditions. However, when applied to unseen conditions or a wide variety of conditions, some DNN-based techniques offer poor calibration performance. Through analysis on both the PRISM and the recently released Speakers in the Wild (SITW) corpora, we illustrate that bottleneck features hinder calibration if used in the calculation of first-order Baum-Welch statistics during i-vector extraction. We propose a hybrid alignment framework, which stems from our previous work in DNN senone alignment, that uses the bottleneck features only for the alignment of features during statistics calculation. This framework not only addresses the issue of calibration, but also provides a more computationally efficient system based on bottleneck features with improved discriminative power. Copyright © 2016 ISCA.
format CONF
author McLaren, M.
Castan, D.
Ferrer, L.
Lawson, A.
Morgan N.
Georgiou P.
Narayanan S.
Metze F.
Amazon Alexa; Apple; eBay; et al.; Google; Microsoft
author_sort McLaren, M.
title On the issue of calibration in DNN-based speaker recognition systems
url http://hdl.handle.net/20.500.12110/paper_2308457X_v08-12-September-2016_n_p1825_McLaren
_version_ 1807315544511086592