Improving robustness of speaker recognition to new conditions using unlabeled data

Unsupervised techniques for adapting speaker recognition systems are important because condition mismatch is prevalent when the technology is applied to new conditions and labeled 'in-domain' data is generally scarce. In the recent NIST 2016 Speaker Re...

Bibliographic Details
Published: 2017
Online access: https://bibliotecadigital.exactas.uba.ar/collection/paper/document/paper_2308457X_v2017-August_n_p3737_Castan
http://hdl.handle.net/20.500.12110/paper_2308457X_v2017-August_n_p3737_Castan
id paper:paper_2308457X_v2017-August_n_p3737_Castan
record_format dspace
institution Universidad de Buenos Aires
institution_str I-28
repository_str R-134
collection Biblioteca Digital - Facultad de Ciencias Exactas y Naturales (UBA)
topic NIST SRE16
Score Calibration
Score Normalization
Trial-based Calibration
Calibration
Speech communication
Acoustic conditions
Calibration parameters
Speaker clustering
Speaker recognition
Speaker recognition evaluations
Unsupervised techniques
Speech recognition
description Unsupervised techniques for adapting speaker recognition systems are important because condition mismatch is prevalent when the technology is applied to new conditions and labeled 'in-domain' data is generally scarce. In the recent NIST 2016 Speaker Recognition Evaluation (SRE16), symmetric score normalization (S-norm) and calibration using unlabeled in-domain data were shown to be beneficial. Because calibration requires speaker labels for training, speaker-clustering techniques were used to generate pseudo-speakers for learning calibration parameters when only unlabeled in-domain data was available. These methods performed well in SRE16. It is unclear, however, whether they generalize to other data sources. In this work, we first describe our SRI-CON-UAM team submission to SRE16 and then benchmark these approaches on several distinctly different databases. Our analysis shows that while the benefit of S-norm is also observed on other datasets, speaker-clustered calibration provides considerably greater benefit in new acoustic conditions. Copyright © 2017 ISCA.
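The symmetric score normalization named in the description can be illustrated with a short sketch. This follows the commonly used definition of S-norm (the average of two z-normalized scores, one using enrollment-side cohort statistics and one using test-side cohort statistics) and is not taken from the authors' system. The function name `s_norm`, its parameters, and the toy cohort scores are hypothetical, and the cohort is assumed to be drawn from unlabeled in-domain data.

```python
import numpy as np


def s_norm(raw_score, enroll_cohort_scores, test_cohort_scores):
    """Symmetric score normalization (S-norm) of a single trial score.

    enroll_cohort_scores: scores of the enrollment model against a cohort
    test_cohort_scores:   scores of the test segment against the same cohort
    (Both cohorts are assumed here to come from unlabeled in-domain data.)
    """
    e = np.asarray(enroll_cohort_scores, dtype=float)
    t = np.asarray(test_cohort_scores, dtype=float)
    # z-normalize the raw score with each side's cohort mean and
    # (population) standard deviation, then average the two results.
    z_enroll = (raw_score - e.mean()) / e.std()
    z_test = (raw_score - t.mean()) / t.std()
    return 0.5 * (z_enroll + z_test)


# Toy values chosen only for illustration.
enroll_cohort = [1.0, 3.0]  # mean 2.0, std 1.0
test_cohort = [0.0, 4.0]    # mean 2.0, std 2.0
normalized = s_norm(4.0, enroll_cohort, test_cohort)
swapped = s_norm(4.0, test_cohort, enroll_cohort)
```

Because the two z-scores are averaged, swapping the enrollment and test roles leaves the result unchanged, which is what makes the normalization symmetric. Note that no speaker labels are needed for this step, whereas calibration requires them, hence the pseudo-speakers obtained by clustering that the description mentions.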
title Improving robustness of speaker recognition to new conditions using unlabeled data