Mitigating the effects of non-stationary unseen noises on language recognition performance
We introduce a new dataset for the study of the effect of highly non-stationary noises on language recognition (LR) performance. The dataset is based on the data from the 2009 Language Recognition Evaluation organized by the National Institute of Standards and Technology (NIST). Randomly selected no...
Published: | 2015 |
---|---|
Online access: | https://bibliotecadigital.exactas.uba.ar/collection/paper/document/paper_2308457X_v2015-January_n_p3446_Ferrer http://hdl.handle.net/20.500.12110/paper_2308457X_v2015-January_n_p3446_Ferrer |
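
The abstract describes adding randomly selected noises to the signals to reach a chosen signal-to-noise ratio and percentage of corruption. The following is a minimal sketch of one way such mixing could work, not the authors' procedure. The function name `mix_noise`, the single contiguous corrupted region, the looping of short noise clips, and measuring the SNR over the corrupted region only are all assumptions made for illustration.

```python
import math
import random


def mix_noise(speech, noise, snr_db, corrupt_fraction, seed=0):
    """Add noise to one contiguous region of `speech` at the requested SNR.

    Hypothetical helper: the region placement, the noise looping, and the
    region-level SNR definition are assumptions, not taken from the paper.
    Returns (corrupted_signal, region_start, region_length).
    """
    rng = random.Random(seed)
    n = len(speech)
    length = int(round(corrupt_fraction * n))
    if length == 0:
        return list(speech), 0, 0

    # Pick where the corrupted region starts; randint is inclusive on both ends.
    start = rng.randint(0, n - length)
    segment = speech[start:start + length]
    # Loop the noise clip if it is shorter than the region.
    tiled = [noise[i % len(noise)] for i in range(length)]

    speech_power = sum(x * x for x in segment) / length
    noise_power = sum(x * x for x in tiled) / length
    if speech_power == 0 or noise_power == 0:
        return list(speech), start, length

    # Choose gain g so that speech_power / (g^2 * noise_power) = 10^(snr_db / 10).
    gain = math.sqrt(speech_power / (noise_power * 10 ** (snr_db / 10)))

    out = list(speech)
    for i in range(length):
        out[start + i] += gain * tiled[i]
    return out, start, length
```

Under these assumptions, `corrupt_fraction` plays the role of the percentage of corruption and `snr_db` the chosen signal-to-noise ratio. Samples outside the selected region are left untouched.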
id | paper:paper_2308457X_v2015-January_n_p3446_Ferrer |
---|---|
record_format | dspace |
institution | Universidad de Buenos Aires |
institution_str | I-28 |
repository_str | R-134 |
collection | Biblioteca Digital - Facultad de Ciencias Exactas y Naturales (UBA) |
topic | Non-stationary noise; Speech activity detection; Spoken language recognition; Computational linguistics; Signal detection; Speech; Speech communication; Statistical tests; Data performance; Language recognition; Model Adaptation; National Institute of Standards and Technology; Nonstationary noise; Signal-to-noise ratio (SNR); Speech activity detections; Spoken language recognition; Speech recognition |
description | We introduce a new dataset for the study of the effect of highly non-stationary noises on language recognition (LR) performance. The dataset is based on the data from the 2009 Language Recognition Evaluation organized by the National Institute of Standards and Technology (NIST). Randomly selected noises are added to these signals to achieve a chosen signal-to-noise ratio and percentage of corruption. We study the effect of these noises on LR performance as a function of these parameters and present some initial methods to mitigate the degradation, focusing on the speech activity detection (SAD) step. These methods include discarding the C0 coefficient from the features used for SAD, using a more stringent threshold on the SAD scores, thresholding the speech likelihoods returned by the model as an additional way of detecting noise, and a final model adaptation step. We show that a system optimized for clean speech is clearly suboptimal on this new dataset since the proposed methods lead to gains of up to 35% on the corrupted data, without knowledge of the test noises and with very little effect on clean data performance. Copyright © 2015 ISCA. |
title | Mitigating the effects of non-stationary unseen noises on language recognition performance |
publishDate | 2015 |
url | https://bibliotecadigital.exactas.uba.ar/collection/paper/document/paper_2308457X_v2015-January_n_p3446_Ferrer http://hdl.handle.net/20.500.12110/paper_2308457X_v2015-January_n_p3446_Ferrer |
_version_ | 1768542524895395840 |
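
Among the mitigation methods listed in the abstract are discarding the C0 coefficient from the SAD features, applying a more stringent threshold to the SAD scores, and thresholding the speech likelihoods returned by the model. The sketch below shows how these three steps might look on per-frame data. It is an illustration only: the function names, the frame layout (C0 stored first in each feature vector), the "higher score means more speech-like" convention, and the threshold values are all assumptions, and the final model adaptation step is not covered.

```python
def drop_c0(frames):
    """Remove the first coefficient (assumed to be C0) from each feature frame."""
    return [frame[1:] for frame in frames]


def select_speech_frames(sad_scores, speech_logliks, sad_threshold, loglik_threshold):
    """Return the indices of frames kept as speech.

    Hypothetical rule: a frame is kept only if its SAD score exceeds
    `sad_threshold` AND its speech log-likelihood exceeds `loglik_threshold`.
    Raising `sad_threshold` makes the detector more stringent; the second
    test treats frames the speech model explains poorly as noise.
    """
    kept = []
    for i, (score, loglik) in enumerate(zip(sad_scores, speech_logliks)):
        if score > sad_threshold and loglik > loglik_threshold:
            kept.append(i)
    return kept


# Toy data: four frames, three coefficients each, C0 first (assumed layout).
frames = [[10.0, 1.0, 2.0], [9.0, 0.5, 0.1], [11.0, 0.9, 1.8], [10.5, 1.1, 2.2]]
sad_features = drop_c0(frames)

sad_scores = [2.0, 0.4, 1.5, 3.0]            # made-up per-frame SAD scores
speech_logliks = [-5.0, -4.0, -50.0, -6.0]   # made-up speech log-likelihoods
kept = select_speech_frames(sad_scores, speech_logliks,
                            sad_threshold=1.0, loglik_threshold=-20.0)
```

In this toy example, frame 1 is rejected by the SAD score threshold and frame 2 by the likelihood threshold, so only frames 0 and 3 are kept as speech.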