Mitigating the effects of non-stationary unseen noises on language recognition performance

We introduce a new dataset for the study of the effect of highly non-stationary noises on language recognition (LR) performance. The dataset is based on the data from the 2009 Language Recognition Evaluation organized by the National Institute of Standards and Technology (NIST). Randomly selected no...

Bibliographic Details
Main authors: Ferrer, L., McLaren, M., Lawson, A., Graciarena, M., Noth E., Steidl S., Moller S., Ney H., Mobius B., Alibaba Group; Amazon; et al.; Facebook; Google; Telekom Innovation Laboratories
Format: CONF
Online access: http://hdl.handle.net/20.500.12110/paper_2308457X_v2015-January_n_p3446_Ferrer
id todo:paper_2308457X_v2015-January_n_p3446_Ferrer
record_format dspace
spelling todo:paper_2308457X_v2015-January_n_p3446_Ferrer 2023-10-03T16:40:53Z
CONF info:eu-repo/semantics/openAccess http://creativecommons.org/licenses/by/2.5/ar http://hdl.handle.net/20.500.12110/paper_2308457X_v2015-January_n_p3446_Ferrer
institution Universidad de Buenos Aires
institution_str I-28
repository_str R-134
collection Biblioteca Digital - Facultad de Ciencias Exactas y Naturales (UBA)
topic Non-stationary noise
Speech activity detection
Spoken language recognition
Computational linguistics
Signal detection
Speech
Speech communication
Statistical tests
Data performance
Language recognition
Model Adaptation
National Institute of Standards and Technology
Nonstationary noise
Signal-to-noise ratio (SNR)
Speech activity detections
Spoken language recognition
Speech recognition
description We introduce a new dataset for the study of the effect of highly non-stationary noises on language recognition (LR) performance. The dataset is based on the data from the 2009 Language Recognition Evaluation organized by the National Institute of Standards and Technology (NIST). Randomly selected noises are added to these signals to achieve a chosen signal-to-noise ratio and percentage of corruption. We study the effect of these noises on LR performance as a function of these parameters and present some initial methods to mitigate the degradation, focusing on the speech activity detection (SAD) step. These methods include discarding the C0 coefficient from the features used for SAD, using a more stringent threshold on the SAD scores, thresholding the speech likelihoods returned by the model as an additional way of detecting noise, and a final model adaptation step. We show that a system optimized for clean speech is clearly suboptimal on this new dataset since the proposed methods lead to gains of up to 35% on the corrupted data, without knowledge of the test noises and with very little effect on clean data performance. Copyright © 2015 ISCA.
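As a rough illustration of the corruption procedure described in the abstract above, the following Python sketch adds a noise recording to a randomly placed stretch of a speech signal so that a chosen signal-to-noise ratio and percentage of corruption are met. It is not the authors' implementation: the function names `power` and `corrupt`, the use of a single contiguous stretch, and the choice to measure the SNR over the corrupted stretch only are all assumptions made for this example, since this record does not say how the paper defines those details.

```python
import math
import random


def power(x):
    """Mean squared amplitude of a sequence of samples."""
    return sum(v * v for v in x) / len(x)


def corrupt(speech, noise, snr_db, corruption, rng):
    """Add `noise` to one contiguous stretch of `speech` (hypothetical helper).

    corruption: fraction (0..1] of the samples that receive noise.
    snr_db: target ratio of speech power to scaled-noise power, in dB,
            both measured over the corrupted stretch (an assumption).
    Returns the corrupted signal and the (start, end) of the stretch.
    """
    n = len(speech)
    seg = max(1, round(corruption * n))
    start = rng.randrange(0, n - seg + 1)
    end = start + seg
    # Tile the noise if it is shorter than the stretch to be corrupted.
    reps = -(-seg // len(noise))  # ceiling division
    chunk = (list(noise) * reps)[:seg]
    # Choose gain g so that P_speech / (g^2 * P_noise) = 10^(snr_db / 10).
    p_s = power(speech[start:end])
    p_n = power(chunk)
    gain = math.sqrt(p_s / (p_n * 10 ** (snr_db / 10)))
    out = list(speech)
    for i, v in enumerate(chunk):
        out[start + i] += gain * v
    return out, (start, end)


# Toy signals standing in for real recordings (synthetic, for illustration).
rng = random.Random(0)
speech = [math.sin(0.05 * t) for t in range(8000)]
noise = [rng.uniform(-1, 1) for _ in range(3000)]

noisy, (a, b) = corrupt(speech, noise, snr_db=5.0, corruption=0.3, rng=rng)

# Recompute the SNR actually achieved over the corrupted stretch.
added = [noisy[i] - speech[i] for i in range(a, b)]
achieved = 10 * math.log10(power(speech[a:b]) / power(added))
print(round(achieved, 2))
```

The final lines recompute the SNR over the corrupted stretch, which should come out close to the requested 5 dB; samples outside the stretch are left untouched.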
format CONF
author Ferrer, L.
McLaren, M.
Lawson, A.
Graciarena, M.
Noth E.
Steidl S.
Moller S.
Ney H.
Mobius B.
Alibaba Group; Amazon; et al.; Facebook; Google; Telekom Innovation Laboratories
author_sort Ferrer, L.
title Mitigating the effects of non-stationary unseen noises on language recognition performance
url http://hdl.handle.net/20.500.12110/paper_2308457X_v2015-January_n_p3446_Ferrer
_version_ 1807315159691034624