A phonetically aware system for speech activity detection


Speech activity detection (SAD) is an essential component of most speech processing tasks and greatly influences the performance of the systems. Noise and channel distortions remain a challenge for SAD systems. In this paper, we focus on a dataset of highly degraded signals, developed under the DARP...


Bibliographic Details
Main authors:	Ferrer, L., Graciarena, M., Mitra, V., The Institute of Electrical and Electronics Engineers Signal Processing Society
Format:	CONF
Subjects:	bottleneck features; deep neural networks; degraded channels; speech activity detection
Online access:	http://hdl.handle.net/20.500.12110/paper_15206149_v2016-May_n_p5710_Ferrer
Contributed by:	Biblioteca Digital - Facultad de Ciencias Exactas y Naturales (UBA), Universidad de Buenos Aires

id	todo:paper_15206149_v2016-May_n_p5710_Ferrer
record_format	dspace
spelling	todo:paper_15206149_v2016-May_n_p5710_Ferrer 2023-10-03T16:20:33Z CONF info:eu-repo/semantics/openAccess http://creativecommons.org/licenses/by/2.5/ar
institution	Universidad de Buenos Aires
institution_str	I-28
repository_str	R-134
collection	Biblioteca Digital - Facultad de Ciencias Exactas y Naturales (UBA)
topic	bottleneck features deep neural networks degraded channels Speech activity detection
description	Speech activity detection (SAD) is an essential component of most speech processing tasks and greatly influences the performance of the systems. Noise and channel distortions remain a challenge for SAD systems. In this paper, we focus on a dataset of highly degraded signals, developed under the DARPA Robust Automatic Transcription of Speech (RATS) program. On this challenging data, the best-performing systems are those based on deep neural networks (DNN) trained to predict speech/non-speech posteriors for each frame. We propose a novel two-stage approach to SAD that attempts to model phonetic information in the signal more explicitly than in current systems. In the first stage, a bottleneck DNN is trained to predict posteriors for senones. The activations at the bottleneck layer are then used as input to a second DNN, trained to predict the speech/non-speech posteriors. We test performance on two datasets, with matched and mismatched channels compared to those in the training data. On the matched channels, the proposed approach leads to gains of approximately 35% relative to our best single-stage DNN SAD system. On mismatched channels, the proposed system obtains comparable performance to our baseline, indicating more work needs to be done to improve robustness to mismatched data. © 2016 IEEE.
format	CONF
author	Ferrer, L. Graciarena, M. Mitra, V. The Institute of Electrical and Electronics Engineers Signal Processing Society
author_sort	Ferrer, L.
title	A phonetically aware system for speech activity detection
url	http://hdl.handle.net/20.500.12110/paper_15206149_v2016-May_n_p5710_Ferrer
_version_	1807323425573699584
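
The two-stage data flow described in the abstract can be sketched as follows. This is only an illustrative forward pass with random, untrained weights, not the authors' implementation: the record does not give layer sizes, input features, activation functions, or the number of senones, so every constant and helper name below (N_FEATS, N_BOTTLENECK, N_SENONES, init_layers, forward) is a hypothetical choice made for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

def init_layers(sizes):
    """Random (untrained) weights and zero biases for a fully connected net."""
    return [(rng.standard_normal((m, n)) * 0.1, np.zeros(n))
            for m, n in zip(sizes[:-1], sizes[1:])]

def softmax(z):
    """Row-wise softmax with the usual max subtraction for stability."""
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def forward(x, layers, stop_at=None):
    """Hidden layers use tanh (an assumption); the last layer uses softmax.
    If stop_at is given, return that hidden layer's activations instead."""
    h = x
    for i, (w, b) in enumerate(layers):
        z = h @ w + b
        if i == len(layers) - 1:
            return softmax(z)
        h = np.tanh(z)
        if stop_at is not None and i == stop_at:
            return h
    return h

# Hypothetical sizes, chosen only to keep the example small.
N_FEATS, N_BOTTLENECK, N_SENONES = 40, 20, 500

# Stage 1: bottleneck DNN whose output layer predicts senone posteriors.
stage1 = init_layers([N_FEATS, 128, N_BOTTLENECK, 128, N_SENONES])
# Stage 2: DNN that maps bottleneck activations to speech/non-speech posteriors.
stage2 = init_layers([N_BOTTLENECK, 64, 2])

frames = rng.standard_normal((100, N_FEATS))      # 100 stand-in feature frames
bottleneck = forward(frames, stage1, stop_at=1)   # layer index 1 is the bottleneck
speech_post = forward(bottleneck, stage2)         # one (speech, non-speech) pair per frame
```

In a real system, stage 1 would first be trained on senone targets and stage 2 would then be trained on speech/non-speech labels using the bottleneck activations as input; the sketch only shows how the second network consumes the first network's bottleneck layer rather than its senone output.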
