Minimizing annotation effort for adaptation of speech-activity detection systems

Annotating audio data for the presence and location of speech is a time-consuming and therefore costly task. This is mostly because annotation precision greatly affects the performance of the speech-activity detection (SAD) systems trained with this data, which means that the annotation process must...

Bibliographic Details
Main authors:	Ferrer, L., Graciarena, M., Morgan N., Georgiou P., Narayanan S., Metze F., Amazon Alexa; Apple; eBay; et al.; Google; Microsoft
Format:	CONF
Subjects:	Active learning Adaptation Annotation Speech-activity detection Artificial intelligence Budget control Speech Speech communication Speech processing Active Learning Audio samples Baseline systems Simple approach Speech activity detections Training data Speech recognition
Online access:	http://hdl.handle.net/20.500.12110/paper_2308457X_v08-12-September-2016_n_p3002_Ferrer
Contributed by:	Biblioteca Digital - Facultad de Ciencias Exactas y Naturales (UBA), Universidad de Buenos Aires

id	todo:paper_2308457X_v08-12-September-2016_n_p3002_Ferrer
record_format	dspace
spelling	todo:paper_2308457X_v08-12-September-2016_n_p3002_Ferrer 2023-10-03T16:40:51Z Minimizing annotation effort for adaptation of speech-activity detection systems Ferrer, L. Graciarena, M. Morgan N. Georgiou P. Morgan N. Narayanan S. Metze F. Amazon Alexa; Apple; eBay; et al.; Google; Microsoft Active learning Adaptation Annotation Speech-activity detection Artificial intelligence Budget control Speech Speech communication Speech processing Active Learning Adaptation Annotation Audio samples Baseline systems Simple approach Speech activity detections Training data Speech recognition Annotating audio data for the presence and location of speech is a time-consuming and therefore costly task. This is mostly because annotation precision greatly affects the performance of the speech-activity detection (SAD) systems trained with this data, which means that the annotation process must be careful and detailed. Although significant amounts of data are already annotated for speech presence and are available to train SAD systems, these systems are known to perform poorly on channels that are not well represented by the training data. However, obtaining representative audio samples from a new channel is relatively easy, and this data can be used for training a new SAD system or adapting one trained with larger amounts of mismatched data. This paper focuses on the problem of selecting the best-possible subset of available audio data given a budgeted time for annotation. We propose simple approaches for selection that lead to significant gains over naive methods that merely select N full files at random. An approach that uses the frame-level scores from a baseline system to select regions such that the score distribution is uniformly sampled gives the best tradeoff across a variety of channel groups. Copyright © 2016 ISCA.
CONF info:eu-repo/semantics/openAccess http://creativecommons.org/licenses/by/2.5/ar http://hdl.handle.net/20.500.12110/paper_2308457X_v08-12-September-2016_n_p3002_Ferrer
institution	Universidad de Buenos Aires
institution_str	I-28
repository_str	R-134
collection	Biblioteca Digital - Facultad de Ciencias Exactas y Naturales (UBA)
topic	Active learning Adaptation Annotation Speech-activity detection Artificial intelligence Budget control Speech Speech communication Speech processing Active Learning Adaptation Annotation Audio samples Baseline systems Simple approach Speech activity detections Training data Speech recognition
spellingShingle	Active learning Adaptation Annotation Speech-activity detection Artificial intelligence Budget control Speech Speech communication Speech processing Active Learning Adaptation Annotation Audio samples Baseline systems Simple approach Speech activity detections Training data Speech recognition Ferrer, L. Graciarena, M. Morgan N. Georgiou P. Morgan N. Narayanan S. Metze F. Amazon Alexa; Apple; eBay; et al.; Google; Microsoft Minimizing annotation effort for adaptation of speech-activity detection systems
topic_facet	Active learning Adaptation Annotation Speech-activity detection Artificial intelligence Budget control Speech Speech communication Speech processing Active Learning Adaptation Annotation Audio samples Baseline systems Simple approach Speech activity detections Training data Speech recognition
description	Annotating audio data for the presence and location of speech is a time-consuming and therefore costly task. This is mostly because annotation precision greatly affects the performance of the speech-activity detection (SAD) systems trained with this data, which means that the annotation process must be careful and detailed. Although significant amounts of data are already annotated for speech presence and are available to train SAD systems, these systems are known to perform poorly on channels that are not well represented by the training data. However, obtaining representative audio samples from a new channel is relatively easy, and this data can be used for training a new SAD system or adapting one trained with larger amounts of mismatched data. This paper focuses on the problem of selecting the best-possible subset of available audio data given a budgeted time for annotation. We propose simple approaches for selection that lead to significant gains over naive methods that merely select N full files at random. An approach that uses the frame-level scores from a baseline system to select regions such that the score distribution is uniformly sampled gives the best tradeoff across a variety of channel groups. Copyright © 2016 ISCA.
format	CONF
author	Ferrer, L. Graciarena, M. Morgan N. Georgiou P. Morgan N. Narayanan S. Metze F. Amazon Alexa; Apple; eBay; et al.; Google; Microsoft
author_facet	Ferrer, L. Graciarena, M. Morgan N. Georgiou P. Morgan N. Narayanan S. Metze F. Amazon Alexa; Apple; eBay; et al.; Google; Microsoft
author_sort	Ferrer, L.
title	Minimizing annotation effort for adaptation of speech-activity detection systems
title_short	Minimizing annotation effort for adaptation of speech-activity detection systems
title_full	Minimizing annotation effort for adaptation of speech-activity detection systems
title_fullStr	Minimizing annotation effort for adaptation of speech-activity detection systems
title_full_unstemmed	Minimizing annotation effort for adaptation of speech-activity detection systems
title_sort	minimizing annotation effort for adaptation of speech-activity detection systems
url	http://hdl.handle.net/20.500.12110/paper_2308457X_v08-12-September-2016_n_p3002_Ferrer
work_keys_str_mv	AT ferrerl minimizingannotationeffortforadaptationofspeechactivitydetectionsystems AT graciarenam minimizingannotationeffortforadaptationofspeechactivitydetectionsystems AT morgann minimizingannotationeffortforadaptationofspeechactivitydetectionsystems AT georgioup minimizingannotationeffortforadaptationofspeechactivitydetectionsystems AT morgann minimizingannotationeffortforadaptationofspeechactivitydetectionsystems AT narayanans minimizingannotationeffortforadaptationofspeechactivitydetectionsystems AT metzef minimizingannotationeffortforadaptationofspeechactivitydetectionsystems AT amazonalexaappleebayetalgooglemicrosoft minimizingannotationeffortforadaptationofspeechactivitydetectionsystems
_version_	1807322960144367616
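
The description mentions selecting regions so that the baseline system's frame-level score distribution is uniformly sampled. The record gives no algorithmic detail, so the following is only a minimal Python sketch of one way such a selection could work, not the authors' method. The fixed-length regions, the use of a region's mean score, the equal-width binning, the round-robin draw, and the name `select_regions_uniform` with its parameters are all assumptions made for illustration.

```python
import random


def select_regions_uniform(scores, region_len, budget_regions, n_bins=10, seed=0):
    """Pick fixed-length regions whose mean scores cover the score range evenly.

    scores: list of frame-level scores from a baseline SAD system (assumed input).
    region_len: frames per candidate region (hypothetical parameter).
    budget_regions: how many regions the annotation budget allows.
    Returns a sorted list of (start_frame, end_frame) tuples.
    """
    n_regions = len(scores) // region_len
    if n_regions == 0 or budget_regions <= 0:
        return []

    # Summarize each candidate region by its mean frame score.
    means = [
        sum(scores[i * region_len:(i + 1) * region_len]) / region_len
        for i in range(n_regions)
    ]

    # Assign regions to equal-width bins over the observed score range.
    lo, hi = min(means), max(means)
    width = (hi - lo) / n_bins or 1.0  # fall back to 1.0 if all means are equal
    pools = [[] for _ in range(n_bins)]
    for idx, m in enumerate(means):
        b = min(int((m - lo) / width), n_bins - 1)
        pools[b].append(idx)

    # Shuffle within each bin so the draw inside a bin is random.
    rng = random.Random(seed)
    for pool in pools:
        rng.shuffle(pool)

    # Round-robin over bins so each score range contributes equally
    # until the budget is spent or the candidates run out.
    target = min(budget_regions, n_regions)
    chosen = []
    while len(chosen) < target:
        for pool in pools:
            if pool and len(chosen) < target:
                chosen.append(pool.pop())

    return sorted((i * region_len, (i + 1) * region_len) for i in chosen)
```

Under these assumptions, a recording whose regions mostly score low still contributes as many high-scoring regions as low-scoring ones, whereas picking regions at random would mirror the skewed distribution of the data.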

Similar items