Classification of lexical stress using spectral and prosodic features for computer-assisted language learning systems

Bibliographic details
Main authors:	Ferrer, L., Bratt, H., Richey, C., Franco, H., Abrash, V., Precoda, K.
Format:	JOUR (journal article)
Subjects:	Computer-assisted language learning; Gaussian mixture models; Lexical stress detection; Mel frequency cepstral coefficients; Prosodic features; Computational linguistics; Computer aided instruction; Consumer products; E-learning; Feature extraction; Linguistics; Probability; Speech recognition; Computer assisted language learning; Gaussian Mixture Model; Mel frequency cepstral co-efficient; Stress detection; Learning systems
Online access:	http://hdl.handle.net/20.500.12110/paper_01676393_v69_n_p31_Ferrer
Contributed by:	Biblioteca Digital - Facultad de Ciencias Exactas y Naturales (UBA), Universidad de Buenos Aires

id	todo:paper_01676393_v69_n_p31_Ferrer
record_format	dspace
rights	info:eu-repo/semantics/openAccess http://creativecommons.org/licenses/by/2.5/ar
datestamp	2023-10-03T15:05:01Z
institution	Universidad de Buenos Aires
institution_str	I-28
repository_str	R-134
collection	Biblioteca Digital - Facultad de Ciencias Exactas y Naturales (UBA)
description	We present a system for detection of lexical stress in English words spoken by English learners. This system was designed to be part of the EduSpeak® computer-assisted language learning (CALL) software. The system uses both prosodic and spectral features to detect the level of stress (unstressed, primary or secondary) for each syllable in a word. Features are computed on the vowels and include normalized energy, pitch, spectral tilt, and duration measurements, as well as log-posterior probabilities obtained from the frame-level mel-frequency cepstral coefficients (MFCCs). Gaussian mixture models (GMMs) are used to represent the distribution of these features for each stress class. The system is trained on utterances by L1-English children and tested on English speech from L1-English children and L1-Japanese children with variable levels of English proficiency. Since it is trained on data from L1-English speakers, the system can be used on English utterances spoken by speakers of any L1 without retraining. Furthermore, automatically determined stress patterns are used as the intended target; therefore, hand-labeling of training data is not required. This allows us to use a large amount of data for training the system. Our algorithm results in an error rate of approximately 11% on English utterances from L1-English speakers and 20% on English utterances from L1-Japanese speakers. We show that all features, both spectral and prosodic, are necessary for achievement of optimal performance on the data from L1-English speakers; MFCC log-posterior probability features are the single best set of features, followed by duration, energy, pitch and finally, spectral tilt features. For English utterances from L1-Japanese speakers, energy, MFCC log-posterior probabilities and duration are the most important features. © 2015 Elsevier B.V.
format	JOUR
author	Ferrer, L. Bratt, H. Richey, C. Franco, H. Abrash, V. Precoda, K.
title	Classification of lexical stress using spectral and prosodic features for computer-assisted language learning systems
url	http://hdl.handle.net/20.500.12110/paper_01676393_v69_n_p31_Ferrer
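The abstract describes one Gaussian mixture model per stress class (unstressed, primary, secondary) over features computed on each vowel. The following is a minimal Python/NumPy sketch of that general idea under stated assumptions. The two-dimensional synthetic features, the diagonal covariances, the number of components, the EM settings and the decision rule (highest class log-likelihood plus log prior) are all hypothetical choices for illustration. This is not the authors' EduSpeak implementation, and it omits their actual feature set, including the MFCC log-posterior features.

```python
import numpy as np


def _log_gauss_diag(x, means, variances):
    """Log-density of each row of x under each diagonal Gaussian: (n, d) -> (n, k)."""
    diff = x[:, None, :] - means[None, :, :]
    return -0.5 * (
        np.sum(np.log(2.0 * np.pi * variances), axis=1)[None, :]
        + np.sum(diff**2 / variances[None, :, :], axis=2)
    )


def _logsumexp(a, axis):
    """Numerically stable log(sum(exp(a))) along one axis."""
    m = np.max(a, axis=axis, keepdims=True)
    return np.squeeze(m, axis=axis) + np.log(np.sum(np.exp(a - m), axis=axis))


class DiagGMM:
    """Diagonal-covariance GMM fitted with a fixed number of EM iterations (hypothetical settings)."""

    def __init__(self, n_components=2, n_iter=50, var_floor=1e-3, seed=0):
        self.k, self.n_iter, self.var_floor = n_components, n_iter, var_floor
        self.rng = np.random.default_rng(seed)

    def fit(self, x):
        n, _ = x.shape
        # Initialise means on randomly chosen training vectors, variances from global variance.
        self.means = x[self.rng.choice(n, self.k, replace=False)].copy()
        self.vars = np.tile(x.var(axis=0) + self.var_floor, (self.k, 1))
        self.log_w = np.full(self.k, -np.log(self.k))
        for _ in range(self.n_iter):
            # E-step: posterior responsibility of each component for each sample.
            log_joint = self.log_w[None, :] + _log_gauss_diag(x, self.means, self.vars)
            resp = np.exp(log_joint - _logsumexp(log_joint, axis=1)[:, None])
            # M-step: re-estimate weights, means and (floored) variances.
            nk = resp.sum(axis=0) + 1e-10
            self.log_w = np.log(nk / n)
            self.means = (resp.T @ x) / nk[:, None]
            second_moment = (resp.T @ x**2) / nk[:, None]
            self.vars = np.maximum(second_moment - self.means**2, self.var_floor)
        return self

    def score_samples(self, x):
        """Per-sample log-likelihood log p(x) under the mixture."""
        log_joint = self.log_w[None, :] + _log_gauss_diag(x, self.means, self.vars)
        return _logsumexp(log_joint, axis=1)


def train_stress_models(features_by_class, n_components=2):
    """Fit one GMM per stress class; class priors are relative training counts (assumption)."""
    total = sum(len(f) for f in features_by_class.values())
    models = {c: DiagGMM(n_components).fit(f) for c, f in features_by_class.items()}
    log_priors = {c: np.log(len(f) / total) for c, f in features_by_class.items()}
    return models, log_priors


def classify_syllables(models, log_priors, x):
    """Assumed decision rule: argmax over classes of log p(x | class) + log P(class)."""
    labels = list(models)
    scores = np.stack([models[c].score_samples(x) + log_priors[c] for c in labels], axis=1)
    return [labels[i] for i in np.argmax(scores, axis=1)]


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    # Hypothetical per-vowel features: (normalized duration, normalized energy), synthetic data.
    centers = {"unstressed": (-1.0, -1.0), "primary": (1.5, 1.5), "secondary": (0.5, -0.5)}
    train = {c: rng.normal(mu, 0.3, size=(200, 2)) for c, mu in centers.items()}
    models, log_priors = train_stress_models(train)
    test = np.array([[-1.0, -1.0], [1.5, 1.5], [0.5, -0.5]])
    print(classify_syllables(models, log_priors, test))
```

Running the file prints the predicted stress class for three test vectors placed at the synthetic class centres. Because this record does not state the paper's mixture sizes, covariance type or decision rule, those parts of the sketch are placeholders rather than a description of the published system.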
