Improving audio of emergency calls in Spanish performed to the ECU 911 through filters for ASR technology

Bibliographic Details
Main authors: Orellana, Marcos; Jiménez Sarango, Ángel Alberto; Zambrano Martínez, Jorge Luis
Format: Conference object
Language: English
Published: 2022
Online access: http://sedici.unlp.edu.ar/handle/10915/140657
Institution: Universidad Nacional de La Plata
Collection: SEDICI (UNLP)
Subjects: Computer Science; Automatic Speech Recognition; Word Error Rate; Speech enhancement algorithms; Audio quality improvement

Abstract
In recent years, Automatic Speech Recognition (ASR) services have made notable progress, driven by the research efforts of large companies such as Google and Amazon. However, ASR systems are still sensitive to the quality of the audio they process in other languages. To address this issue, several of the most prominent speech enhancement algorithms for improving speech intelligibility were proposed: Singular Value Decomposition (SVD), log Minimum Mean Square Error (log-MMSE), and Wiener filtering. By preprocessing the audio files with these algorithms, we seek to reduce the Word Error Rate (WER), which compares the transcription produced by the ASR against a manual transcription and thus gives the percentage of errors made by the ASR service. The results show that Google's service outperforms its Amazon and Vosk counterparts. We also found that applying a low-pass filter combined with the log-MMSE algorithm to the audio files can substantially reduce the transcription WER, depending on the noise characteristics of the audio.
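The WER mentioned in the abstract is conventionally computed as a word-level edit distance: WER = (S + D + I) / N, where S, D, and I are the substitutions, deletions, and insertions needed to align the ASR transcription with the manual (reference) transcription, and N is the number of words in the reference. This record does not say which scoring tool the authors used, so the Python sketch below is only an illustration under stated assumptions: the function name `word_error_rate` is hypothetical, the example sentences are invented, and tokenization is simply lowercasing and splitting on whitespace, without the punctuation or accent normalization a real evaluation would likely apply.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return the word error rate of `hypothesis` against `reference`.

    WER = (S + D + I) / N, where S, D and I are the substitutions,
    deletions and insertions in a minimum-edit alignment and N is the
    number of words in the reference transcription.
    """
    # Assumption: lowercase + whitespace tokenization only; punctuation and
    # accents are NOT normalized in this sketch.
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcription must contain at least one word")

    # Rolling-row Levenshtein DP over words: before processing reference word i,
    # prev[j] is the edit distance between the first i-1 reference words and
    # the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)  # i deletions against an empty hypothesis
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of a hypothesis word
                prev[j - 1] + cost,  # substitution (or match when cost == 0)
            )
        prev = curr
    return prev[len(hyp)] / len(ref)


if __name__ == "__main__":
    # Hypothetical example: a manual transcription vs. an ASR output
    # (one deleted word plus one substituted word out of 7 reference words).
    reference = "necesito una ambulancia en la avenida solano"
    hypothesis = "necesito ambulancia en la avenida solana"
    print(f"WER: {word_error_rate(reference, hypothesis):.1%}")  # WER: 28.6%
```

Because insertions are counted, WER is not capped at 100%: a one-word reference transcribed as three words yields a WER of 2.0 (200%).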