Improving audio of emergency calls in Spanish performed to the ECU 911 through filters for ASR technology
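Main authors: | Orellana, Marcos; Jiménez Sarango, Ángel Alberto; Zambrano Martínez, Jorge Luis |
---|---|
Format: | Conference object |
Language: | English |
Published: | 2022 |
Subjects: | Computer Science; Automatic Speech Recognition; Word Error Rate; Speech enhancement algorithms; Audio quality improvement |
Online access: | http://sedici.unlp.edu.ar/handle/10915/140657 |
Contributed by: |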
id | I19-R120-10915-140657 |
---|---|
record_format | dspace |
institution | Universidad Nacional de La Plata |
institution_str | I-19 |
repository_str | R-120 |
collection | SEDICI (UNLP) |
topic | Computer Science; Automatic Speech Recognition; Word Error Rate; Speech enhancement algorithms; Audio quality improvement |
description | In recent years, Automatic Speech Recognition (ASR) services have made notable progress, driven by the research efforts of large companies such as Google and Amazon. However, ASR services remain sensitive to the quality of the input audio, particularly in languages other than English. To address this issue, we propose using some of the most prominent speech enhancement algorithms for improving speech intelligibility: Singular Value Decomposition (SVD), log Minimum Mean Square Error (log-MMSE), and the Wiener filter. By preprocessing the audio files with these algorithms, we seek to reduce the Word Error Rate (WER), which compares the transcription produced by the ASR service against a manual transcription and thus quantifies the percentage of errors the service makes.
The results show that Google's service performs better than its Amazon and Vosk counterparts. We also found that applying a low-pass filter combined with the log-MMSE algorithm to the audio files can substantially reduce the transcription WER, depending on the noise characteristics of the audio. |
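The abstract describes WER only informally, as a comparison between the ASR transcription and a manual one. Its usual definition is the word-level edit distance divided by the number of reference words, WER = (S + D + I) / N, where S, D and I count substituted, deleted and inserted words. The following is a minimal sketch of that definition using dynamic programming. The function name, the lowercase/whitespace tokenization, and the example transcripts are hypothetical assumptions; the record does not say how the authors normalized their transcripts.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return WER = (S + D + I) / N as a fraction (multiply by 100 for a percentage).

    Assumption: transcripts are compared after lowercasing and splitting on
    whitespace; punctuation is not stripped.
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcription must contain at least one word")

    # prev[j] holds the edit distance between the reference words processed so far
    # and the first j hypothesis words (one row of the DP table at a time).
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # deletion: reference word missing from hypothesis
                curr[j - 1] + 1,     # insertion: extra word in hypothesis
                prev[j - 1] + cost,  # match (cost 0) or substitution (cost 1)
            )
        prev = curr
    return prev[len(hyp)] / len(ref)


if __name__ == "__main__":
    # Hypothetical Spanish transcripts: manual reference vs. ASR output.
    manual = "necesito una ambulancia en la calle larga"
    asr = "necesito ambulancia en la calle larga ya"
    print(f"WER: {word_error_rate(manual, asr):.1%}")  # prints "WER: 28.6%"
```

Here the ASR output drops "una" and adds "ya", which gives two errors over seven reference words. Because WER counts insertions, it can exceed 100% when the hypothesis is much longer than the reference.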
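The record does not specify the low-pass filter's type, order or cutoff frequency, and it does not say how that filter is chained with log-MMSE. To show what a low-pass stage does to a signal, the following is a minimal sketch of a first-order (single-pole) IIR low-pass filter in plain Python. The function name, the zero initial state, the 8 kHz sampling rate, and the 1000 Hz cutoff in the usage lines are all hypothetical assumptions, not the authors' settings. The log-MMSE stage is not shown.

```python
import math


def low_pass(samples, sample_rate_hz, cutoff_hz):
    """First-order IIR low-pass: y[n] = y[n-1] + alpha * (x[n] - y[n-1]).

    alpha = dt / (RC + dt), with dt = 1 / sample_rate_hz and
    RC = 1 / (2 * pi * cutoff_hz), i.e. a discretized RC filter.
    """
    if sample_rate_hz <= 0 or cutoff_hz <= 0:
        raise ValueError("sample_rate_hz and cutoff_hz must be positive")
    dt = 1.0 / sample_rate_hz
    rc = 1.0 / (2.0 * math.pi * cutoff_hz)
    alpha = dt / (rc + dt)  # smoothing factor in (0, 1)
    filtered = []
    y = 0.0  # assumed zero initial state
    for x in samples:
        y += alpha * (x - y)  # move the output a fraction alpha toward the input
        filtered.append(y)
    return filtered


if __name__ == "__main__":
    rate = 8000  # assumed narrowband telephone sampling rate
    # Hypothetical signal: a 200 Hz tone plus a 3800 Hz component standing in for noise.
    signal = [
        math.sin(2 * math.pi * 200 * n / rate)
        + 0.5 * math.sin(2 * math.pi * 3800 * n / rate)
        for n in range(rate // 10)
    ]
    smoothed = low_pass(signal, rate, cutoff_hz=1000)  # hypothetical cutoff
    print(f"peak after filtering: {max(abs(v) for v in smoothed[-400:]):.3f}")
```

A first-order filter rolls off gently, at about 6 dB per octave above the cutoff. Practical pipelines therefore usually rely on higher-order designs, such as a Butterworth filter built with `scipy.signal.butter` and applied with `scipy.signal.sosfiltfilt`.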
author | Orellana, Marcos; Jiménez Sarango, Ángel Alberto; Zambrano Martínez, Jorge Luis |