Improving audio of emergency calls in Spanish performed to the ECU 911 through filters for ASR technology

Bibliographic Details
Main authors: Orellana, Marcos, Jiménez Sarango, Ángel Alberto, Zambrano Martínez, Jorge Luis
Format: Conference object
Language: English
Published: 2022
Online access: http://sedici.unlp.edu.ar/handle/10915/140657
Description
Summary: In recent years, Automatic Speech Recognition (ASR) services have made notable progress through the research efforts of large companies such as Google and Amazon. However, ASR systems are still sensitive to audio quality in other languages. To address this issue, various speech enhancement algorithms have been proposed; among the most prominent for improving speech intelligibility are Singular Value Decomposition (SVD), log Minimum Mean Square Error (log-MMSE), and Wiener filtering. By preprocessing the audio files with these algorithms, we seek to reduce the Word Error Rate (WER), which compares the transcription produced by the ASR against a manual transcription and thus gives the percentage of error made by the ASR service. The results show that Google is more efficient than its Amazon and Vosk counterparts. We also determined that applying a low-pass filter combined with the log-MMSE algorithm to the audio files can substantially reduce the WER of the transcription, depending on the characteristics of the noise contained in the audio.
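
The WER comparison described in the summary can be illustrated with a minimal sketch. It assumes the standard definition of WER, that is, the word-level edit distance (substitutions, deletions, and insertions) divided by the number of words in the manual transcription; the record does not state which tool or text normalization the authors used. The function name `word_error_rate` and the sample sentences are hypothetical and are not taken from the ECU 911 data.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words.

    Assumption: words are compared after lowercasing and whitespace
    splitting only; the paper's actual normalization is not known.
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcription must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (ref_word != hyp_word)
            deletion = prev[j] + 1      # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical example: one deleted word ("un") and one substituted word
# ("casa" -> "caza") against a six-word manual transcription.
manual = "hay un incendio en la casa"
asr_output = "hay incendio en la caza"
print(f"WER: {word_error_rate(manual, asr_output):.2%}")
```

A lower value after preprocessing the audio would indicate that the enhancement step helped the ASR service; WER can exceed 100% when the hypothesis contains many inserted words.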