A study on pose-based deep learning models for gloss-free sign language translation


Bibliographic Details
Main authors: Dal Bianco, Pedro Alejandro, Ríos, Gastón Gustavo, Hasperué, Waldo, Stanchi, Oscar Agustín, Quiroga, Facundo Manuel, Ronchetti, Franco
Format: Article
Language: English
Published: 2024
Subjects:
Online access: http://sedici.unlp.edu.ar/handle/10915/173722
id I19-R120-10915-173722
record_format dspace
spelling I19-R120-10915-1737222024-11-25T20:06:09Z http://sedici.unlp.edu.ar/handle/10915/173722 A study on pose-based deep learning models for gloss-free sign language translation Estudio sobre modelos de aprendizaje profundo basados en poses para traducción de lengua de señas sin glosas Dal Bianco, Pedro Alejandro Ríos, Gastón Gustavo Hasperué, Waldo Stanchi, Oscar Agustín Quiroga, Facundo Manuel Ronchetti, Franco 2024-10 2024-11-25T18:22:37Z en Ciencias Informáticas Deep Learning Gloss-free Pose Estimation Sign Language Datasets Sign Language Translation Bases de Datos de Lenguaje de Señas Estimación de Poses Lenguaje de Señas Libre de Glosas Traducción de Lenguaje de Señas Sign Language Translation (SLT) is a challenging task due to its cross-domain nature, different grammars and lack of data. Currently, many SLT models rely on intermediate gloss annotations as outputs or latent priors. Glosses can help models correctly segment and align signs to better understand the video. However, the use of glosses comes with significant limitations, since obtaining annotations is quite difficult. Therefore, scaling gloss-based models to millions of samples remains impractical, especially considering the scarcity of sign language datasets. Similarly, many models use raw video data, which requires larger models that typically only run on high-end GPUs and are less invariant to signers' appearance and context. In this work we propose a gloss-free pose-based SLT model. Using the extracted pose as a feature allows for a significant reduction in the dimensionality of the data and the size of the model. We evaluate the state of the art, compare available models and develop a keypoint-based Transformer model for gloss-free SLT, trained on RWTH-Phoenix, a standard dataset for benchmarking SLT models, alongside GSL, a simpler laboratory-made Greek Sign Language dataset.
Sign Language Translation is a challenging task, as it spans multiple domains, different grammars and a lack of data. Currently, many SLT models depend on glosses as intermediate annotations or outputs. These can help models correctly segment and align signs to better understand the video. However, their use carries significant limitations, since obtaining them is quite difficult. Therefore, scaling gloss-based models to millions of samples remains impractical, especially considering the scarcity of sign language datasets. Likewise, many models use video as input, which requires larger models that typically only run on high-end GPUs and are less invariant to signers' appearance and context. In this work we propose a pose-based, gloss-free SLT model. Using the extracted pose as input allows a significant reduction in the dimensionality of the data and the size of the model. We evaluate the state of the art, compare available models and develop a keypoint-based Transformer model for gloss-free SLT, trained on RWTH-Phoenix, a standard dataset for benchmarking SLT models, and on GSL, a simpler laboratory-made Greek Sign Language dataset. Facultad de Informática Articulo Articulo http://creativecommons.org/licenses/by-nc-sa/4.0/ Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0) application/pdf 99-103
institution Universidad Nacional de La Plata
institution_str I-19
repository_str R-120
collection SEDICI (UNLP)
language English
topic Ciencias Informáticas
Deep Learning
Gloss-free
Pose Estimation
Sign Language Datasets
Sign Language Translation
Bases de Datos de Lenguaje de Señas
Estimación de Poses
Lenguaje de Señas
Libre de Glosas
Traducción de Lenguaje de Señas
spellingShingle Ciencias Informáticas
Deep Learning
Gloss-free
Pose Estimation
Sign Language Datasets
Sign Language Translation
Bases de Datos de Lenguaje de Señas
Estimación de Poses
Lenguaje de Señas
Libre de Glosas
Traducción de Lenguaje de Señas
Dal Bianco, Pedro Alejandro
Ríos, Gastón Gustavo
Hasperué, Waldo
Stanchi, Oscar Agustín
Quiroga, Facundo Manuel
Ronchetti, Franco
A study on pose-based deep learning models for gloss-free sign language translation
topic_facet Ciencias Informáticas
Deep Learning
Gloss-free
Pose Estimation
Sign Language Datasets
Sign Language Translation
Bases de Datos de Lenguaje de Señas
Estimación de Poses
Lenguaje de Señas
Libre de Glosas
Traducción de Lenguaje de Señas
description Sign Language Translation (SLT) is a challenging task due to its cross-domain nature, different grammars and lack of data. Currently, many SLT models rely on intermediate gloss annotations as outputs or latent priors. Glosses can help models correctly segment and align signs to better understand the video. However, the use of glosses comes with significant limitations, since obtaining annotations is quite difficult. Therefore, scaling gloss-based models to millions of samples remains impractical, especially considering the scarcity of sign language datasets. Similarly, many models use raw video data, which requires larger models that typically only run on high-end GPUs and are less invariant to signers' appearance and context. In this work we propose a gloss-free pose-based SLT model. Using the extracted pose as a feature allows for a significant reduction in the dimensionality of the data and the size of the model. We evaluate the state of the art, compare available models and develop a keypoint-based Transformer model for gloss-free SLT, trained on RWTH-Phoenix, a standard dataset for benchmarking SLT models, alongside GSL, a simpler laboratory-made Greek Sign Language dataset.
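The description argues that feeding extracted pose keypoints instead of raw video greatly reduces input dimensionality and, with it, model size. As a rough back-of-the-envelope illustration of that claim (the frame resolution and keypoint count below are assumed values for a typical setup, not figures from the article):

```python
# Illustrative sketch: per-frame dimensionality of raw RGB video versus
# extracted pose keypoints. The numbers are assumptions (224x224 RGB frames;
# 75 keypoints covering body and both hands), not values from the paper.

def frame_dims(height=224, width=224, channels=3):
    """Scalar values per raw RGB video frame."""
    return height * width * channels

def pose_dims(num_keypoints=75, coords=2):
    """Scalar values per frame when only (x, y) keypoint coordinates
    from a pose estimator are kept."""
    return num_keypoints * coords

video = frame_dims()  # 150_528 values per frame
pose = pose_dims()    # 150 values per frame
print(f"raw frame: {video} values, pose: {pose} values, "
      f"reduction: {video // pose}x")
```

Under these assumptions a pose representation carries roughly three orders of magnitude fewer values per frame than a raw RGB frame, which is what makes compact keypoint-based Transformers feasible without high-end GPUs.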
format Articulo
Articulo
author Dal Bianco, Pedro Alejandro
Ríos, Gastón Gustavo
Hasperué, Waldo
Stanchi, Oscar Agustín
Quiroga, Facundo Manuel
Ronchetti, Franco
author_facet Dal Bianco, Pedro Alejandro
Ríos, Gastón Gustavo
Hasperué, Waldo
Stanchi, Oscar Agustín
Quiroga, Facundo Manuel
Ronchetti, Franco
author_sort Dal Bianco, Pedro Alejandro
title A study on pose-based deep learning models for gloss-free sign language translation
title_short A study on pose-based deep learning models for gloss-free sign language translation
title_full A study on pose-based deep learning models for gloss-free sign language translation
title_fullStr A study on pose-based deep learning models for gloss-free sign language translation
title_full_unstemmed A study on pose-based deep learning models for gloss-free sign language translation
title_sort study on pose-based deep learning models for gloss-free sign language translation
publishDate 2024
url http://sedici.unlp.edu.ar/handle/10915/173722
work_keys_str_mv AT dalbiancopedroalejandro astudyonposebaseddeeplearningmodelsforglossfreesignlanguagetranslation
AT riosgastongustavo astudyonposebaseddeeplearningmodelsforglossfreesignlanguagetranslation
AT hasperuewaldo astudyonposebaseddeeplearningmodelsforglossfreesignlanguagetranslation
AT stanchioscaragustin astudyonposebaseddeeplearningmodelsforglossfreesignlanguagetranslation
AT quirogafacundomanuel astudyonposebaseddeeplearningmodelsforglossfreesignlanguagetranslation
AT ronchettifranco astudyonposebaseddeeplearningmodelsforglossfreesignlanguagetranslation
AT dalbiancopedroalejandro estudiosobremodelosdeaprendizajeprofundobasadosenposesparatraducciondelenguadesenassinglosas
AT riosgastongustavo estudiosobremodelosdeaprendizajeprofundobasadosenposesparatraducciondelenguadesenassinglosas
AT hasperuewaldo estudiosobremodelosdeaprendizajeprofundobasadosenposesparatraducciondelenguadesenassinglosas
AT stanchioscaragustin estudiosobremodelosdeaprendizajeprofundobasadosenposesparatraducciondelenguadesenassinglosas
AT quirogafacundomanuel estudiosobremodelosdeaprendizajeprofundobasadosenposesparatraducciondelenguadesenassinglosas
AT ronchettifranco estudiosobremodelosdeaprendizajeprofundobasadosenposesparatraducciondelenguadesenassinglosas
AT dalbiancopedroalejandro studyonposebaseddeeplearningmodelsforglossfreesignlanguagetranslation
AT riosgastongustavo studyonposebaseddeeplearningmodelsforglossfreesignlanguagetranslation
AT hasperuewaldo studyonposebaseddeeplearningmodelsforglossfreesignlanguagetranslation
AT stanchioscaragustin studyonposebaseddeeplearningmodelsforglossfreesignlanguagetranslation
AT quirogafacundomanuel studyonposebaseddeeplearningmodelsforglossfreesignlanguagetranslation
AT ronchettifranco studyonposebaseddeeplearningmodelsforglossfreesignlanguagetranslation
_version_ 1833157998866857984