Exploiting user-frequency information for mining regionalisms in Argentinian Spanish from Twitter
The task of detecting regionalisms (expressions or words used in certain regions) has traditionally relied on the use of questionnaires and surveys, heavily depending on the expertise and intuition of the surveyor. The emergence of social media and microblogging services has produced an unprecede...
Main authors: | Gravano, Agustín; Pérez, Juan Manuel; Aleman, Damian E.; Kalinowski, Santiago N. |
---|---|
Format: | Article (publishedVersion) |
Language: | Spanish |
Published: | Procesamiento del Lenguaje Natural, Revista, 2022 |
Subjects: | Lexical dialectology; Social media; Spanish variants; Entropy |
Online access: | https://repositorio.utdt.edu/handle/20.500.13098/11442 http://journal.sepln.org/sepln/ojs/ojs/index.php/pln/article/view/6427 |
Contributed by: |
id |
I57-R163-20.500.13098-11442 |
---|---|
record_format |
dspace |
institution |
Universidad Torcuato Di Tella |
institution_str |
I-57 |
repository_str |
R-163 |
collection |
Repositorio Digital Universidad Torcuato Di Tella |
language |
Español |
orig_language_str_mv |
spa |
topic |
Lexical dialectology; Social media; Spanish variants; Entropy |
spellingShingle |
Lexical dialectology Social media Spanish variants Entropy Gravano, Agustín Pérez, Juan Manuel Aleman, Damian E. Kalinowski, Santiago N. Exploiting user-frequency information for mining regionalisms in Argentinian Spanish from Twitter |
topic_facet |
Lexical dialectology; Social media; Spanish variants; Entropy |
description |
The task of detecting regionalisms (expressions or words used in certain
regions) has traditionally relied on the use of questionnaires and surveys, heavily
depending on the expertise and intuition of the surveyor. The emergence of social
media and microblogging services has produced an unprecedented wealth of content
(mainly informal text generated by users), opening new opportunities for linguists
to extend their studies of language variation. Previous work on the automatic detection
of regionalisms depended mostly on word frequencies. In this work, we present
a novel metric based on Information Theory that incorporates user frequency. We
tested this metric on a corpus of Argentinian Spanish tweets in two ways: via manual
annotation of the relevance of the retrieved terms, and also as a feature selection
method for geolocation of users. In both cases, our metric outperformed other techniques
based on word frequency, suggesting that the number of users who
use a word is an informative feature. This tool has helped lexicographers discover
several unregistered words of Argentinian Spanish, as well as different meanings assigned
to registered words. |
format |
Artículo publishedVersion |
author |
Gravano, Agustín; Pérez, Juan Manuel; Aleman, Damian E.; Kalinowski, Santiago N. |
author_facet |
Gravano, Agustín; Pérez, Juan Manuel; Aleman, Damian E.; Kalinowski, Santiago N. |
author_sort |
Gravano, Agustín |
title |
Exploiting user-frequency information for mining regionalisms in Argentinian Spanish from Twitter |
title_sort |
exploiting user-frequency information for mining regionalisms in argentinian spanish from twitter |
publisher |
Procesamiento del Lenguaje Natural, Revista |
publishDate |
2022 |
url |
https://repositorio.utdt.edu/handle/20.500.13098/11442 http://journal.sepln.org/sepln/ojs/ojs/index.php/pln/article/view/6427 |
work_keys_str_mv |
AT gravanoagustin exploitinguserfrequencyinformationforminingregionalismsinargentinianspanishfromtwitter AT perezjuanmanuel exploitinguserfrequencyinformationforminingregionalismsinargentinianspanishfromtwitter AT alemandamiane exploitinguserfrequencyinformationforminingregionalismsinargentinianspanishfromtwitter AT kalinowskisantiagon exploitinguserfrequencyinformationforminingregionalismsinargentinianspanishfromtwitter |
bdutipo_str |
Repositorios |
_version_ |
1764820542380048386 |
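
The description above mentions an information-theoretic metric that incorporates user frequency, but this record does not give its formula. The Python sketch below is only an illustration of the general idea, not the authors' metric: it compares the Shannon entropy of a word's occurrence counts across regions with the entropy of its distinct-user counts across regions. The function names, the `(user_id, region, tokens)` input shape, the region labels `A` and `B`, and the toy data are all assumptions made for this example.

```python
import math
from collections import defaultdict


def shannon_entropy(counts):
    """Shannon entropy (in bits) of a list of non-negative counts."""
    total = sum(counts)
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)


def regional_entropies(tweets, regions):
    """Return {token: (occurrence_entropy, user_entropy)}.

    tweets: iterable of (user_id, region, tokens) tuples (hypothetical format).
    Lower entropy means the token is more concentrated in few regions.
    """
    occurrences = defaultdict(lambda: defaultdict(int))  # token -> region -> count
    users = defaultdict(lambda: defaultdict(set))        # token -> region -> user ids
    for user_id, region, tokens in tweets:
        for token in tokens:
            occurrences[token][region] += 1
            users[token][region].add(user_id)

    result = {}
    for token in list(occurrences):
        word_counts = [occurrences[token][r] for r in regions]
        user_counts = [len(users[token][r]) for r in regions]
        result[token] = (shannon_entropy(word_counts), shannon_entropy(user_counts))
    return result


# Toy data (invented): user u1 repeats "che" three times in region A.
toy_tweets = [
    ("u1", "A", ["che", "hola"]),
    ("u1", "A", ["che"]),
    ("u1", "A", ["che"]),
    ("u2", "B", ["che", "hola"]),
    ("u3", "A", ["hola"]),
    ("u4", "B", ["hola"]),
]
scores = regional_entropies(toy_tweets, regions=["A", "B"])
for token, (h_words, h_users) in sorted(scores.items()):
    print(f"{token}: occurrence entropy={h_words:.3f}, user entropy={h_users:.3f}")
```

In this toy data, one prolific user makes "che" look concentrated in region A when only occurrences are counted, while the user counts show it is used by one person in each region. This is the kind of distortion that counting users, rather than raw occurrences, is meant to reduce.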