On the Importance of Data Representation for the Success of Text Classification

Text mining approaches use natural language processing to automatically extract patterns from texts. Tasks as topic labeling, news classification, question answering, named entity recognition and sentiment analysis, usually require elaborate and effective document representations. In this context, w...

Descripción completa

Guardado en:
Detalles Bibliográficos
Autores principales: Cuello, Carolina Y., Jofre Caradonna, Vanessa, Garciarena Ucelay, María José, Cagnina, Leticia
Formato: Objeto de conferencia
Lenguaje:Inglés
Publicado: 2022
Materias:
Acceso en línea:http://sedici.unlp.edu.ar/handle/10915/149536
Aporte de:
id I19-R120-10915-149536
record_format dspace
institution Universidad Nacional de La Plata
institution_str I-19
repository_str R-120
collection SEDICI (UNLP)
language Inglés
topic Ciencias Informáticas
text mining
text representations
text classification
movie reviews
sentiment analysis
polarity analysis
spellingShingle Ciencias Informáticas
text mining
text representations
text classification
movie reviews
sentiment analysis
polarity analysis
Cuello, Carolina Y.
Jofre Caradonna, Vanessa
Garciarena Ucelay, María José
Cagnina, Leticia
On the Importance of Data Representation for the Success of Text Classification
topic_facet Ciencias Informáticas
text mining
text representations
text classification
movie reviews
sentiment analysis
polarity analysis
description Text mining approaches use natural language processing to automatically extract patterns from texts. Tasks as topic labeling, news classification, question answering, named entity recognition and sentiment analysis, usually require elaborate and effective document representations. In this context, word representation models in general, and vector-based word representations in particular, have gained increasing interest to alleviate some of the limitations that Bag of Words exhibits. In this article, we analyze the use of several vector-based word representations besides the classical ones, in a polarity analysis task on movie reviews. Experimental results show the effectiveness of more elaborate representations in comparison to Bag of Words. In particular, Concise Semantic Analysis representation seems to be very robust and effective because independently the classifier used with, the results are really good. Dimension and time of getting the representations are also showed, concluding in the efficiency of the classifiers when Concise Semantic Analysis is considered.
format Objeto de conferencia
Objeto de conferencia
author Cuello, Carolina Y.
Jofre Caradonna, Vanessa
Garciarena Ucelay, María José
Cagnina, Leticia
author_facet Cuello, Carolina Y.
Jofre Caradonna, Vanessa
Garciarena Ucelay, María José
Cagnina, Leticia
author_sort Cuello, Carolina Y.
title On the Importance of Data Representation for the Success of Text Classification
title_short On the Importance of Data Representation for the Success of Text Classification
title_full On the Importance of Data Representation for the Success of Text Classification
title_fullStr On the Importance of Data Representation for the Success of Text Classification
title_full_unstemmed On the Importance of Data Representation for the Success of Text Classification
title_sort on the importance of data representation for the success of text classification
publishDate 2022
url http://sedici.unlp.edu.ar/handle/10915/149536
work_keys_str_mv AT cuellocarolinay ontheimportanceofdatarepresentationforthesuccessoftextclassification
AT jofrecaradonnavanessa ontheimportanceofdatarepresentationforthesuccessoftextclassification
AT garciarenaucelaymariajose ontheimportanceofdatarepresentationforthesuccessoftextclassification
AT cagninaleticia ontheimportanceofdatarepresentationforthesuccessoftextclassification
bdutipo_str Repositorios
_version_ 1764820462952513537