Clustering Tasks and Decision Trees with Augustan Love Poets: Cohesion and Separation in Feature Importance Extraction

This article extends various automatic text analysis tasks from previous works by applying natural language processing techniques to a corpus of Latin texts from the 1st century BC and 1st century AD. The motivation behind this work is to delve into and understand a historical literary trend revolv...

Bibliographic Details
Main Authors:	Nusch, Carlos Javier; Del Rio Riande, María Gimena; Cagnina, Leticia; Errecalde, Marcelo Luis; Antonelli, Rubén Leandro
Format:	Article
Language:	Spanish
Published:	2024
Subjects:	Computer Science; Humanities; Augustan love poets; Document Clustering; K-Means; Silhouette Coefficient; Decision Trees; Feature Importance; Information Gain Ratio
Online Access:	http://sedici.unlp.edu.ar/handle/10915/175050
Contributed by:	SEDICI (UNLP), Universidad Nacional de La Plata

id	I19-R120-10915-175050
record_format	dspace
spelling	I19-R120-10915-175050; 2025-02-25T17:07:29Z; http://sedici.unlp.edu.ar/handle/10915/175050; 2024-11-18; 2024-12-18T11:54:04Z; es; Dirección PREBI-SEDICI; Articulo; http://creativecommons.org/licenses/by-nc-sa/4.0/; Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0); application/pdf
institution	Universidad Nacional de La Plata
institution_str	I-19
repository_str	R-120
collection	SEDICI (UNLP)
language	Spanish
topic	Computer Science; Humanities; Augustan love poets; Document Clustering; K-Means; Silhouette Coefficient; Decision Trees; Feature Importance; Information Gain Ratio
description	This article extends various automatic text analysis tasks from previous works by applying natural language processing techniques to a corpus of Latin texts from the 1st century BC and 1st century AD. The motivation behind this work is to delve into and understand a historical literary trend revolving around the themes of love, spanning from antiquity through to the medieval period. The analyzed authors include Gaius Valerius Catullus, Albius Tibullus, and Sextus Propertius, representing the literary movement of the neoterics, and Publius Vergilius Maro and Marcus Annaeus Lucanus, epic poets with distinct styles, serving as control samples. Unlike previous works, this study adds several corrections to the preprocessing tasks, including improved tokenization of words with enclitics and handling of orthographic variants. For the clustering tasks, K-Means was used together with the Silhouette Score to determine the optimal number of clusters. Using these optimal clusters as labels, decision trees were trained for each range of n-grams to identify the features with the highest Information Gain and Information Gain Ratio. The trees were trained with Entropy as the splitting criterion, and Feature Importance was calculated. The study details the classification results and the features extracted by the decision trees, based on the best Silhouette scores obtained and the Information Gain. It also examines whether the words or parts of words with classificatory potential identified in the process match the findings of previous exploratory tasks performed with other techniques.
format	Article
author	Nusch, Carlos Javier; Del Rio Riande, María Gimena; Cagnina, Leticia; Errecalde, Marcelo Luis; Antonelli, Rubén Leandro
author_sort	Nusch, Carlos Javier
title	Clustering Tasks and Decision Trees with Augustan Love Poets: Cohesion and Separation in Feature Importance Extraction
publishDate	2024
url	http://sedici.unlp.edu.ar/handle/10915/175050
_version_	1842398891521081344
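
The description above names three quantities: the Silhouette score (cohesion versus separation of clusters), Information Gain, and Information Gain Ratio. The following is a minimal sketch of how these are commonly defined, written with the Python standard library only. It is not the authors' implementation; the function names, the toy points, and the idea of a binary "n-gram present" feature are assumptions made purely for illustration.

```python
import math
from collections import Counter


def entropy(values):
    """Shannon entropy (in bits) of a list of discrete values."""
    total = len(values)
    return sum((n / total) * math.log2(total / n) for n in Counter(values).values())


def gain_and_ratio(feature, labels):
    """Information gain and gain ratio of a discrete feature against labels."""
    total = len(labels)
    groups = {}
    for value, label in zip(feature, labels):
        groups.setdefault(value, []).append(label)
    # Weighted entropy of the labels that remains after splitting on the feature.
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    gain = entropy(labels) - remainder
    # Split information: entropy of the partition induced by the feature itself.
    split_info = entropy(feature)
    ratio = gain / split_info if split_info > 0 else 0.0
    return gain, ratio


def silhouette(points, clusters):
    """Mean silhouette coefficient; requires at least two distinct clusters."""
    def mean_dist(p, members):
        return sum(math.dist(p, q) for q in members) / len(members)

    scores = []
    for i, (p, c) in enumerate(zip(points, clusters)):
        own = [q for j, (q, d) in enumerate(zip(points, clusters)) if d == c and j != i]
        if not own:  # a singleton cluster conventionally scores 0
            scores.append(0.0)
            continue
        a = mean_dist(p, own)  # cohesion: mean distance within the own cluster
        b = min(  # separation: mean distance to the nearest other cluster
            mean_dist(p, [q for q, d in zip(points, clusters) if d == other])
            for other in set(clusters) if other != c
        )
        denom = max(a, b)
        scores.append((b - a) / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Hypothetical toy data: four one-dimensional "documents" and two labelings.
points = [(0.0,), (1.0,), (10.0,), (11.0,)]
good, bad = [0, 0, 1, 1], [0, 1, 0, 1]
print(silhouette(points, good) > silhouette(points, bad))

# Hypothetical binary feature (an n-gram present or absent) that
# matches the "good" cluster labels exactly.
print(gain_and_ratio([True, True, False, False], good))
```

In a setting like the one described, the labeling with the highest mean silhouette would supply the class labels, and features would then be ranked by gain or gain ratio against those labels. The gain ratio divides the gain by the split information, which penalizes features that fragment the data into many small groups.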
