Implementations of data mining and natural language processing on news portals publications: an exploratory approach to hypothesis generation in semiotic interpretation


Bibliographic Details
Main authors: Alomar, Francisco; Raimondo Anselmino, Natalia; Sollberger, Dolores; Gindín, Irene Lis
Format: Journal article
Language: Spanish
Published: Universidad Nacional de Rosario, 2026
Online access: https://aprendoconnooj.unr.edu.ar/index.php/revista/article/view/45
Description
Summary: This article presents preliminary results from an empirical, exploratory approach to a set of news articles published on six Argentine general-interest news portals, together with the comments posted there by reader–users through Facebook. The material forms part of a broader research project, conducted by a multidisciplinary research team, on the platformization of discourses about the public/common, focused on a case of “narcoterrorism” in the city of Rosario (Argentina) during 2024. The findings reported here derive from the application of data mining and natural language processing (NLP) algorithms, including the vectorization of texts through embeddings, dimensionality reduction, and the identification of algorithmic groupings projected into graphical representations. These procedures, implemented within a methodological strategy that combines approaches under the framework referred to as semiodata, are understood to operate as triggers for abductive inferences that generate working hypotheses. The analytical path proposed here outlines an alternative entry point for the sociosemiotic analysis of mediatized discourses and applies a specific level of observation; taken together, these prove fruitful for distinguishing boundaries at the level of discursive genres through the visualization of geometric distances. In this sense, algorithmic implementations may be useful for identifying certain invariant disparities linked to enunciative properties of a generic order, at least with regard to the distinction between primary and secondary genres, whereas other differences (such as those related to the variety of journalistic genres) do not appear to have been captured computationally.
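
The idea of reading genre boundaries from geometric distances can be illustrated with a minimal sketch. It is not the authors' pipeline: the record does not name the embedding model, the dimensionality-reduction method, or the grouping algorithm, so all three are left out here. The sketch assumes a plain word-count vector as a crude stand-in for an embedding, and the four texts and their labels ("article", "comment") are hypothetical, invented only for illustration. It compares the mean cosine distance within each group with the mean distance between groups.

```python
import math
from collections import Counter
from itertools import combinations

# Hypothetical toy texts, invented for illustration; NOT data from the study.
docs = [
    ("article", "the provincial government announced new security measures in the city"),
    ("article", "the ministry announced security measures after the attacks in the city"),
    ("comment", "this is a shame nobody does anything"),
    ("comment", "what a shame nobody cares about us"),
]


def vectorize(text, vocab):
    """Word-count vector over a fixed vocabulary (assumed stand-in for an embedding)."""
    counts = Counter(text.split())
    return [counts[word] for word in vocab]


def cosine_distance(u, v):
    """Return 1 - cosine similarity of two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


# Shared vocabulary built from all toy texts.
vocab = sorted({word for _, text in docs for word in text.split()})
vectors = [(label, vectorize(text, vocab)) for label, text in docs]

# Split pairwise distances into same-label and different-label pairs.
within, between = [], []
for (label_a, vec_a), (label_b, vec_b) in combinations(vectors, 2):
    target = within if label_a == label_b else between
    target.append(cosine_distance(vec_a, vec_b))

mean_within = sum(within) / len(within)
mean_between = sum(between) / len(between)
print(f"mean within-group distance:  {mean_within:.3f}")
print(f"mean between-group distance: {mean_between:.3f}")
```

In this toy setting the two groups share no vocabulary, so texts with different labels lie farther apart than texts with the same label. With real embeddings the separation would be a matter of degree, and any visible grouping would serve, as the summary suggests, as a prompt for hypotheses and not as a result in itself.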