Evaluation of Named Entity Recognition in Historical Argentinian Documents

Research over historical text volumes can be performed by means of automatic tools that help historians achieve more abstract and aggregated points of view. Tasks such as Information Extraction or Text Mining can be performed more efficiently if Machine Learning models are employed. We propose the e...

Bibliographic Details
Main authors:	Darfe, Facundo; Xamena, Eduardo; Orozco, Carlos I.
Format:	Conference object
Language:	English
Published:	2022
Subjects:	Computer Science; Named Entity Recognition and Classification; Argentinian History; Pretrained Language Models
Online access:	http://sedici.unlp.edu.ar/handle/10915/151702 https://publicaciones.sadio.org.ar/index.php/JAIIO/article/download/270/221
Contributed by:	SEDICI (UNLP), Universidad Nacional de La Plata

id	I19-R120-10915-151702
record_format	dspace
spelling	I19-R120-10915-151702; 2023-05-03T20:02:12Z; issn:2451-7496; 2022-10; 2023-04-18T18:54:20Z; en; Sociedad Argentina de Informática e Investigación Operativa; http://creativecommons.org/licenses/by-nc-sa/4.0/; Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0); application/pdf; 98-109
institution	Universidad Nacional de La Plata
institution_str	I-19
repository_str	R-120
collection	SEDICI (UNLP)
language	English
topic	Computer Science; Named Entity Recognition and Classification; Argentinian History; Pretrained Language Models
description	Research over historical text volumes can be performed with automatic tools that help historians achieve more abstract and aggregated points of view. Tasks such as Information Extraction or Text Mining can be performed more efficiently if Machine Learning models are employed. We propose the evaluation of different state-of-the-art models over a new dataset for Named Entity Recognition. The dataset was built from a volume of historical texts about General Güemes, a national hero of Argentinian independence. The results show that some models perform better in terms of precision, recall and F1-score for most entity types. Specifically, pretrained language models fine-tuned for this particular task show considerably higher performance than classical models based on word embeddings and other kinds of representations. In addition, statistical tests are provided to establish the significance of the differences in the performance values attained. Hence, the contribution of this work is twofold: on the one hand, a new corpus and dataset for Named Entity Recognition; on the other, a complete statistical assessment of the performance of state-of-the-art models over the generated dataset.
format	Conference object
author	Darfe, Facundo; Xamena, Eduardo; Orozco, Carlos I.
title	Evaluation of Named Entity Recognition in Historical Argentinian Documents
publishDate	2022
url	http://sedici.unlp.edu.ar/handle/10915/151702 https://publicaciones.sadio.org.ar/index.php/JAIIO/article/download/270/221