Extraction of geographic entities from biological textual sources

This work explores and applies entity extraction techniques to identify and encode the geographic locations found in the geographic distribution section of botanical documents, such as the plant species manual of Costa Rica. Several technologies mu...

Bibliographic Details
Main authors: Acuña-Chaves, Moises A., Araya-Monge, José E.
Format: Conference object
Language: English
Published: 2017
Subjects:
Online access: http://sedici.unlp.edu.ar/handle/10915/63263
http://www.clei2017-46jaiio.sadio.org.ar/sites/default/files/Mem/SLMDI/SLMDI-10.pdf
Contributed by:
id I19-R120-10915-63263
record_format dspace
institution Universidad Nacional de La Plata
institution_str I-19
repository_str R-120
collection SEDICI (UNLP)
language English
topic Computer Science
extraction techniques
Natural Language Processing
description This work explores and applies entity extraction techniques to identify and encode the geographic locations found in the geographic distribution section of botanical documents, such as the plant species manual of Costa Rica. Several technologies must be combined to achieve this objective. One is Natural Language Processing (NLP), which supports entity extraction through the use of gazetteers. Another is the use of rules (regular expressions, deterministic automata, context-free grammars). In addition to identification and encoding, an algorithm is presented that binds the extracted place names to authoritative sources such as a gazetteer. This algorithm identifies and enriches the input text with extra information extracted from the paragraphs where the distribution is defined in semi-structured text. The values of interest for this work are the world distribution and the Costa Rica distribution. Once those values are identified, the information can be processed and used in diverse applications, such as geographic information systems, and may be of interest to other research projects. The evaluation consists of manually judging a randomly selected sample of the results to establish whether the algorithm yields useful data. Each judgment assesses the world and Costa Rica distributions against the source context and assigns one of three possible values: GOOD, BAD, or UNKNOWN. The goal is the lowest possible BAD percentage. The algorithm performs relatively well at geocoding and binding the world distribution. More work is needed for the Costa Rica distribution.
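The description above mentions gazetteer-based extraction, rule-based matching with regular expressions, binding of place names to an authoritative source, and a GOOD/BAD/UNKNOWN manual evaluation. The following is a minimal Python sketch of those ideas, not the authors' algorithm. The gazetteer entries, the codes, and the function names `build_pattern`, `bind_places`, and `judgment_percentages` are all hypothetical and chosen only for illustration.

```python
import re
from collections import Counter

# Hypothetical miniature gazetteer: place name -> (code, kind).
# Entries and codes are illustrative assumptions, not the paper's data.
GAZETTEER = {
    "Costa Rica": ("CR", "country"),
    "Panama": ("PA", "country"),
    "Mexico": ("MX", "country"),
    "Guanacaste": ("CR-G", "province"),
    "Limón": ("CR-L", "province"),
}


def build_pattern(names):
    """Compile one regular expression that matches any gazetteer name."""
    # Longest names first, so a multi-word name wins over a shorter prefix.
    ordered = sorted(names, key=len, reverse=True)
    alternation = "|".join(re.escape(name) for name in ordered)
    # The lookarounds reject matches embedded in longer words.
    return re.compile(r"(?<!\w)(?:" + alternation + r")(?!\w)")


PLACE_RE = build_pattern(GAZETTEER)


def bind_places(text):
    """Return (name, code, kind) for each gazetteer name found, in text order."""
    return [(m.group(0), *GAZETTEER[m.group(0)]) for m in PLACE_RE.finditer(text)]


def judgment_percentages(judgments):
    """Percentage of each manual label (GOOD, BAD, UNKNOWN) in a sample."""
    labels = ("GOOD", "BAD", "UNKNOWN")
    total = len(judgments)
    if total == 0:
        return {label: 0.0 for label in labels}
    counts = Counter(judgments)
    return {label: 100.0 * counts[label] / total for label in labels}


if __name__ == "__main__":
    sample = "Mexico to Panama. In Costa Rica: Guanacaste and Limón."
    for name, code, kind in bind_places(sample):
        print(name, code, kind)
    print(judgment_percentages(["GOOD", "GOOD", "BAD", "UNKNOWN"]))
```

A real gazetteer would be far larger and would need handling for spelling variants and ambiguous names, which this sketch leaves out.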
format Conference object
author Acuña-Chaves, Moises A.
Araya-Monge, José E.
title Extraction of geographic entities from biological textual sources
publishDate 2017
url http://sedici.unlp.edu.ar/handle/10915/63263
http://www.clei2017-46jaiio.sadio.org.ar/sites/default/files/Mem/SLMDI/SLMDI-10.pdf
bdutipo_str Repositories
_version_ 1764820480618921985