Towards Information Quality Assurance in Spanish Wikipedia

Featured Articles (FA) are considered to be the best articles that Wikipedia has to offer, and in recent years researchers have found it interesting to analyze whether and how they can be distinguished from “ordinary” articles. Likewise, identifying what issues have to be enhanced or fixed in ordinar...

Bibliographic Details
Main authors: Ferretti, Edgardo; Soria, Matías; Pérez Casseignau, Sebastián; Pohn, Lian; Urquiza, Guido; Gómez, Sergio Alejandro; Errecalde, Marcelo Luis
Format: Article
Language: English
Published: 2017
Subjects: Computer Science; featured article identification; information quality; quality flaws prediction; Wikipedia
Online access: http://sedici.unlp.edu.ar/handle/10915/59979
http://journal.info.unlp.edu.ar/wp-content/uploads/2017/05/JCST-44-Paper-4.pdf
Contributed by:
id I19-R120-10915-59979
record_format dspace
institution Universidad Nacional de La Plata
institution_str I-19
repository_str R-120
collection SEDICI (UNLP)
language English
topic Computer Science
featured article identification
information quality
quality flaws prediction
Wikipedia
description Featured Articles (FA) are considered to be the best articles that Wikipedia has to offer, and in recent years researchers have found it interesting to analyze whether and how they can be distinguished from “ordinary” articles. Likewise, identifying which issues have to be improved or fixed in ordinary articles in order to raise their quality is a key recent research trend. Most of the approaches developed to address these information quality problems have been proposed for the English Wikipedia. However, little work has been done on the Spanish Wikipedia, even though Spanish is one of the most widely spoken languages in the world by number of native speakers. In this respect, we present a breakdown of Spanish Wikipedia’s quality flaw structure. In addition, we carry out studies with three different corpora to automatically assess information quality in Spanish Wikipedia, where FA identification is evaluated as a binary classification task. Our evaluation in a unified setting allows the performance achieved by our approach on the Spanish version to be compared with that achieved on the English version. The best results obtained show that FA identification in Spanish can be performed with an F1 score of 0.88 using a document model consisting of only twenty-six features and a Support Vector Machine as the classification algorithm.
format Article
author Ferretti, Edgardo
Soria, Matías
Pérez Casseignau, Sebastián
Pohn, Lian
Urquiza, Guido
Gómez, Sergio Alejandro
Errecalde, Marcelo Luis
title Towards Information Quality Assurance in Spanish Wikipedia
publishDate 2017
url http://sedici.unlp.edu.ar/handle/10915/59979
http://journal.info.unlp.edu.ar/wp-content/uploads/2017/05/JCST-44-Paper-4.pdf
bdutipo_str Repositorios
_version_ 1764820478186225665
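
The description in this record treats FA identification as a binary classification task and reports its best result as an F1 score. As a minimal sketch of how that metric is computed, and not the authors' code, the following Python snippet derives F1 from precision and recall. The function name `f1_score_binary` and the toy labels are illustrative assumptions; here 1 stands for a featured article and 0 for an ordinary one.

```python
def f1_score_binary(y_true, y_pred, positive=1):
    """Return the F1 score of the positive class for two equal-length label lists."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")

    pairs = list(zip(y_true, y_pred))
    # True positives: featured articles predicted as featured.
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    # False positives: ordinary articles predicted as featured.
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    # False negatives: featured articles predicted as ordinary.
    fn = sum(1 for t, p in pairs if t == positive and p != positive)

    if tp == 0:
        # With no true positives, precision or recall is zero (or undefined);
        # this sketch returns 0.0 in that case.
        return 0.0

    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    # F1 is the harmonic mean of precision and recall.
    return 2 * precision * recall / (precision + recall)


# Toy labels, made up for illustration only (not data from the paper).
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0]
score = f1_score_binary(y_true, y_pred)
print(round(score, 2))
```

In practice the predicted labels would come from a trained classifier such as the Support Vector Machine mentioned in the description; the hand-written lists above only stand in for that output.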