Cluster Ensembles for Big Data Mining Problems

Mining big data involves several problems and new challenges, in addition to the huge volume of information. On the one hand, these data generally come from autonomous and decentralized sources, so their dimensionality is heterogeneous and diverse, and they generally involve privacy issues. On the othe...

Bibliographic Details
Main authors:	Pividori, Milton; Stegmayer, Georgina; Milone, Diego H.
Format:	Conference object
Language:	English
Published:	2015
Subjects:	Computer Science; Data mining; big data; Clustering
Online access:	http://sedici.unlp.edu.ar/handle/10915/51984 http://44jaiio.sadio.org.ar/sites/default/files/agranda52-54.pdf
Contributed by:	SEDICI (UNLP), Universidad Nacional de La Plata

id	I19-R120-10915-51984
record_format	dspace
institution	Universidad Nacional de La Plata
institution_str	I-19
repository_str	R-120
collection	SEDICI (UNLP)
language	English
topic	Computer Science; Data mining; big data; Clustering
topic_facet	Computer Science; Data mining; big data; Clustering
description	Mining big data involves several problems and new challenges, in addition to the huge volume of information. On the one hand, these data generally come from autonomous and decentralized sources, so their dimensionality is heterogeneous and diverse, and they generally involve privacy issues. On the other hand, algorithms for mining data, such as clustering methods, have particular characteristics that make them useful for different types of data mining problems. Due to the huge amount of information, the task of choosing a single clustering approach becomes even more difficult. For instance, k-means, a very popular algorithm, always assumes spherical clusters in the data; hierarchical approaches can be used when there is interest in finding this type of structure; expectation-maximization iteratively adjusts the parameters of a statistical model to fit the observed data. Moreover, all these methods work properly only with relatively small data sets. Large-volume data often make their application unfeasible, all the more so if the data come from autonomous sources that are constantly growing and evolving. In recent years, a new clustering approach has emerged, called consensus clustering or cluster ensembles. Instead of running a single algorithm, this approach first produces a set of data partitions (the ensemble) by employing different clustering techniques on the same original data set. Then, this ensemble is processed by a consensus function, which produces a single consensus partition that outperforms the individual solutions in the input ensemble. This approach has been successfully employed for distributed data mining, which makes it very interesting and applicable in the big data context. Although many techniques have been proposed for large data sets, most of them focus mainly on making individual components more efficient, instead of improving the whole consensus approach for the case of big data.
format	Conference object
author	Pividori, Milton Stegmayer, Georgina Milone, Diego H.
author_facet	Pividori, Milton Stegmayer, Georgina Milone, Diego H.
author_sort	Pividori, Milton
title	Cluster Ensembles for Big Data Mining Problems
title_short	Cluster Ensembles for Big Data Mining Problems
title_full	Cluster Ensembles for Big Data Mining Problems
title_fullStr	Cluster Ensembles for Big Data Mining Problems
title_full_unstemmed	Cluster Ensembles for Big Data Mining Problems
title_sort	cluster ensembles for big data mining problems
publishDate	2015
url	http://sedici.unlp.edu.ar/handle/10915/51984 http://44jaiio.sadio.org.ar/sites/default/files/agranda52-54.pdf
work_keys_str_mv	AT pividorimilton clusterensemblesforbigdataminingproblems AT stegmayergeorgina clusterensemblesforbigdataminingproblems AT milonediegoh clusterensemblesforbigdataminingproblems
bdutipo_str	Repositorios
_version_	1764820476347023362
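
The consensus-clustering idea summarized in the description field above (build an ensemble of partitions, then combine them with a consensus function) can be illustrated with a small sketch. This is not the authors' method: the record does not specify a consensus function, so the co-association (evidence accumulation) approach, the 0.5 threshold, the function names and the toy label vectors below are all assumptions chosen for illustration.

```python
from itertools import combinations


def co_association(partitions):
    """Return an n x n matrix with the fraction of partitions in which
    each pair of items is assigned to the same cluster."""
    n = len(partitions[0])
    counts = [[0.0] * n for _ in range(n)]
    for labels in partitions:
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    counts[i][j] += 1
    m = len(partitions)
    return [[c / m for c in row] for row in counts]


def consensus_partition(partitions, threshold=0.5):
    """Hypothetical consensus function: link two items when they co-occur
    in more than `threshold` of the partitions, then return the connected
    components as the consensus clusters."""
    coassoc = co_association(partitions)
    n = len(coassoc)
    parent = list(range(n))  # union-find forest, one tree per cluster

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for i, j in combinations(range(n), 2):
        if coassoc[i][j] > threshold:
            parent[find(i)] = find(j)

    # Relabel the union-find roots as 0, 1, 2, ... in order of appearance.
    relabel = {}
    return [relabel.setdefault(find(i), len(relabel)) for i in range(n)]


# Toy ensemble (assumed data): three partitions of six items, standing in
# for the outputs of three different clustering techniques.
ensemble = [
    [0, 0, 0, 1, 1, 1],
    [1, 1, 0, 0, 0, 0],
    [0, 0, 0, 2, 2, 1],
]
print(consensus_partition(ensemble))  # → [0, 0, 0, 1, 1, 1]
```

In this toy case the partitions disagree about items 2 and 5, but the majority vote encoded in the co-association matrix recovers two groups. The pairwise matrix is quadratic in the number of items, which is one reason a consensus step of this kind becomes costly on large data sets.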
