Comparative Study of the Performance of the Classification Algorithms of the Apache Spark ML Library

Classification algorithms are widely used in several areas: finance, education, security, medicine, and more. Another use of these algorithms is to support feature extraction techniques. These techniques use classification algorithms to determine the best subset of attributes that support an accepta...

Bibliographic Details
Main authors:	Camele, Genaro; Hasperué, Waldo; Ronchetti, Franco; Quiroga, Facundo Manuel
Format:	Conference object
Language:	English
Published:	2021
Subjects:	Computer Science; Big Data; Machine learning; Classification Models; Apache Spark; Spark ML
Online access:	http://sedici.unlp.edu.ar/handle/10915/130348
Contributed by:	SEDICI (UNLP), Universidad Nacional de La Plata

Description:	Classification algorithms are widely used in several areas: finance, education, security, medicine, and more. Another use of these algorithms is to support feature extraction techniques, which use classification algorithms to determine the best subset of attributes that supports an acceptable prediction. A large amount of data is currently being collected; as a result, databases are growing steadily and distributed processing is becoming a necessity. In this context, Spark, and in particular its Spark ML library, is one of the most widely used frameworks for performing classification tasks on large databases. Because some feature extraction techniques need to execute a classification algorithm a significant number of times, with a different subset of attributes in each run, the performance of these algorithms should be known beforehand so that the overall feature extraction process is carried out in the shortest possible time. In this work, we carry out a comparative study of four Spark ML classification algorithms, measuring predictive power and execution times as a function of the number of attributes in the training dataset.
