An analysis of local and global solutions to address Big Data imbalanced classification: a case study with SMOTE preprocessing

Bibliographic Details
Main authors: Basgall, María José; Hasperué, Waldo; Naiouf, Marcelo; Fernández, Alberto; Herrera, Francisco
Format: Conference object
Language: English
Published: 2019
Subjects: Computer Science; big data; imbalanced classification; preprocessing techniques; SMOTE; scalability
Online access: http://sedici.unlp.edu.ar/handle/10915/80384
id I19-R120-10915-80384
record_format dspace
institution Universidad Nacional de La Plata
institution_str I-19
repository_str R-120
collection SEDICI (UNLP)
language English
topic Computer Science
big data
imbalanced classification
preprocessing techniques
SMOTE
scalability
description Handling the huge volume of continuously generated data is an important challenge in Machine Learning. Traditional techniques must be adapted, or new ones created, and distributed technologies are needed to cope with the scalability constraints of the Big Data context. In many Big Data classification applications, some classes are highly underrepresented, which leads to the imbalanced classification problem. In this scenario, learning algorithms are often biased towards the majority classes and treat minority ones as outliers or noise. Preprocessing techniques were therefore developed to balance the class distribution, either by removing majority instances (undersampling) or by creating minority examples (oversampling). Among oversampling methods, one of the most widespread is the SMOTE algorithm, which creates artificial examples according to the neighborhood of each minority class instance. In this work, our objective is to analyze the behavior of SMOTE in Big Data as a function of key aspects such as the oversampling degree, the neighborhood value and, especially, the type of distributed design (local vs. global).
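The description contrasts a local and a global distributed design for SMOTE without giving details. The following is a minimal single-machine sketch of that contrast, not the authors' implementation. It assumes that "local" means each data partition oversamples using only its own minority points, and that "global" means neighbors are searched over all minority points. The function names (`k_nearest`, `smote`, `smote_local`, `smote_global`) and the list-of-partitions layout are hypothetical and chosen only for illustration.

```python
import math
import random


def k_nearest(i, points, k):
    """Return the k points closest (Euclidean) to points[i], excluding points[i]."""
    others = [p for j, p in enumerate(points) if j != i]
    return sorted(others, key=lambda p: math.dist(points[i], p))[:k]


def smote(minority, n_new, k, rng):
    """Create n_new synthetic points, each interpolated between a random
    minority point and one of its k nearest minority neighbors."""
    if len(minority) < 2:
        raise ValueError("need at least two minority points")
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        neighbor = rng.choice(k_nearest(i, minority, k))
        gap = rng.random()  # position along the segment base -> neighbor
        synthetic.append(tuple(b + gap * (n - b) for b, n in zip(base, neighbor)))
    return synthetic


def smote_local(partitions, n_new_per_partition, k, rng):
    """Assumed 'local' design: each partition sees only its own minority points."""
    out = []
    for part in partitions:
        if len(part) >= 2:  # a lone point has no neighbor to interpolate with
            out.extend(smote(part, n_new_per_partition, k, rng))
    return out


def smote_global(partitions, n_new, k, rng):
    """Assumed 'global' design: neighbors are searched over all minority points."""
    all_minority = [p for part in partitions for p in part]
    return smote(all_minority, n_new, k, rng)


if __name__ == "__main__":
    rng = random.Random(42)
    # Two hypothetical partitions holding minority-class points only.
    parts = [
        [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)],
        [(100.0, 100.0), (101.0, 100.0), (100.0, 101.0)],
    ]
    print(smote_local(parts, 3, k=2, rng=rng))
    print(smote_global(parts, 6, k=5, rng=rng))
```

Under these assumptions, the local variant can only interpolate between points that happen to share a partition, while the global variant may pair any two minority points, at the cost of a neighbor search over the whole minority set. The oversampling degree corresponds here to `n_new` and the neighborhood value to `k`.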
format Conference object
author Basgall, María José
Hasperué, Waldo
Naiouf, Marcelo
Fernández, Alberto
Herrera, Francisco
title An analysis of local and global solutions to address Big Data imbalanced classification: a case study with SMOTE preprocessing
publishDate 2019
url http://sedici.unlp.edu.ar/handle/10915/80384
bdutipo_str Repositories
_version_ 1764820486912475137