Improving the Generation of Labeled Network Traffic Datasets Through Machine Learning Techniques

The problem of detecting malicious behavior in network traffic has become an extremely difficult challenge for the security community. Consequently, several intelligence-based tools have been proposed to generate models capable of understanding the information traveling through the network and to h...

Bibliographic Details
Main authors: Guerra, Jorge; Catania, Carlos
Format: Conference object
Language: English
Published: 2017
Subjects: Computer Science; machine learning; dataset generation; network security
Online access: http://sedici.unlp.edu.ar/handle/10915/63933
Contributed by:
id I19-R120-10915-63933
record_format dspace
institution Universidad Nacional de La Plata
institution_str I-19
repository_str R-120
collection SEDICI (UNLP)
language English
topic Computer Science
machine learning
dataset generation
network security
spellingShingle Computer Science
machine learning
dataset generation
network security
Guerra, Jorge
Catania, Carlos
Improving the Generation of Labeled Network Traffic Datasets Through Machine Learning Techniques
topic_facet Computer Science
machine learning
dataset generation
network security
description The problem of detecting malicious behavior in network traffic has become an extremely difficult challenge for the security community. Consequently, several intelligence-based tools have been proposed to generate models capable of understanding the information traveling through the network and to help identify suspicious connections as soon as possible. However, the lack of high-quality datasets has been one of the main obstacles to the development of reliable intelligence-based tools. A well-labeled dataset is fundamental not only for automatically learning models but also for testing their performance. Recently, RiskID emerged with the goal of providing the network security community with a collaborative tool to assist the labeling process. Through the use of visual and statistical techniques, RiskID makes it easier for the user to generate labeled datasets from real connections. In this article, we present a machine learning extension for RiskID to help the user in the malware identification process. A preliminary study shows that, as the amount of labeled data increases, machine learning models can be a valuable tool for labeling future traffic connections.
format Conference object
Conference object
author Guerra, Jorge
Catania, Carlos
author_facet Guerra, Jorge
Catania, Carlos
author_sort Guerra, Jorge
title Improving the Generation of Labeled Network Traffic Datasets Through Machine Learning Techniques
title_short Improving the Generation of Labeled Network Traffic Datasets Through Machine Learning Techniques
title_full Improving the Generation of Labeled Network Traffic Datasets Through Machine Learning Techniques
title_fullStr Improving the Generation of Labeled Network Traffic Datasets Through Machine Learning Techniques
title_full_unstemmed Improving the Generation of Labeled Network Traffic Datasets Through Machine Learning Techniques
title_sort improving the generation of labeled network traffic datasets through machine learning techniques
publishDate 2017
url http://sedici.unlp.edu.ar/handle/10915/63933
work_keys_str_mv AT guerrajorge improvingthegenerationoflabelednetworktrafficdatasetsthroughmachinelearningtechniques
AT cataniacarlos improvingthegenerationoflabelednetworktrafficdatasetsthroughmachinelearningtechniques
bdutipo_str Repositories
_version_ 1764820479470731265