Learning to detect spam messages

The problem of unwanted e-mails (or spam messages) has been increasing for years. Different methods have been proposed to deal with this problem, which include blacklists of known spammers, handcrafted rules and machine learning techniques. In this paper we investigate the performance of th...

Bibliographic Details
Main authors: Gil Costa, Graciela Verónica, Errecalde, Marcelo Luis, Taranilla, María Teresa
Format: Conference object
Language: English
Published: 2005
Subjects:
Online access: http://sedici.unlp.edu.ar/handle/10915/22957
Contributed by:
id I19-R120-10915-22957
record_format dspace
institution Universidad Nacional de La Plata
institution_str I-19
repository_str R-120
collection SEDICI (UNLP)
language English
topic Computer Science
Electronic mail
Message sending
Information filtering
spam
anti-spam filtering
automated text categorization
machine learning
k-NN
description The problem of unwanted e-mails (or spam messages) has been increasing for years. Different methods have been proposed to deal with this problem, which include blacklists of known spammers, handcrafted rules and machine learning techniques. In this paper we investigate the performance of the k Nearest Neighbours (k-NN) method in spam detection tasks. To this end, a number of different document codifications were tested. Moreover, we study how vocabulary size reduction affects this task. In the experimental design, different k values were considered and results were analyzed with respect to a public mailing list and personal e-mail collections. The experiments showed that results with public mailing lists tend to be very optimistic and should not be considered representative of those expected with personal user accounts.
format Conference object
author Gil Costa, Graciela Verónica
Errecalde, Marcelo Luis
Taranilla, María Teresa
title Learning to detect spam messages
publishDate 2005
url http://sedici.unlp.edu.ar/handle/10915/22957
work_keys_str_mv AT gilcostagracielaveronica learningtodetectspammessages
AT errecaldemarceloluis learningtodetectspammessages
AT taranillamariateresa learningtodetectspammessages
bdutipo_str Repositories
_version_ 1764820467938492416