Learning to detect spam messages

The problem of unwanted e-mails (or spam messages) has been increasing for years. Different methods have been proposed in order to deal with this problem wich includes blacklists of known spammers, handcrafted rules and machine learning techniques. In this paper we investigate the performance of th...

Descripción completa

Guardado en:
Detalles Bibliográficos
Autores principales: Gil Costa, Graciela Verónica, Errecalde, Marcelo Luis, Taranilla, María Teresa
Formato: Objeto de conferencia
Lenguaje:Inglés
Publicado: 2005
Materias:
Acceso en línea:http://sedici.unlp.edu.ar/handle/10915/22957
Aporte de:
Descripción
Sumario:The problem of unwanted e-mails (or spam messages) has been increasing for years. Different methods have been proposed in order to deal with this problem wich includes blacklists of known spammers, handcrafted rules and machine learning techniques. In this paper we investigate the performance of the k Nearest Neighbours (k-NN) method in spam detection tasks. At this end, a number of different document codifications were tested. Moreover, we study how the vocabulary size reduction affects this task. In the experimental design, different k values were considered and results were analyzed with respect to a public mailing list and personal e-mail collections. The experiments showed that results with public mailing lists tend to be very optimistic and they should not be considered representative of those expected with personal user accounts.