Learning to detect spam messages
| Main authors: | Gil Costa, Graciela Verónica; Errecalde, Marcelo Luis; Taranilla, María Teresa |
|---|---|
| Format: | Conference object |
| Language: | English |
| Published: | 2005 |
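| Subjects: | Computer Science; Electronic mail; Message sending; Information filtering; spam; anti-spam filtering; automated text categorization; machine learning; k-NN |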
| Online access: | http://sedici.unlp.edu.ar/handle/10915/22957 |

| Field | Value |
|---|---|
| id | I19-R120-10915-22957 |
| record_format | dspace |
| institution | Universidad Nacional de La Plata |
| collection | SEDICI (UNLP) |
| language | English |
| topic | Computer Science; Electronic mail; Message sending; Information filtering; spam; anti-spam filtering; automated text categorization; machine learning; k-NN |
| description | The problem of unwanted e-mails (spam messages) has been increasing for years. Different methods have been proposed to deal with this problem, including blacklists of known spammers, handcrafted rules and machine learning techniques. In this paper we investigate the performance of the k Nearest Neighbours (k-NN) method in spam detection tasks. To this end, several different document codifications were tested. Moreover, we study how reducing the vocabulary size affects this task. In the experimental design, different values of k were considered, and results were analyzed on a public mailing list and on personal e-mail collections. The experiments showed that results obtained with public mailing lists tend to be overly optimistic and should not be considered representative of those expected with personal user accounts. |
| format | Conference object |
| author | Gil Costa, Graciela Verónica; Errecalde, Marcelo Luis; Taranilla, María Teresa |
| title | Learning to detect spam messages |
| publishDate | 2005 |
| url | http://sedici.unlp.edu.ar/handle/10915/22957 |
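
The abstract does not specify which document codifications, similarity measure or vocabulary-reduction criterion the paper used. The following Python sketch is therefore only a generic illustration of a k-NN spam filter under assumed choices: binary, term-frequency (`tf`) and TF-IDF (`tfidf`) codifications, cosine similarity between messages, and vocabulary reduction that keeps the `max_vocab` terms with the highest document frequency. All names (`KNNSpamFilter`, `build_vocabulary`, `max_vocab`, and so on) are hypothetical and are not taken from the authors' work.

```python
# Hypothetical sketch: the codifications, similarity measure and vocabulary
# reduction below are assumptions, not the method described in the paper.
import math
import re
from collections import Counter


def tokenize(text):
    """Lower-case the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_vocabulary(docs, max_size):
    """Reduce the vocabulary to the `max_size` terms with the highest
    document frequency (ties broken alphabetically)."""
    df = Counter()
    for tokens in docs:
        df.update(set(tokens))
    ranked = sorted(df.items(), key=lambda item: (-item[1], item[0]))
    return {term for term, _ in ranked[:max_size]}, df


def encode(tokens, vocab, scheme, df, n_docs):
    """Turn a token list into a sparse vector {term: weight}.

    scheme: "binary" (presence), "tf" (raw counts) or
    "tfidf" (counts weighted by log(n_docs / df)).
    Terms outside the reduced vocabulary are ignored.
    """
    counts = Counter(t for t in tokens if t in vocab)
    if scheme == "binary":
        return {t: 1.0 for t in counts}
    if scheme == "tf":
        return {t: float(c) for t, c in counts.items()}
    if scheme == "tfidf":
        return {t: c * math.log(n_docs / df[t]) for t, c in counts.items()}
    raise ValueError(f"unknown scheme: {scheme}")


def cosine(u, v):
    """Cosine similarity between two sparse vectors (0.0 if either is empty)."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


class KNNSpamFilter:
    """Label a message by majority vote among its k most similar training messages."""

    def __init__(self, k=3, scheme="tf", max_vocab=1000):
        self.k = k
        self.scheme = scheme
        self.max_vocab = max_vocab

    def fit(self, texts, labels):
        docs = [tokenize(t) for t in texts]
        self.vocab, self.df = build_vocabulary(docs, self.max_vocab)
        self.n_docs = len(docs)
        self.vectors = [self._encode(d) for d in docs]
        self.labels = list(labels)
        return self

    def _encode(self, tokens):
        return encode(tokens, self.vocab, self.scheme, self.df, self.n_docs)

    def predict(self, text):
        query = self._encode(tokenize(text))
        # Rank training messages by similarity to the query (stable sort).
        ranked = sorted(
            zip((cosine(query, v) for v in self.vectors), self.labels),
            key=lambda pair: -pair[0],
        )
        votes = Counter(label for _, label in ranked[: self.k])
        return votes.most_common(1)[0][0]


if __name__ == "__main__":
    texts = [
        "win a free prize now", "cheap meds free offer", "claim your free prize",
        "meeting agenda for monday", "lunch with the project team",
        "draft of the paper attached",
    ]
    labels = ["spam"] * 3 + ["ham"] * 3
    clf = KNNSpamFilter(k=3, scheme="binary", max_vocab=20).fit(texts, labels)
    print(clf.predict("free prize offer"))             # spam
    print(clf.predict("agenda for the team meeting"))  # ham
```

In this toy run, `max_vocab=20` drops three of the 23 distinct terms, which mimics (on a tiny scale) the kind of vocabulary-size reduction the abstract mentions. With two classes and an odd `k`, the majority vote cannot tie; a similarity-weighted vote is a common alternative when ties are possible.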