Resolving FP-TP Conflict in Digest-Based Collaborative Spam Detection by Use of Negative Selection Algorithm

Sarafijanovic, Slavisa; Perez, Sabrina; Le Boudec, Jean-Yves

2008

Abstract

A well-known approach to collaborative spam filtering is to determine which emails belong to the same bulk, e.g. by exploiting their content similarity. After an initial portion of a bulk has been observed, this allows bulkiness scores to be assigned to the remaining emails from the same bulk. It also allows individual evidence of spamminess to be combined, if such evidence is generated by collaborating filters or users for some of the emails from the initial portion of the bulk. Usually a database of previously observed emails or email digests is formed and queried upon receiving new emails. Previous evaluations [2,10] of the approach based on email digests that preserve content similarity indicate, and partially demonstrate, that the approach can be made robust to increased obfuscation efforts by spammers. However, for the parameter settings that provide good matching between emails from the same bulk, the unwanted random matching between ham emails and unrelated ham and spam emails remains rather high. This directly translates into a need for higher bulkiness thresholds in order to ensure low false positive (FP) detection of ham, which implies that larger initial parts of spam bulks will not be filtered, i.e. true positive (TP) detection will not be very high (the FP-TP conflict). In this paper we demonstrate how, by use of the negative selection algorithm, the unwanted random matching between unrelated emails may be decreased by at least an order of magnitude, while preserving the same good matching between emails from the same bulk. We also show how this translates into at least an order of magnitude fewer undetected bulky spam emails, under the same ham misdetection requirements.
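
The idea in the abstract can be illustrated with a toy sketch. It is not the authors' implementation. It assumes that digests are small integers compared by Hamming distance, that bulkiness is the number of database entries matched by at least one of an email's digests, and that negative selection means discarding candidate digests that match a set of known-ham ("self") digests. The names `hamming`, `negative_selection` and `bulkiness`, and all thresholds and values, are hypothetical.

```python
from typing import Iterable, List


def hamming(a: int, b: int) -> int:
    """Number of differing bits between two integer digests."""
    return bin(a ^ b).count("1")


def negative_selection(candidates: Iterable[int],
                       self_digests: Iterable[int],
                       threshold: int) -> List[int]:
    """Keep only the candidates that match no self (known-ham) digest.

    Two digests "match" when their Hamming distance is <= threshold
    (an assumption made for this sketch).
    """
    self_list = list(self_digests)
    return [c for c in candidates
            if all(hamming(c, s) > threshold for s in self_list)]


def bulkiness(digests: Iterable[int],
              database: Iterable[int],
              threshold: int) -> int:
    """Count database entries matched by at least one of the email's digests."""
    queries = list(digests)
    return sum(1 for d in database
               if any(hamming(q, d) <= threshold for q in queries))


# Toy 8-bit digests; every value here is made up for illustration.
self_digests = [0b11110000]              # digest typical of ham content
candidates = [0b11110001, 0b00001111]    # digests of a newly received email
database = [0b11110000, 0b00001110, 0b00001111]  # previously observed digests

kept = negative_selection(candidates, self_digests, threshold=1)
score_before = bulkiness(candidates, database, threshold=1)
score_after = bulkiness(kept, database, threshold=1)
```

In this toy run the candidate that resembles ham content is dropped, so it no longer contributes a random match against the database, while the remaining digest still matches the entries it is genuinely close to.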

Details

Title Resolving FP-TP Conflict in Digest-Based Collaborative Spam Detection by Use of Negative Selection Algorithm

Author(s) Sarafijanovic, Slavisa ; Perez, Sabrina ; Le Boudec, Jean-Yves

Published in Proceedings of CEAS 2008, The Fifth Conference on Email and Antispam

Conference CEAS 2008, The Fifth Conference on Email and Antispam, Mountain View, California, USA, August 21-22, 2008

Date 2008

Keywords

Email; spam; open digest; similarity hashing; data representation; collaborative; detection; filtering; obfuscation; robustness; negative selection algorithm

Laboratories LCA ; LCA2

Record creation date 2008-09-18
