000126458 001__ 126458
000126458 005__ 20190316234332.0
000126458 037__ $$aCONF
000126458 245__ $$aResolving FP-TP Conflict in Digest-Based Collaborative Spam Detection by Use of Negative Selection Algorithm
000126458 260__ $$c2008
000126458 269__ $$a2008
000126458 336__ $$aConference Papers
000126458 520__ $$aA well-known approach for collaborative spam filtering is to determine which emails belong to the same bulk, e.g. by exploiting their content similarity. After an initial portion of a bulk has been observed, this allows bulkiness scores to be assigned to the remaining emails from the same bulk. It also allows individual evidence of spamminess to be joined, if such evidence is generated by collaborating filters or users for some of the emails from the initial portion of the bulk. Usually a database of previously observed emails or email digests is formed and queried upon receiving new emails. Previous evaluations [2,10] of the approach based on email digests that preserve email content similarity indicate, and partially demonstrate, that there are ways to make the approach robust to increased obfuscation efforts by spammers. However, for the parameter settings that provide good matching between emails from the same bulk, the unwanted random matching between ham emails and unrelated ham and spam emails stays rather high. This directly translates into a need to use higher bulkiness thresholds in order to ensure low false positive (FP) detection of ham, which implies that larger initial parts of spam bulks will not be filtered, i.e. true positive (TP) detection will not be very high (FP-TP conflict). In this paper we demonstrate how, by use of the negative selection algorithm, the unwanted random matching between unrelated emails may be decreased by at least an order of magnitude, while preserving the same good matching between emails from the same bulk. We also show how this translates into at least an order of magnitude fewer undetected bulky spam emails, under the same ham misdetection requirements.
000126458 6531_ $$aEmail
000126458 6531_ $$aspam
000126458 6531_ $$aopen digest
000126458 6531_ $$asimilarity hashing
000126458 6531_ $$adata representation
000126458 6531_ $$acollaborative
000126458 6531_ $$adetection
000126458 6531_ $$afiltering
000126458 6531_ $$aobfuscation
000126458 6531_ $$arobustness
000126458 6531_ $$anegative selection algorithm
000126458 700__ $$aSarafijanovic, Slavisa
000126458 700__ $$aPerez, Sabrina
000126458 700__ $$0241098$$g105633$$aLe Boudec, Jean-Yves
000126458 7112_ $$dAugust 21-22, 2008$$cMountain View, California, USA$$aCEAS 2008, The Fifth Conference on Email and Antispam
000126458 773__ $$tProceedings of CEAS 2008, The Fifth Conference on Email and Antispam
000126458 8564_ $$uhttp://www.ceas.cc/2008/index.html$$zURL
000126458 8564_ $$uhttps://infoscience.epfl.ch/record/126458/files/SarafijanovicEtAl-NegativeSelectionOfEmailDigests-CEAS2008.pdf$$zn/a$$s261709
000126458 909C0 $$xUS00024$$0252614$$pLCA
000126458 909C0 $$pLCA2$$xU10427$$0252453
000126458 909CO $$qGLOBAL_SET$$pconf$$pIC$$ooai:infoscience.tind.io:126458
000126458 937__ $$aLCA-CONF-2008-072
000126458 973__ $$rREVIEWED$$sPUBLISHED$$aEPFL
000126458 980__ $$aCONF