Infoscience
research article

NNG-Mix: Improving Semi-Supervised Anomaly Detection with Pseudo-Anomaly Generation

Dong, Hao
•
Frusque, Gaëtan
•
Zhao, Yue
•
Chatzi, Eleni
•
Fink, Olga
2024
IEEE Transactions on Neural Networks and Learning Systems

Anomaly detection (AD) is essential in identifying rare and often critical events in complex systems, finding applications in fields such as network intrusion detection, financial fraud detection, and fault detection in infrastructure and industrial systems. While AD is typically treated as an unsupervised learning task due to the high cost of label annotation, it is more practical to assume access to a small set of labeled anomaly samples from domain experts, as is the case for semi-supervised AD. Semi-supervised and supervised approaches can leverage such labeled data, resulting in improved performance. In this article, rather than proposing a new semi-supervised or supervised approach for AD, we introduce a novel algorithm for generating additional pseudo-anomalies on the basis of the limited labeled anomalies and a large volume of unlabeled data. This serves as an augmentation to facilitate the detection of new anomalies. Our proposed algorithm, named nearest neighbor Gaussian mix-up (NNG-Mix), efficiently integrates information from both labeled and unlabeled data to generate pseudo-anomalies. We compare the performance of this novel algorithm with commonly applied augmentation techniques, such as Mixup and Cutout. We evaluate NNG-Mix by training various existing semi-supervised and supervised AD algorithms on the original training data along with the generated pseudo-anomalies. Through extensive experiments on 57 benchmark datasets in ADBench, reflecting different data types, we demonstrate that NNG-Mix outperforms other data augmentation methods. It yields significant performance improvements compared to the baselines trained exclusively on the original training data. Notably, NNG-Mix yields up to 16.4%, 8.8%, and 8.0% improvements on Classical, CV, and NLP datasets in ADBench. Our source code is available at https://github.com/donghao51/NNG-Mix.
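The abstract describes NNG-Mix as combining nearest-neighbor information from unlabeled data with Gaussian perturbation and mixup-style interpolation to synthesize pseudo-anomalies from a few labeled ones. The sketch below is a minimal, hypothetical re-implementation of that idea based only on the abstract; the neighbor count, noise scale, and mixup coefficient are assumptions, not the authors' actual settings (see the linked GitHub repository for the real implementation).

```python
import numpy as np

def nng_mix_sketch(anomalies, unlabeled, k=5, alpha=0.2, sigma=0.1,
                   n_new=100, seed=0):
    """Generate pseudo-anomalies by interpolating each labeled anomaly
    with one of its k nearest unlabeled neighbors, then adding Gaussian
    noise. Hypothetical sketch of the NNG-Mix idea from the abstract."""
    rng = np.random.default_rng(seed)
    pseudo = []
    for _ in range(n_new):
        # pick a labeled anomaly at random
        a = anomalies[rng.integers(len(anomalies))]
        # its k nearest unlabeled neighbors (Euclidean distance)
        dists = np.linalg.norm(unlabeled - a, axis=1)
        nbrs = unlabeled[np.argsort(dists)[:k]]
        n = nbrs[rng.integers(k)]
        # mixup-style interpolation plus Gaussian perturbation
        lam = rng.beta(alpha, alpha)
        x = lam * a + (1.0 - lam) * n + rng.normal(0.0, sigma, size=a.shape)
        pseudo.append(x)
    return np.asarray(pseudo)
```

The generated samples would then be appended, with anomaly labels, to the training set of whatever semi-supervised or supervised detector is being trained, as the abstract describes.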

Type
research article
DOI
10.1109/TNNLS.2024.3497801
Scopus ID

2-s2.0-85210098335

Author(s)
Dong, Hao

ETH Zürich

Frusque, Gaëtan  

École Polytechnique Fédérale de Lausanne

Zhao, Yue

USC Viterbi School of Engineering

Chatzi, Eleni

ETH Zürich

Fink, Olga  

École Polytechnique Fédérale de Lausanne

Date Issued

2024

Published in
IEEE Transactions on Neural Networks and Learning Systems
Subjects
Anomaly detection (AD) • data augmentation • mixup • nearest neighbors (NNs) • semi-supervised learning
Editorial or Peer reviewed

REVIEWED

Written at

EPFL

EPFL units
IMOS  
Funder
NSF
Grant Number
POSE-2346158

Available on Infoscience
January 25, 2025
Use this identifier to reference this record
https://infoscience.epfl.ch/handle/20.500.14299/244032
  • Contact
  • infoscience@epfl.ch


Infoscience is a service managed and provided by the Library and IT Services of EPFL. © EPFL, all rights reserved