research article

The value of human data annotation for machine learning based anomaly detection in environmental systems

Russo, Stefania • Besmer, Michael D. • Blumensaat, Frank
November 1, 2021
Water Research

Anomaly detection is the process of identifying unexpected data samples in datasets. Automated anomaly detection is either performed using supervised machine learning models, which require a labelled dataset for their calibration, or unsupervised models, which do not require labels. While academic research has produced a vast array of tools and machine learning models for automated anomaly detection, the research community focused on environmental systems still lacks a comparative analysis that is simultaneously comprehensive, objective, and systematic. This knowledge gap is addressed for the first time in this study, where 15 different supervised and unsupervised anomaly detection models are evaluated on 5 different environmental datasets from engineered and natural aquatic systems. To this end, anomaly detection performance, labelling efforts, as well as the impact of model and algorithm tuning are taken into account. As a result, our analysis reveals the relative strengths and weaknesses of the different approaches in an objective manner without bias for any particular paradigm in machine learning. Most importantly, our results show that expert-based data annotation is extremely valuable for anomaly detection based on machine learning.
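
The distinction the abstract draws between label-free and label-calibrated detection can be illustrated with a minimal Python sketch. It is not one of the 15 models evaluated in the study, and it does not use the study's datasets: the synthetic series, the median/MAD score, the fixed threshold of 3.5, and all function names (`fit_scale`, `scores`, `calibrate_threshold`, `make_series`) are assumptions chosen for illustration only. The unsupervised variant applies a rule-of-thumb threshold, while the supervised variant uses annotated points to calibrate that threshold.

```python
import random
import statistics


def fit_scale(values):
    """Estimate a robust centre (median) and spread (MAD) from the data."""
    centre = statistics.median(values)
    spread = statistics.median([abs(v - centre) for v in values])
    return centre, spread or 1e-9  # guard against zero spread


def scores(values, centre, spread):
    """Anomaly score: absolute distance from the centre in units of spread."""
    return [abs(v - centre) / spread for v in values]


def f1(predicted, labels):
    """F1 score of boolean predictions against boolean labels."""
    tp = sum(1 for p, l in zip(predicted, labels) if p and l)
    fp = sum(1 for p, l in zip(predicted, labels) if p and not l)
    fn = sum(1 for p, l in zip(predicted, labels) if l and not p)
    return 2 * tp / (2 * tp + fp + fn) if tp else 0.0


def calibrate_threshold(train_scores, labels):
    """Supervised step: choose the threshold that maximises F1 on labelled data."""
    return max(sorted(set(train_scores)),
               key=lambda t: f1([s >= t for s in train_scores], labels))


def make_series(n=500, anomaly_rate=0.05, seed=0):
    """Synthetic series with injected, labelled anomalies (illustrative only)."""
    rng = random.Random(seed)
    values, labels = [], []
    for _ in range(n):
        is_anomaly = rng.random() < anomaly_rate
        shift = rng.choice([-1, 1]) * 6.0 if is_anomaly else 0.0
        values.append(rng.gauss(10.0, 1.0) + shift)
        labels.append(is_anomaly)
    return values, labels


if __name__ == "__main__":
    values, labels = make_series()
    centre, spread = fit_scale(values)
    s = scores(values, centre, spread)

    fixed = 3.5  # label-free rule-of-thumb threshold (an assumption)
    tuned = calibrate_threshold(s, labels)  # requires annotated data

    print("unsupervised F1:", round(f1([x >= fixed for x in s], labels), 3))
    print("label-calibrated F1:", round(f1([x >= tuned for x in s], labels), 3))
```

On the data used for calibration, the tuned threshold can never score below the fixed one, because the fixed threshold's predictions are among the candidates searched. Whether that advantage carries over to unseen data is the kind of question the study's comparison addresses, and this sketch makes no claim about it.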

Type
research article
DOI
10.1016/j.watres.2021.117695
Web of Science ID

WOS:000713194100009

Author(s)
Russo, Stefania
Besmer, Michael D.
Blumensaat, Frank
Bouffard, Damien
Disch, Andy
Hammes, Frederik
Hess, Angelika
Lurig, Moritz
Matthews, Blake
Minaudo, Camille  
Date Issued

2021-11-01

Publisher

PERGAMON-ELSEVIER SCIENCE LTD

Published in
Water Research
Volume

206

Article Number

117695

Subjects

Engineering, Environmental • Environmental Sciences • Water Resources • Engineering • Environmental Sciences & Ecology • machine learning • anomaly detection • environmental systems • labels • principal component analysis • sequencing batch reactor • fault-detection • water-quality • multivariate • regression • network

Editorial or Peer reviewed

REVIEWED

Written at

EPFL

EPFL units
APHYS  
Available on Infoscience
December 4, 2021
Use this identifier to reference this record
https://infoscience.epfl.ch/handle/20.500.14299/183659