Infoscience

research article

A comparative analysis of gene expression profiling by statistical and machine learning approaches

Bontonou, Myriam • Haget, Anais • Boulougouri, Maria • Audit, Benjamin • Borgnat, Pierre • Arbona, Jean-Michel
January 29, 2025
Bioinformatics Advances

Motivation: Many machine learning (ML) models developed to classify phenotype from gene expression data provide interpretations for their decisions, with the aim of understanding biological processes. For many models, including neural networks, interpretations are lists of genes ranked by their importance for the predictions, with top-ranked genes likely linked to the phenotype. In this article, we discuss the limitations of such approaches using integrated gradients, an explainability method developed for neural networks, as an example.

Results: Experiments are performed on RNA sequencing data from public cancer databases. A collection of ML models, including multilayer perceptrons and graph neural networks, are trained to classify samples by cancer type. Gene rankings from integrated gradients are compared to genes highlighted by statistical feature selection methods such as DESeq2 and other learning methods measuring global feature contribution. Experiments show that a small set of top-ranked genes is sufficient to achieve good classification. However, similar performance is possible with lower-ranked genes, although larger sets are required. Moreover, significant differences in top-ranked genes, especially between statistical and learning methods, prevent a comprehensive biological understanding. In conclusion, while these methods identify pathology-specific biomarkers, the completeness of gene sets selected by explainability techniques for understanding biological processes remains uncertain.

Availability and implementation: Python code and datasets are available at https://github.com/mbonto/XAI_in_genomics.
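The gene-ranking idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' code (their implementation is in the repository linked above). It assumes a toy logistic classifier in place of a neural network, synthetic data in place of RNA sequencing counts, and an all-zero baseline. All names (`model`, `integrated_gradients`, `ranking`) are hypothetical. The integral in integrated gradients is approximated with a midpoint Riemann sum.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def model(x, w, b):
    """Toy differentiable classifier (assumption): positive-class probability."""
    return sigmoid(x @ w + b)

def model_grad(x, w, b):
    """Gradient of the model output with respect to the input features."""
    p = np.asarray(model(x, w, b))
    return (p * (1.0 - p))[..., None] * w

def integrated_gradients(x, baseline, w, b, steps=200):
    """Midpoint Riemann-sum approximation of integrated gradients for one sample."""
    alphas = (np.arange(steps) + 0.5) / steps            # midpoints in (0, 1)
    path = baseline + alphas[:, None] * (x - baseline)   # shape (steps, n_genes)
    avg_grad = model_grad(path, w, b).mean(axis=0)       # average gradient on the path
    return (x - baseline) * avg_grad

# Synthetic setup (assumption): only the first three "genes" influence the model.
rng = np.random.default_rng(0)
n_samples, n_genes = 50, 20
w = np.zeros(n_genes)
w[:3] = [2.0, -1.5, 1.0]
b = 0.0
X = rng.normal(size=(n_samples, n_genes))
baseline = np.zeros(n_genes)

# Per-sample attributions, aggregated into one global importance score per gene.
attributions = np.stack([integrated_gradients(x, baseline, w, b) for x in X])
importance = np.abs(attributions).mean(axis=0)
ranking = np.argsort(importance)[::-1]   # gene indices, most important first
print("Top-ranked gene indices:", ranking[:3])
```

Averaging absolute attributions over samples is only one possible way to turn per-sample scores into a global ranking. Other aggregation choices can reorder the genes, which is one reason rankings from different methods may disagree.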

Type: research article
DOI: 10.1093/bioadv/vbae199
Web of Science ID: WOS:001410029700001
PubMed ID: 39897946

Author(s)
Bontonou, Myriam (CHU Lyon)
Haget, Anais (École Polytechnique Fédérale de Lausanne)
Boulougouri, Maria (École Polytechnique Fédérale de Lausanne)
Audit, Benjamin (Ecole Normale Superieure de Lyon (ENS de LYON))
Borgnat, Pierre (Ecole Normale Superieure de Lyon (ENS de LYON))
Arbona, Jean-Michel (CHU Lyon)

Date Issued: 2025-01-29
Publisher: OXFORD UNIV PRESS
Published in: Bioinformatics Advances
Volume: 5
Issue: 1
Article Number: vbae199

Subjects: MOLECULAR CLASSIFICATION • FEATURE-SELECTION • CANCER • PREDICTION • Science & Technology • Life Sciences & Biomedicine

Editorial or Peer reviewed: REVIEWED
Written at: EPFL
EPFL units: LTS2
Funding
CHIST-ERA grant: CHIST-ERA-19-XAI-006
Agence Nationale de la Recherche (ANR): GRAPHNEX ANR-21-CHR4-0009

Available on Infoscience: February 10, 2025
Use this identifier to reference this record: https://infoscience.epfl.ch/handle/20.500.14299/246727