Infoscience

research article

Blacklisting variants common in private cohorts but not in public databases optimizes human exome analysis

Maffucci, Patrick • Bigio, Benedetta • Rapaport, Franck • et al.
January 15, 2019
Proceedings of the National Academy of Sciences of the United States of America (PNAS)

Computational analyses of human patient exomes aim to filter out as many nonpathogenic genetic variants (NPVs) as possible, without removing the true disease-causing mutations. This involves comparing the patient's exome with public databases to remove reported variants inconsistent with disease prevalence, mode of inheritance, or clinical penetrance. However, variants frequent in a given exome cohort, but absent or rare in public databases, have also been reported and treated as NPVs, without rigorous exploration. We report the generation of a blacklist of variants frequent within an in-house cohort of 3,104 exomes. This blacklist did not remove known pathogenic mutations from the exomes of 129 patients and decreased the number of NPVs remaining in the 3,104 individual exomes by a median of 62%. We validated this approach by testing three other independent cohorts of 400, 902, and 3,869 exomes. The blacklist generated from any given cohort removed a substantial proportion of NPVs (11-65%). We analyzed the blacklisted variants computationally and experimentally. Most of the blacklisted variants corresponded to false signals generated by incomplete reference genome assembly, location in low-complexity regions, bioinformatic misprocessing, or limitations inherent to cohort-specific private alleles (e.g., due to sequencing kits, and genetic ancestries). Finally, we provide our precalculated blacklists, together with ReFiNE, a program for generating customized blacklists from any medium-sized or large in-house cohort of exome (or other next-generation sequencing) data via a user-friendly public web server. This work demonstrates the power of extracting variant blacklists from private databases as a specific in-house but broadly applicable tool for optimizing exome analysis.
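The core idea in the abstract, flagging variants that are frequent within a private cohort yet absent or rare in public databases, can be illustrated with a minimal sketch. This is not the ReFiNE implementation: the function names, the variant key format, and both frequency thresholds are hypothetical choices made only for illustration, and the record above does not state the cutoffs the authors used.

```python
from collections import Counter


def build_blacklist(cohort_exomes, public_freq,
                    cohort_min_freq=0.01, public_max_freq=0.001):
    """Return variants frequent in the cohort but absent or rare publicly.

    cohort_exomes: iterable of per-exome collections of variant keys
                   (hypothetical key format: "chrom:pos:ref>alt").
    public_freq:   dict mapping variant key -> public allele frequency;
                   variants missing from the dict are treated as absent.
    Both thresholds are illustrative assumptions, not values from the paper.
    """
    exomes = [set(exome) for exome in cohort_exomes]
    n_exomes = len(exomes)
    if n_exomes == 0:
        return set()

    # Count how many exomes carry each variant (at most once per exome).
    carriers = Counter()
    for exome in exomes:
        carriers.update(exome)

    return {
        variant
        for variant, count in carriers.items()
        if count / n_exomes >= cohort_min_freq
        and public_freq.get(variant, 0.0) <= public_max_freq
    }


def apply_blacklist(exome, blacklist):
    """Drop blacklisted variants from a single exome's variant set."""
    return set(exome) - blacklist


# Toy data: four exomes and one variant that is common in public databases.
cohort = [
    {"chr1:100:A>G", "chr2:200:C>T"},
    {"chr1:100:A>G"},
    {"chr1:100:A>G", "chr3:300:G>A"},
    {"chr2:200:C>T"},
]
public = {"chr2:200:C>T": 0.25}

blacklist = build_blacklist(cohort, public,
                            cohort_min_freq=0.5, public_max_freq=0.001)
filtered = apply_blacklist(cohort[0], blacklist)
```

In this toy cohort only `chr1:100:A>G` is blacklisted: it is carried by three of four exomes and is missing from the public table. `chr2:200:C>T` is equally easy to find in the cohort but is common publicly, so a standard public-database filter would already handle it. A real pipeline would also need the safeguard the abstract describes, namely checking that the blacklist does not remove known pathogenic mutations.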

Type
research article
DOI
10.1073/pnas.1808403116
Web of Science ID
WOS:000455610300037
Author(s)
Maffucci, Patrick • Bigio, Benedetta • Rapaport, Franck • Cobat, Aurelie • Borghesi, Alessandro • Lopez, Marie • Patin, Etienne • Bolze, Alexandre • Shang, Lei • Bendavid, Matthieu • et al.
Date Issued
2019-01-15
Publisher
National Academy of Sciences
Published in
Proceedings of the National Academy of Sciences of the United States of America (PNAS)
Volume
116
Issue
3
Start page
950
End page
959
Subjects
Multidisciplinary Sciences • Science & Technology - Other Topics • exome • variant • blacklist • wes analysis • wes annotation • gene mutation database • genome • mononucleotide • guidelines • framework • disease • errors
Editorial or Peer reviewed
REVIEWED
Written at
EPFL
EPFL units
UPFELLAY  
Available on Infoscience
January 26, 2019
Use this identifier to reference this record
https://infoscience.epfl.ch/handle/20.500.14299/154147

Infoscience is a service managed and provided by the Library and IT Services of EPFL. © EPFL, all rights reserved