Fast and accurate phylogenetic reconstruction from high-resolution whole-genome data and a novel robustness estimator

Lin, Yu; Rajan, Vaibhav; Moret, Bernard

doi:10.1007/978-3-642-16181-0_12

conference paper

2010

Comparative Genomics. RECOMB-CG 2010

8th RECOMB Workshop on Comparative Genomics RECOMB-CG'10

The rapid accumulation of whole-genome data has renewed interest in the study of genomic rearrangements. Comparative genomics, evolutionary biology, and cancer research all require models and algorithms to elucidate the mechanisms, history, and consequences of these rearrangements. However, even simple models lead to NP-hard problems, particularly in the area of phylogenetic analysis. Current approaches are limited to small collections of genomes and low-resolution data (typically a few hundred syntenic blocks). Moreover, whereas phylogenetic analyses from sequence data are deemed incomplete unless bootstrapping scores (a measure of confidence) are given for each tree edge, no equivalent to bootstrapping exists for rearrangement-based phylogenetic analysis. We describe a fast and accurate algorithm for rearrangement analysis that scales up, in both time and accuracy, to modern high-resolution genomic data. We also describe a novel approach to estimating the robustness of results, an equivalent to the bootstrapping analysis used in sequence-based phylogenetic reconstruction. We present the results of extensive testing on both simulated and real data showing that our algorithm returns very accurate results while scaling linearly with the size of the genomes and cubically with their number. We also present extensive experimental results showing that our approach to robustness testing provides excellent estimates of confidence, which, moreover, can be tuned through a threshold to trade off false positives against false negatives. Together, these two novel approaches enable us to attack heretofore intractable problems, such as phylogenetic inference for high-resolution vertebrate genomes, as we demonstrate on a set of six vertebrate genomes with 8,380 syntenic blocks. A copy of the software is available on request.
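In sequence-based phylogenetics, the bootstrapping score that the abstract refers to gives each edge of the reconstructed tree the fraction of replicate trees (built from resampled data) that contain the same bipartition of the taxa. The sketch below illustrates only that generic per-edge support computation, with the replicate trees supplied directly; it is not the authors' robustness estimator, whose details are not given in this record. The tree representation (an unrooted tree as a list of internal-edge splits, each split written as the set of taxa on one side) and the names `normalize` and `edge_support` are hypothetical.

```python
from collections import Counter
from typing import FrozenSet, Iterable, List, Set

# Hypothetical representation: an unrooted tree is a list of its internal-edge
# splits, each split written as the set of taxa on one side of that edge.
Split = FrozenSet[str]


def normalize(side: Iterable[str], taxa: FrozenSet[str]) -> Split:
    """Pick a canonical side of a bipartition: the side without the reference taxon."""
    s = frozenset(side)
    ref = min(taxa)  # any fixed taxon works; the smallest name is used here
    return taxa - s if ref in s else s


def edge_support(
    reference: List[Set[str]],
    replicates: List[List[Set[str]]],
    taxa: Set[str],
) -> List[float]:
    """Fraction of replicate trees that contain each internal edge of `reference`."""
    if not replicates:
        raise ValueError("need at least one replicate tree")
    all_taxa = frozenset(taxa)
    counts: Counter = Counter()
    for tree in replicates:
        # Count each distinct split at most once per replicate tree.
        for split in {normalize(s, all_taxa) for s in tree}:
            counts[split] += 1
    return [counts[normalize(s, all_taxa)] / len(replicates) for s in reference]


if __name__ == "__main__":
    taxa = {"A", "B", "C", "D", "E"}
    reference = [{"A", "B"}, {"D", "E"}]  # the tree ((A,B),C,(D,E))
    replicates = [
        [{"A", "B"}, {"D", "E"}],
        [{"A", "B"}, {"C", "D"}],
        [{"A", "C"}, {"D", "E"}],
        [{"C", "D", "E"}, {"A", "B", "C"}],  # same tree as the first, other sides
    ]
    print(edge_support(reference, replicates, taxa))  # [0.75, 0.75]
```

Normalizing each split to the side that excludes a fixed taxon makes `{A, B}` and `{C, D, E}` count as the same edge, so the support does not depend on which side of a bipartition a replicate happens to list.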

Use this identifier to reference this record

https://infoscience.epfl.ch/handle/20.500.14299/54746

Type

conference paper

DOI

10.1007/978-3-642-16181-0_12

Web of Science ID

WOS:000289457100012

Authors

Lin, Yu; Rajan, Vaibhav; Moret, Bernard

Publication date

2010

Publisher

Springer

Published in

Comparative Genomics. RECOMB-CG 2010

ISBN of the book

978-3-642-16180-3

Series title/Series vol.

Lecture Notes in Computer Science; 6398

Start page

137

End page

148

Subjects

algorithms

combinatorial optimiz...

computational molecul...

genomic rearrangement...

phylogenetic analyses...

bootstrapping

True Evolutionary Dis...

Gene-Order Data

Placental Mammals

Bootstrap

Rearrangements

Trees

Peer reviewed

REVIEWED

EPFL units

LCBB

Event name

8th RECOMB Workshop on Comparative Genomics RECOMB-CG'10

Event date

2010

Available on Infoscience

October 2, 2010