New Genome Similarity Measures based on Conserved Gene Adjacencies

Doerr, Daniel; Kowada, Luis Antonio B.; Araujo, Eloi; Deshpande, Shachi; Dantas, Simone; Moret, Bernard M. E.; Stoye, Jens

doi:10.1089/cmb.2017.0065

research article

2017

Journal Of Computational Biology

Many important questions in molecular biology, evolution, and biomedicine can be addressed by comparative genomic approaches. One of the basic tasks when comparing genomes is the definition of measures of similarity (or dissimilarity) between two genomes, for example, to elucidate the phylogenetic relationships between species. The power of different genome comparison methods varies with the underlying formal model of a genome. The simplest models impose the strong restriction that each genome under study must contain the same genes, each in exactly one copy. More realistic models allow several copies of a gene in a genome. One speaks of gene families, and comparative genomic methods that allow this kind of input are called gene family-based. The most powerful, but also most complex, models avoid this preprocessing of the input data and instead integrate the family assignment within the comparative analysis. Such methods are called gene family-free. In this article, we study an intermediate approach between family-based and family-free genomic similarity measures. Introducing this simpler model, called gene connections, we focus on the combinatorial aspects of gene family-free genome comparison. While in most cases the computational costs are the same as in the general family-free case, we also find an instance where the gene connections model has lower complexity. Within the gene connections model, we define three variants of genomic similarity measures that differ in expressive power. We give polynomial-time algorithms for two of them, while we show NP-hardness for the third, most powerful one. We also generalize the measures and algorithms to make them more robust against recent local disruptions in gene order. Our theoretical findings are supported by experimental results that demonstrate the applicability and performance of our newly defined similarity measures.
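As a rough illustration of what "conserved gene adjacencies" means in the simplest model mentioned in the abstract (both genomes contain the same genes, each in exactly one copy), the following minimal sketch counts adjacencies shared by two linear, signed gene orders. It is not the gene connections model or any of the three measures studied in the article. The function names and the encoding of a genome as a list of strings with an optional leading "-" for reverse orientation are assumptions made for this example only.

```python
from typing import FrozenSet, List, Set, Tuple

# Hypothetical encoding: an extremity is (gene name, "t" for tail or "h" for head).
Extremity = Tuple[str, str]


def extremities(gene: str) -> Tuple[Extremity, Extremity]:
    """Return the (left, right) extremities of a signed gene such as "a" or "-a"."""
    if gene.startswith("-"):
        name = gene[1:]
        return (name, "h"), (name, "t")  # reversed gene: head comes first
    return (gene, "t"), (gene, "h")


def adjacencies(genome: List[str]) -> Set[FrozenSet[Extremity]]:
    """Collect the unordered extremity pairs between consecutive genes."""
    result = set()
    for left_gene, right_gene in zip(genome, genome[1:]):
        _, right_end = extremities(left_gene)
        left_end, _ = extremities(right_gene)
        result.add(frozenset((right_end, left_end)))
    return result


def conserved_adjacencies(genome_a: List[str], genome_b: List[str]) -> int:
    """Count adjacencies present in both genomes (one-copy, equal-content model)."""
    names_a = [g.lstrip("-") for g in genome_a]
    names_b = [g.lstrip("-") for g in genome_b]
    if len(set(names_a)) != len(names_a) or sorted(names_a) != sorted(names_b):
        raise ValueError("genomes must contain the same genes, each exactly once")
    return len(adjacencies(genome_a) & adjacencies(genome_b))


if __name__ == "__main__":
    g = ["a", "b", "c", "d"]
    h = ["a", "b", "-d", "-c"]  # the block c d is inverted
    print(conserved_adjacencies(g, h))
```

In this example the inversion of the block c d breaks the adjacency between b and c but preserves the one between c and d, because adjacencies are compared as unordered pairs of gene extremities rather than as ordered gene pairs.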

Type

research article

DOI

10.1089/cmb.2017.0065

Web of Science ID

WOS:000402997500014

Author(s)

Doerr, Daniel

Kowada, Luis Antonio B.

Araujo, Eloi

Deshpande, Shachi

Dantas, Simone

Moret, Bernard M. E.

Stoye, Jens

Date Issued

2017

Publisher

Mary Ann Liebert

Published in

Journal Of Computational Biology

Volume

24

Issue

6

Start page

616

End page

634

Subjects

family-free genome comparison

gene connections

genome rearrangements

genome similarity measure

conserved adjacencies

Editorial or Peer reviewed

REVIEWED

Written at

EPFL

EPFL units

LCBB

Available on Infoscience

July 10, 2017

Use this identifier to reference this record

https://infoscience.epfl.ch/handle/20.500.14299/139035