Constructing a tree from homeomorphic subtrees, with applications to computational evolutionary biology
We are given a set script T = (T1, T2, . . . , Tk) of rooted binary trees, each Ti leaf-labeled by a subset L[Ti] C (1,2, . . . , n). If T is a tree on (1, 2, . . . , n), we let T|L denote the minimal subtree of T induced by the nodes of L and all their ancestors. The consensus tree problem asks whether there exists a tree T* such that, for every i, T*|L[Ti] is homeomorphic to Ti. We present algorithms which test if a given sel of trees has a consensus tree and if so. construct one. The deterministic algorithm takes time min(O (N.sqrt(n)), O (N + pow2(n).log(n)), where N = sum[i]( |Ti| ), and uses linear space. The randomized algorithm takes time O (N log3 n) and uses linear space. The previous best for this problem was a 1981 O (N n) algorithm by Aho et al. Our faster deterministic algorithm uses a new efficient algorithm for the following interesting dynamic graph problem: Given a graph G with n nodes and m edges and a sequence of b batches of one or more edge deletions, then, after each batch, either find a new component that has just been created or determine that there is no such component. For this problem, we have a simple algorithm with running time O (n2 log n + b0 min(n2, m log n)), where b0 is the number of batches which do not result in a new component. For our particular application b0 <= 1. If all edges are deleted, then the best previously known deterministic algorithm requires time O (m.sqrt(n)) to solve this problem. We also present two applications of these consensus tree algorithms which solve other problems in computational evolutionary biology.