On Optimal Two Sample Homogeneity Tests for Finite Alphabets

Unnikrishnan, Jayakrishnan

doi:10.1109/ISIT.2012.6283716

2012

Abstract

Suppose we are given two independent strings of data from a known finite alphabet. We are interested in testing the null hypothesis that both strings were drawn from the same distribution, assuming that the samples within each string are mutually independent. Among statisticians, the most popular solution for such a homogeneity test is the two-sample chi-square test, primarily because it is easy to implement and because the limiting distribution of its test statistic under the null hypothesis is known and easy to compute. Although tests that are asymptotically optimal in error probability have been proposed in the information theory literature, these optimality results are not well known and the tests are rarely used in practice. In this paper we seek to bridge the gap between theory and practice. We study two different optimal tests, proposed by Shayevitz [1] and Gutman [2]. We first obtain a simplified structure for Shayevitz’s test and then derive the limiting distributions of the test statistics used in both tests. These results provide guidelines for choosing thresholds that approximately satisfy a false alarm constraint for finite-length observation sequences, which makes the tests easy to use in practice. The accuracy of the approximations is demonstrated using simulations. We argue that such homogeneity tests with provable optimality properties could be better choices than the chi-square test in practice.
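
For orientation, the baseline two-sample chi-square test mentioned in the abstract can be sketched as follows. This is a minimal illustration of the standard textbook statistic, not the tests of Shayevitz [1] or Gutman [2] and not code from the paper. The function names `two_sample_chi_square` and `reject_homogeneity` are hypothetical, and the threshold is assumed to be supplied by the user, for example as a quantile of the chi-square distribution with the returned degrees of freedom.

```python
from collections import Counter


def two_sample_chi_square(x, y, alphabet):
    """Return (statistic, degrees_of_freedom) for two strings over `alphabet`.

    Hypothetical helper: implements the standard two-sample chi-square
    statistic with the pooled empirical distribution as the expected law.
    """
    n1, n2 = len(x), len(y)
    if n1 == 0 or n2 == 0:
        raise ValueError("both strings must be non-empty")
    cx, cy = Counter(x), Counter(y)
    stat = 0.0
    used = 0  # number of symbols that actually occur in the pooled data
    for s in alphabet:
        pooled = cx[s] + cy[s]
        if pooled == 0:
            continue  # a symbol never observed contributes nothing
        used += 1
        p = pooled / (n1 + n2)  # pooled estimate of the symbol probability
        stat += (cx[s] - n1 * p) ** 2 / (n1 * p)
        stat += (cy[s] - n2 * p) ** 2 / (n2 * p)
    return stat, max(used - 1, 0)


def reject_homogeneity(x, y, alphabet, threshold):
    """Reject the null hypothesis when the statistic exceeds `threshold`."""
    stat, _ = two_sample_chi_square(x, y, alphabet)
    return stat > threshold


if __name__ == "__main__":
    stat, dof = two_sample_chi_square("aaaa", "bbbb", "ab")
    print(stat, dof)
    # 3.841 is approximately the 0.95 quantile of chi-square with 1 degree
    # of freedom, i.e. an approximate 5% false alarm level.
    print(reject_homogeneity("aaaa", "bbbb", "ab", threshold=3.841))
```

With one degree of freedom, a threshold of about 3.841 corresponds to an approximate 5% false alarm probability. The approximation relies on the limiting distribution, so it may be loose for short strings.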

Details

Title On Optimal Two Sample Homogeneity Tests for Finite Alphabets

Author(s) Unnikrishnan, Jayakrishnan

Published in Information Theory Proceedings (ISIT), 2012 IEEE International Symposium on

Pagination 5

Series IEEE International Symposium on Information Theory

Pages 2027-2031

Conference 2012 IEEE International Symposium on Information Theory (ISIT), Boston, Massachusetts, USA, July 1-6 2012

Date 2012

Publisher New York, IEEE

ISBN 978-1-4673-2579-0

Keywords

Hypothesis testing; homogeneity tests; p-value

DOI https://doi.org/10.1109/ISIT.2012.6283716

Laboratories LCAV

Record appears in Scientific production and competences > I&C - School of Computer and Communication Sciences > IINFCOM > LCAV - Audio Visual Communications Laboratory
Scientific production and competences > Euler Center for Signal Processing
Peer-reviewed publications
Conference Papers
Work produced at EPFL
Published

Record creation date 2012-10-17