Multiple testing with test statistics following heavy-tailed distributions

Jiang, Zhiwen

doi:10.5075/epfl-thesis-8102

2021

Abstract

In multiple testing problems where the components come from a mixture of noise and true effects, we seek first to test for the existence of non-zero components and then to identify the true alternatives at a fixed significance level $\alpha$. Under the global alternative, the two-point mixture model is characterised by two parameters: the fraction of non-null components $\varepsilon$ and the effect size $\mu$. As the number of hypotheses $m$ tends to infinity, we work in an asymptotic framework in which the fraction of non-null components vanishes, so the true effects must be sizable to be detected. Donoho and Jin give an explicit form of the asymptotic detection boundary for the Gaussian mixture model under the classic calibration of the mixture parameters. We prove analogous results for the Cauchy mixture distribution as an example of the heavy-tailed case. This requires a different formulation of the parameters, which reflects the added difficulties.

We also propose a multiple testing procedure, based on a filtering approach, that can discover the true alternatives. Benjamini and Hochberg (BH) compare the ordered $p$-values with a linear threshold curve, reject the null hypotheses from the smallest $p$-value up to the last up-crossing, and prove that the false discovery rate (FDR) is controlled. Heavy-tailed settings, however, differ intrinsically: applying the BH procedure there yields a highly variable positive false discovery rate (pFDR). We analyse the distribution of the $p$-values and, based on their empirical properties, devise a new multiple testing procedure that covers both the usual case and the heavy-tailed case. The filter is designed to eliminate most $p$-values that are more likely to come from the uniform null distribution while preserving most of the true alternatives.
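For reference, the classic calibration of the Gaussian two-point mixture and the resulting detection boundary, as usually stated in the sparse-mixture literature, are written out below in the abstract's notation (the subscript $m$ is added to stress the dependence on the number of hypotheses). This is standard background only; the thesis's own formulation for the Cauchy case differs and is not reproduced here.

$$
\begin{aligned}
&X_i \overset{\text{iid}}{\sim} (1-\varepsilon_m)\,\mathcal{N}(0,1) + \varepsilon_m\,\mathcal{N}(\mu_m,1), \quad i = 1,\dots,m,\\
&\varepsilon_m = m^{-\beta},\ \tfrac12 < \beta < 1, \qquad \mu_m = \sqrt{2r\log m},\ 0 < r < 1,\\
&\rho^*(\beta) =
\begin{cases}
\beta - \tfrac12, & \tfrac12 < \beta \le \tfrac34,\\[2pt]
\bigl(1-\sqrt{1-\beta}\bigr)^2, & \tfrac34 < \beta < 1.
\end{cases}
\end{aligned}
$$

If $r > \rho^*(\beta)$, the global null and the sparse alternative can be separated with vanishing error probabilities as $m \to \infty$; if $r < \rho^*(\beta)$, the sum of the Type I and Type II error probabilities of any test tends to one. The two branches agree at $\beta = 3/4$, where $\rho^* = 1/4$.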
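To experiment with the two-point mixture, one can draw $m$ observations that are pure noise with probability $1-\varepsilon$ and shifted by $\mu$ with probability $\varepsilon$, then convert them to $p$-values. The sketch below is hypothetical: the one-sided upper-tail test, the standard (unit-scale) Cauchy noise and the function name `two_point_mixture_pvalues` are assumptions chosen for illustration, not the setup used in the thesis.

```python
import math

import numpy as np


def two_point_mixture_pvalues(m, eps, mu, noise="gaussian", rng=None):
    """Hypothetical simulator: X_i = Z_i + mu * 1{non-null}, one-sided p-values.

    Each component is non-null independently with probability eps.
    noise is "gaussian" (standard normal Z) or "cauchy" (standard Cauchy Z).
    Returns (x, p, is_alt).
    """
    rng = np.random.default_rng(rng)
    is_alt = rng.random(m) < eps          # True marks a non-null component
    if noise == "gaussian":
        z = rng.standard_normal(m)
    elif noise == "cauchy":
        z = rng.standard_cauchy(m)
    else:
        raise ValueError("noise must be 'gaussian' or 'cauchy'")
    x = z + mu * is_alt                    # shift only the non-null components
    if noise == "gaussian":
        # Upper tail of N(0, 1): P(Z >= x) = erfc(x / sqrt(2)) / 2
        p = np.array([0.5 * math.erfc(v / math.sqrt(2.0)) for v in x])
    else:
        # Upper tail of the standard Cauchy: P(Z >= x) = 1/2 - arctan(x) / pi
        p = 0.5 - np.arctan(x) / np.pi
    return x, p, is_alt
```

With Cauchy noise, occasional very large null observations produce tiny null $p$-values, and occasional very negative noise can mask a true effect; comparing the two `noise` settings at the same `eps` and `mu` gives an informal feel for why the heavy-tailed case is harder.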
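The BH step-up rule summarised in the abstract can be sketched in a few lines of NumPy. This is a minimal illustration of the standard procedure, not code from the thesis, and the function name `benjamini_hochberg` is hypothetical. It finds the largest index $k$ with $p_{(k)} \le k\alpha/m$ and rejects the hypotheses with the $k$ smallest $p$-values.

```python
import numpy as np


def benjamini_hochberg(pvalues, alpha):
    """Boolean mask of hypotheses rejected by the BH step-up rule at level alpha."""
    p = np.asarray(pvalues, dtype=float)
    m = p.size
    order = np.argsort(p)                  # indices that sort the p-values
    sorted_p = p[order]
    # Linear threshold curve k * alpha / m for k = 1, ..., m.
    thresholds = alpha * np.arange(1, m + 1) / m
    below = np.nonzero(sorted_p <= thresholds)[0]
    reject = np.zeros(m, dtype=bool)
    if below.size > 0:
        k = below[-1]                      # last ordered p-value on or under the line
        reject[order[: k + 1]] = True      # reject everything up to and including it
    return reject
```

For example, `benjamini_hochberg([0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205], 0.05)` rejects only the two smallest $p$-values. Because the rule steps up from the last index under the line, an ordered $p$-value that sits above the curve is still rejected when a later one lies below it.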
Based on the filtered $p$-values, we estimate the mode $\vartheta$ of their distribution and define the rejection region $\mathscr{R}(\vartheta, \delta)=\left[ \vartheta -\delta/2, \vartheta +\delta/2 \right]$ so that it contains the most informative $p$-values. The length $\delta$ is chosen so that a data-dependent estimate of the FDR is controlled at the desired level.
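A minimal sketch of this last step, under loudly stated assumptions. The filtering rule is not specified here, so the function simply receives the filtered $p$-values. The mode is estimated by the centre of the fullest histogram bin, a stand-in for whatever mode estimator the thesis uses. The FDR of $\mathscr{R}(\vartheta,\delta)$ is estimated by the plug-in ratio $m\,|\mathscr{R}\cap[0,1]| \,/\, \#\{p_i \in \mathscr{R}\}$, which treats all $m$ $p$-values as potentially uniform nulls, and the largest $\delta$ on a grid whose estimate stays at or below a target $q$ is kept. The names `histogram_mode` and `mode_interval_rejections` are hypothetical.

```python
import numpy as np


def histogram_mode(values, bins=50):
    """Stand-in mode estimator: centre of the fullest histogram bin on [0, 1]."""
    counts, edges = np.histogram(values, bins=bins, range=(0.0, 1.0))
    k = int(np.argmax(counts))             # first bin with the maximal count
    return 0.5 * (edges[k] + edges[k + 1])


def mode_interval_rejections(pvalues, filtered, q, deltas=None, bins=50):
    """Hypothetical sketch: reject p-values in [theta - delta/2, theta + delta/2].

    pvalues  -- all m p-values
    filtered -- p-values kept by some filtering step (assumed given)
    q        -- target level for the plug-in FDR estimate
    Returns (theta, delta, reject_mask); delta is 0.0 if no candidate qualifies.
    """
    p = np.asarray(pvalues, dtype=float)
    m = p.size
    theta = histogram_mode(filtered, bins=bins)
    if deltas is None:
        deltas = np.linspace(1.0 / m, 1.0, 200)  # increasing candidate lengths
    best_delta, best_mask = 0.0, np.zeros(m, dtype=bool)
    for delta in deltas:
        lo, hi = theta - delta / 2.0, theta + delta / 2.0
        inside = (p >= lo) & (p <= hi)
        n_rej = int(inside.sum())
        # Length of R intersected with [0, 1]: the chance that one uniform
        # null p-value lands in R.
        length = min(hi, 1.0) - max(lo, 0.0)
        # Conservative plug-in estimate: assumes all m p-values could be null.
        fdr_hat = m * length / max(n_rej, 1)
        if n_rej > 0 and fdr_hat <= q:
            # Keep the last admissible candidate (the largest if deltas increase).
            best_delta, best_mask = float(delta), inside
    return theta, best_delta, best_mask
```

Setting the null proportion to one keeps the estimate conservative; plugging in an estimate of the null fraction instead would typically allow a wider interval.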

Details

Title Multiple testing with test statistics following heavy-tailed distributions

Author(s) Jiang, Zhiwen

Advisor(s) Morgenthaler, Stephan

Pagination 124

Date 2021

Publisher Lausanne, EPFL

Keywords False discovery rate (FDR); filtering; heavy-tailed distribution; local FDR; mode estimation; multiple testing; operating characteristics; positive FDR.

Language English

DOI https://doi.org/10.5075/epfl-thesis-8102

Laboratories STAP (Chair of Applied Statistics), School of Basic Sciences (SB), EPFL

Record creation date 2021-03-05