Safe density ratio modeling
An important problem in logistic regression modeling is the existence of the maximum likelihood estimators. In particular, the maximum likelihood estimator of the regression parameters does not exist if the data are completely or quasi-completely separated, a situation that arises most readily when the sample size is small. This phenomenon has a serious impact on the fitting of the density ratio model, a semiparametric model whose profile empirical log-likelihood has the logistic form because of the equivalence between prospective and retrospective sampling. We therefore suggest a linear programming methodology for examining whether the maximum likelihood estimators of the finite-dimensional parameter vector of the model exist. We show that the methodology can be used effectively in the analysis of case-control gene expression data by identifying cases where the density ratio model cannot be applied, and we demonstrate that naive application of the density ratio model yields erroneous conclusions. © 2009 Elsevier B.V. All rights reserved.
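As an illustration of how a linear program can detect separation, the following is a minimal sketch and not necessarily the formulation used in the paper. It relies on the standard characterization that, for a full-rank design matrix, the logistic maximum likelihood estimator exists exactly when no nonzero direction b satisfies s_i x_i'b >= 0 for every observation, where s_i = +1 or -1 encodes the label. The function name `mle_exists`, the intercept-first column layout, and the tolerance are assumptions made for this example; the solver call is SciPy's `scipy.optimize.linprog`. For the density ratio model, the label would play the role of sample membership (for example, case versus control).

```python
import numpy as np
from scipy.optimize import linprog


def mle_exists(X, y, tol=1e-9):
    """Return True if the data overlap (no complete or quasi-complete separation).

    X : (n, p) design matrix, assumed full rank, intercept column included.
    y : (n,) array of 0/1 labels (e.g. control/case sample membership).
    """
    X = np.asarray(X, dtype=float)
    s = 2.0 * np.asarray(y, dtype=float) - 1.0  # recode labels to -1/+1
    Z = s[:, None] * X                          # row i is s_i * x_i

    # Solve: maximize sum_i s_i x_i' b
    #        subject to s_i x_i' b >= 0 for all i, and -1 <= b_j <= 1.
    # linprog minimizes, so the objective is negated and the constraints
    # are written as -Z b <= 0. The box keeps the problem bounded.
    res = linprog(
        c=-Z.sum(axis=0),
        A_ub=-Z,
        b_ub=np.zeros(Z.shape[0]),
        bounds=[(-1.0, 1.0)] * Z.shape[1],
        method="highs",
    )
    if res.status != 0:
        raise RuntimeError(res.message)

    # A strictly positive optimum means some direction b separates the data,
    # so the MLE does not exist. An optimum of zero means the data overlap.
    return bool(-res.fun <= tol)
```

With a single covariate and an intercept, labels `[0, 0, 1, 1]` at x = 1, 2, 3, 4 are completely separated and the function returns `False`, whereas the interleaved labels `[0, 1, 0, 1]` overlap and it returns `True`. Run before fitting, a check of this kind can flag data sets to which the model should not be applied.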