Iryna Lobach
Assistant Professor
New York University School of Medicine
Haplotype-Based Regression Analysis and Inference of Case–Control Studies with Unphased Genotypes and Measurement Errors in Environmental Exposures
ABSTRACT
It is widely believed that risks of many complex diseases are determined by
genetic susceptibilities, environmental exposures, and their interaction. Chatterjee and Carroll
(2005, Biometrika 92, 399–418) developed an efficient
retrospective maximum-likelihood method for analysis of case–control studies
that exploits an assumption of gene–environment independence and leaves the
distribution of the environmental covariates to be completely nonparametric. Spinka, Carroll, and Chatterjee
(2005, Genetic Epidemiology 29, 108–127) extended this approach to
studies where certain types of genetic information, such as haplotype
phases, may be missing for some subjects. We further extend this approach to
situations in which some of the environmental exposures are measured with error.
Using a polychotomous logistic regression model, we
allow disease status to have K + 1 levels. We propose the use of a pseudolikelihood and a related
EM algorithm for parameter estimation. We prove consistency and derive the
resulting asymptotic covariance matrix of parameter estimates when the variance
of the measurement error is known and when it is estimated using replications.
Inferences with measurement error corrections are complicated by the fact that
the Wald test often behaves poorly in the presence of large amounts of
measurement error. Likelihood-ratio (LR) techniques are known to be a good
alternative. However, the LR tests are not technically correct in this setting
because the likelihood function is based on an incorrect model, i.e., a
prospective model in a retrospective sampling scheme. We correct the standard
asymptotic results to account for the fact that the LR test is based on a
likelihood-type function. The performance of the proposed method is illustrated
using simulation studies emphasizing the case when genetic information is in
the form of haplotypes and missing data arises from haplotype-phase ambiguity. An application of our method is
illustrated using a population-based case–control study of the association
between calcium intake and the risk of colorectal adenoma.
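
Two ingredients named in the abstract can be illustrated with a minimal sketch, which is not the authors' implementation. It assumes a baseline-category form of the polychotomous logistic model (level 0 as the reference) and a classical additive error model in which each replicate equals the true exposure plus independent noise. The function names, parameter shapes, and simulated values are hypothetical and chosen only for illustration.

```python
import numpy as np


def polychotomous_probs(alpha, beta, x):
    """Return P(D = k | x) for k = 0..K, with level 0 as the reference.

    alpha: intercepts for levels 1..K, shape (K,)    (hypothetical layout)
    beta:  slopes for levels 1..K, shape (K, p)
    x:     covariate vector, shape (p,)
    """
    alpha = np.asarray(alpha, dtype=float)
    beta = np.asarray(beta, dtype=float)
    x = np.asarray(x, dtype=float)
    # The reference level has linear predictor 0 by convention.
    eta = np.concatenate(([0.0], alpha + beta @ x))
    eta -= eta.max()  # subtract the maximum for numerical stability
    weights = np.exp(eta)
    return weights / weights.sum()


def replicate_error_variance(w):
    """Pooled within-subject variance from an (n, m) array of replicates.

    Assumes W_ij = X_i + U_ij with independent, mean-zero U_ij, so the
    spread of replicates around each subject's mean reflects only the error.
    """
    w = np.asarray(w, dtype=float)
    n, m = w.shape
    resid = w - w.mean(axis=1, keepdims=True)
    return (resid ** 2).sum() / (n * (m - 1))


if __name__ == "__main__":
    # Disease status with K + 1 = 3 levels and two covariates.
    probs = polychotomous_probs(alpha=[-1.0, -2.0],
                                beta=[[0.5, 0.2], [0.8, -0.1]],
                                x=[1.0, 0.0])
    print("level probabilities:", probs)

    # Simulated exposure measured twice per subject with error variance 0.25.
    rng = np.random.default_rng(0)
    true_x = rng.normal(size=500)
    reps = true_x[:, None] + rng.normal(scale=0.5, size=(500, 2))
    print("estimated error variance:", replicate_error_variance(reps))
```

The sketch stops at these two building blocks. The pseudolikelihood, the EM algorithm for unphased genotypes, and the corrected LR test described above are not reproduced here.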