Developing Predictive Models Using Summary-Level Information from Big Data Sources
Extracting information from big datasets through summary-level statistics, rather than individual-level data, is appealing for practical reasons, such as ease of data sharing, storage, and computing, and for ethical reasons, such as maintaining the privacy of study subjects and protecting the future research interests of data-generating institutions and investigators. In this talk, I will describe statistical methods for building predictive models using summary-level information in two settings. The first involves developing high-dimensional penalized additive regression models using summary-level association statistics from genome-wide association studies together with functional/annotation information from other genomic databases. The second involves building general regression models using individual-level data from an analytic study while incorporating information on the parameters of a reduced model fitted to an external big dataset. The methods will be illustrated with cutting-edge applications to disease risk prediction in precision medicine.