2016 TAMU-RUC Joint Young Researchers Workshop in Statistics

January 14, 2016, Department of Statistics, Texas A&M University

Location:  Blocker 448

Organizing Committee:  Chunrong Ai (chair), Feifang Hu, Jianhua Huang

No registration is needed.




Efficient computation of smoothing splines for large data sets

Nan Zhang

Texas A&M University

In this talk, I will first review my research projects and future plans, which cover three topics: computation for big data; regularized functional regression models for neuroimaging data analysis; and conditional density estimation with applications in the wind energy industry.

The second part of the talk covers efficient computation of smoothing splines via adaptive basis sampling. Smoothing splines provide flexible nonparametric regression estimators, but their high computational cost has hindered wide application to large data sets. We propose an adaptive basis sampling method for efficient computation of smoothing splines in large samples. The adaptive sampling scheme uses the values of the response variable to select a smaller set of basis functions, with which the smoothing spline estimator is approximated. Our asymptotic analysis shows that smoothing splines computed via adaptive basis sampling converge to the true function at the same rate as full-basis smoothing splines. Using simulation studies and a large-scale deep earth core-mantle boundary imaging study, we show that the proposed method outperforms a sampling method that does not use the values of the response variable.
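
The idea of response-driven basis selection can be illustrated with a minimal Python sketch. Everything specific here is an assumption made for illustration and is not taken from the abstract: the response range is cut into equal-width slices, an equal number of observations is drawn from each slice, and a Gaussian kernel with a ridge-type penalty stands in for the spline reproducing kernel. The names `adaptive_basis_indices`, `fit_reduced_basis`, and `predict` are hypothetical.

```python
import numpy as np


def adaptive_basis_indices(y, n_basis, n_slices, rng):
    """Pick basis indices by slicing the response range (assumed scheme)."""
    edges = np.linspace(y.min(), y.max(), n_slices + 1)
    # Slice label in 0..n_slices-1 for every observation.
    labels = np.clip(np.searchsorted(edges, y, side="right") - 1, 0, n_slices - 1)
    per_slice = max(1, n_basis // n_slices)
    chosen = []
    for s in range(n_slices):
        members = np.flatnonzero(labels == s)
        if members.size == 0:
            continue  # empty slice: nothing to sample
        take = min(per_slice, members.size)
        chosen.append(rng.choice(members, size=take, replace=False))
    return np.sort(np.concatenate(chosen))


def gaussian_kernel(a, b, bandwidth):
    """Gaussian kernel matrix; a stand-in for a spline reproducing kernel."""
    d = a[:, None] - b[None, :]
    return np.exp(-0.5 * (d / bandwidth) ** 2)


def fit_reduced_basis(x, y, idx, lam, bandwidth):
    """Penalized least squares using only the sampled basis functions."""
    centers = x[idx]
    K_nq = gaussian_kernel(x, centers, bandwidth)        # n x q design
    K_qq = gaussian_kernel(centers, centers, bandwidth)  # q x q penalty
    A = K_nq.T @ K_nq + len(x) * lam * K_qq
    # lstsq tolerates the near-singular systems that kernels can produce.
    coef = np.linalg.lstsq(A, K_nq.T @ y, rcond=None)[0]
    return centers, coef


def predict(x_new, centers, coef, bandwidth):
    """Evaluate the fitted reduced-basis estimator at new points."""
    return gaussian_kernel(x_new, centers, bandwidth) @ coef


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.uniform(0.0, 1.0, 2000)
    y = np.sin(2 * np.pi * x) + rng.normal(0.0, 0.1, 2000)
    idx = adaptive_basis_indices(y, n_basis=40, n_slices=10, rng=rng)
    centers, coef = fit_reduced_basis(x, y, idx, lam=1e-6, bandwidth=0.1)
    fitted = predict(x, centers, coef, bandwidth=0.1)
    print(len(idx), float(np.sqrt(np.mean((fitted - y) ** 2))))
```

The linear system solved here has size q x q, where q is the number of sampled basis functions, rather than n x n, which is where the computational saving comes from.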


Estimation of a Binary Choice Game Model with Network Links

Xingbai Xu

Ohio State University

This paper studies simulated moment estimation of a binary choice game model with network links, where the network peer effects are non-negative and the sample may contain only one or a few networks. The proposed estimation method can be applied to studies with binary dependent variables in empirical industrial organization (IO), social networks, and spatial econometrics. The model may have multiple Nash equilibria. We assume that the maximum Nash equilibrium, which always exists and is strongly coalition-proof and Pareto optimal, is selected. The challenging econometric issues are the possible correlation among all dependent variables and the discontinuous functional form of our simulated moments. We overcome these challenges via empirical process theory and derive the spatial near-epoch dependence (NED) of the dependent variable. We establish a criterion for an NED random field to be stochastically equicontinuous and apply it to establish the consistency and asymptotic normality of the estimator. We examine computational issues and finite-sample properties of the simulated moment method through Monte Carlo experiments.
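
To make the equilibrium selection rule concrete, here is a small Python sketch of how a maximum equilibrium could be computed. The payoff form is an assumption for illustration, not the paper's specification: agent i chooses 1 when `a[i] + lam * sum_j W[i, j] * y[j] > 0`, with `lam >= 0` and `W` entrywise non-negative. Under that assumption the best-response map is monotone, so iterating it from the all-ones profile decreases to the largest fixed point. The function name `max_equilibrium` is hypothetical.

```python
import numpy as np


def max_equilibrium(a, W, lam):
    """Largest fixed point of y = 1[a + lam * W @ y > 0] (assumed payoff form).

    Requires lam >= 0 and W >= 0 entrywise, so that best responses are
    non-decreasing in the actions of others.
    """
    y = np.ones(len(a), dtype=int)  # start from the largest action profile
    # Each non-final pass switches at least one agent from 1 to 0,
    # so len(a) + 1 passes are always enough.
    for _ in range(len(a) + 1):
        y_new = (a + lam * (W @ y) > 0).astype(int)
        if np.array_equal(y_new, y):
            break
        y = y_new
    return y


if __name__ == "__main__":
    W = np.array([[0.0, 1.0], [1.0, 0.0]])
    # Both (0, 0) and (1, 1) are equilibria here; the iteration returns (1, 1).
    print(max_equilibrium(np.array([-0.5, -0.5]), W, lam=1.0))
```

In a simulated moment setting, a routine of this kind would be called once per simulated draw of the unobservables; how that is organized in the paper is not stated in the abstract.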


Set-based Tests for Genetic Association and Gene-Environment Interaction

Zihuai He

University of Michigan

Characterizing genetic association and gene-environment interaction related to complex diseases has received considerable attention in the last decade. To reduce the burden of multiple comparisons and improve power, many genetic association studies now use an alternative or supplementary analytic approach: jointly testing the effect of all genetic variants in a biologically defined set, such as a gene, pathway, or specific genome region, as opposed to a one-at-a-time single-variant analysis. In this talk, I will first introduce a collection of set-based tests for genetic association in both cross-sectional and longitudinal studies. Then I will introduce a statistical framework for set-based inference for gene-environment interaction with longitudinally measured quantitative traits and propose a generalized score-type test. The test is robust to misspecification of the within-subject correlation and has enhanced power compared to existing alternatives, particularly in the presence of a temporal trend and a modest number of repeated measures. Unlike tests for marginal genetic association, set-based tests for gene-environment interaction face the challenges of a potentially misspecified and high-dimensional main effect model. We show that the proposed test is robust to main effect misspecification of an environmental exposure and genetic variables under the gene-environment independence condition. When genetic and environmental factors are related, the method of sieves is further proposed to eliminate potential bias due to a misspecified main effect of a continuous environmental exposure. A weighted principal component analysis approach is developed to perform dimension reduction when the number of genetic variants in the set is large relative to the sample size. These general issues are also relevant to cross-sectional analysis.


Semiparametric linear transformation model with measurement error and validation sampling

Xuan Wang

University of Washington

For the semiparametric linear transformation model with covariate measurement error and validation sampling, we propose a method for estimating the covariate coefficient. The method updates the validation-set-based estimator to obtain a more efficient estimator using the information available on the whole cohort. It can handle both differential and nondifferential measurement error. Consistency and asymptotic normality are established for the proposed estimator, and a closed-form formula is derived for the limiting variance-covariance matrix. Simulation studies and a real data analysis illustrate the performance of the proposed method.

For additional information, please contact Deanna Stevens (deanna@stat.tamu.edu).