Statistica Sinica

Runze Li, Jingyuan Liu, Lejia Lou
A partial-correlation-based variable selection method was proposed for normal linear regression models by Bühlmann, Kalisch and Maathuis (2010) as a comparable alternative to regularization methods for variable selection. This paper addresses two important issues related to this method: (a) whether it is sensitive to the normality assumption, and (b) whether it is valid when the dimension of the predictors increases at an exponential rate of the sample size...
July 2017: Statistica Sinica
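As a rough illustration of the quantity behind partial-correlation-based selection, the first-order partial correlation between two variables given a third can be computed from their pairwise correlations. This is a minimal sketch of the textbook formula only, not the selection procedure studied in the paper; the function names `pearson` and `partial_corr` are hypothetical.

```python
import math

def pearson(x, y):
    """Sample Pearson correlation between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def partial_corr(r_xy, r_xz, r_yz):
    """First-order partial correlation of x and y given z,
    computed from the three pairwise correlations."""
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))
```

For example, if x and y are correlated only through z (r_xy = r_xz * r_yz), the partial correlation is zero, which is the kind of signal a partial-correlation-based method would use to drop a predictor.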
Tamar Sofer, Marilyn C Cornelis, Peter Kraft, Eric J Tchetgen Tchetgen
Case-control studies are designed to study associations between risk factors and a single, primary outcome. Information about additional, secondary outcomes is also collected, but association studies targeting such secondary outcomes should account for the case-control sampling scheme; otherwise, results may be biased. Often, one uses inverse probability weighted (IPW) estimators to estimate population effects in such studies. IPW estimators are robust, as they only require correct specification of the mean regression model of the secondary outcome on covariates, and knowledge of the disease prevalence...
April 2017: Statistica Sinica
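The role of the disease prevalence in IPW can be sketched with the simplest possible target, the population mean of a secondary outcome. The code below is an illustration under stated assumptions, not the estimator from the paper: it assumes the prevalence is known, that cases and controls were each sampled at random, and it uses a hypothetical function name `ipw_mean`.

```python
def ipw_mean(outcomes, is_case, prevalence):
    """IPW estimate of the population mean of a secondary outcome
    from case-control data, given a known disease prevalence."""
    n = len(outcomes)
    n_cases = sum(1 for c in is_case if c)
    n_controls = n - n_cases
    if n_cases == 0 or n_controls == 0:
        raise ValueError("need at least one case and one control")
    # Weight = population share of the group / sample share of the group.
    w_case = prevalence / (n_cases / n)
    w_control = (1 - prevalence) / (n_controls / n)
    weights = [w_case if c else w_control for c in is_case]
    return sum(w * y for w, y in zip(weights, outcomes)) / sum(weights)
```

With a prevalence of 0.1, two cases with outcome 10 and two controls with outcome 0, the unweighted sample mean is 5, while the weighted estimate recovers the population value 0.1 * 10 + 0.9 * 0 = 1.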
Dan Shen, Haipeng Shen, Hongtu Zhu, J S Marron
The aim of this paper is to establish several deep theoretical properties of principal component analysis for multiple-component spike covariance models. Our new results reveal an asymptotic conical structure in critical sample eigendirections under the spike models with distinguishable (or indistinguishable) eigenvalues, when the sample size and/or the number of variables (or dimension) tend to infinity. The consistency of the sample eigenvectors relative to their population counterparts is determined by the ratio between the dimension and the product of the sample size with the spike size...
October 2016: Statistica Sinica
Esra Kürüm, Runze Li, Saul Shiffman, Weixin Yao
Motivated by an empirical analysis of ecological momentary assessment (EMA) data collected in a smoking cessation study, we propose a joint modeling technique for estimating the time-varying association between two intensively measured longitudinal responses: a continuous one and a binary one. A major challenge in jointly modeling these responses is the lack of a multivariate distribution. We suggest introducing a normal latent variable underlying the binary response and factorizing the model into two components: a marginal model for the continuous response, and a conditional model for the binary response given the continuous response...
July 2016: Statistica Sinica
Xu Liu, Yuehua Cui, Runze Li
Gene-environment (G×E) interactions play key roles in many complex diseases. An increasing number of epidemiological studies have shown the combined effect of multiple environmental exposures on disease risk. However, no appropriate statistical models have been developed to conduct a rigorous assessment of such combined effects when G×E interactions are considered. In this paper, we propose a partial linear varying multi-index coefficient model (PLVMICM) to assess how multiple environmental factors act jointly to modify individual genetic risk on complex disease...
July 2016: Statistica Sinica
T Tony Cai, Hongzhe Li, Weidong Liu, Jichun Xie
Motivated by analysis of gene expression data measured in different tissues or disease states, we consider joint estimation of multiple precision matrices to effectively utilize the partially shared graphical structures of the corresponding graphs. The procedure is based on a weighted constrained ℓ∞/ℓ1 minimization, which can be implemented efficiently by second-order cone programming. Compared to separate estimation methods, the proposed joint estimation method leads to estimators converging to the true precision matrices faster...
April 2016: Statistica Sinica
Wei Xiao, Wenbin Lu, Hao Helen Zhang
The time-varying coefficient Cox model has been widely studied and is popular in survival data analysis due to its flexibility in modeling covariate effects. It is of great practical interest to accurately identify the structure of covariate effects in a time-varying coefficient Cox model, i.e., covariates with null, constant, and truly time-varying effects, and to estimate the corresponding regression coefficients. Combining the ideas of local polynomial smoothing and group nonnegative garrote, we develop a new penalization approach to achieve such goals...
April 2016: Statistica Sinica
Chen Xu, Shaobo Lin, Jian Fang, Runze Li
Massive data have become increasingly common in contemporary scientific research. When the sample size n is huge, classical learning methods become computationally costly for regression. Recently, the orthogonal greedy algorithm (OGA) has been revitalized as an efficient alternative in the context of kernel-based statistical learning. In a learning problem, accurate and fast prediction is often of interest. This makes an appropriate termination rule crucial for OGA. In this paper, we propose a new termination rule for OGA by investigating its predictive performance...
April 2016: Statistica Sinica
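To make the greedy step concrete, here is a minimal sketch of a generic orthogonal greedy algorithm over a finite dictionary of vectors. It is an illustration only: the termination rule used here (a fixed step budget `max_steps` plus a numerical tolerance `tol`) is a placeholder assumption, not the rule proposed in the paper, and the names `oga` and `dot` are hypothetical.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def oga(columns, y, max_steps, tol=1e-12):
    """Generic orthogonal greedy selection.
    columns: list of dictionary vectors; y: response vector.
    Returns (indices in selection order, final residual)."""
    residual = list(y)
    basis = []      # orthonormal basis of the span of selected columns
    selected = []
    for _ in range(max_steps):
        # Greedy step: pick the unselected column most aligned with the residual.
        best_j, best_score = None, tol
        for j, col in enumerate(columns):
            norm = math.sqrt(dot(col, col))
            if j in selected or norm == 0.0:
                continue
            score = abs(dot(col, residual)) / norm
            if score > best_score:
                best_j, best_score = j, score
        if best_j is None:
            break  # nothing left that explains the residual
        # Orthogonal step: Gram-Schmidt the new column against the basis.
        q = list(columns[best_j])
        for b in basis:
            c = dot(q, b)
            q = [qi - c * bi for qi, bi in zip(q, b)]
        q_norm = math.sqrt(dot(q, q))
        if q_norm <= tol:
            break  # column is numerically in the current span
        q = [qi / q_norm for qi in q]
        basis.append(q)
        selected.append(best_j)
        # Project the residual off the new direction.
        c = dot(residual, q)
        residual = [ri - c * qi for ri, qi in zip(residual, q)]
    return selected, residual
```

Because the residual is re-projected onto the orthogonal complement of all selected columns at every step, stopping early trades accuracy for speed, which is why the choice of termination rule matters.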
Hana Lee, Michael G Hudgens, Jianwen Cai, Stephen R Cole
A common objective of biomedical cohort studies is assessing the effect of a time-varying treatment or exposure on a survival time. In the presence of time-varying confounders, marginal structural models fit using inverse probability weighting can be employed to obtain a consistent and asymptotically normal estimator of the causal effect of a time-varying treatment. This article considers estimation of parameters in the semiparametric marginal structural Cox model (MSCM) from a case-cohort study. Case-cohort sampling entails assembling covariate histories only for cases and a random subcohort, which can be cost effective, particularly in large cohort studies with low outcome rates...
April 2016: Statistica Sinica
Guangren Yang, Ye Yu, Runze Li, Anne Buu
Survival data with ultrahigh dimensional covariates such as genetic markers have been collected in medical studies and other fields. In this work, we propose a feature screening procedure for the Cox model with ultrahigh dimensional covariates. The proposed procedure is distinguished from the existing sure independence screening (SIS) procedures (Fan, Feng and Wu, 2010; Zhao and Li, 2012) in that it is based on the joint likelihood of potential active predictors, and therefore is not a marginal screening procedure...
2016: Statistica Sinica
Wei Zhong, Liping Zhu, Runze Li, Hengjian Cui
We propose both a penalized quantile regression and an independence screening procedure to identify important covariates and to exclude unimportant ones for a general class of ultrahigh dimensional single-index models, in which the conditional distribution of the response depends on the covariates via a single-index structure. We observe that the linear quantile regression yields a consistent estimator of the direction of the index parameter in the single-index model. Such an observation dramatically reduces computational complexity in selecting important covariates in the single-index model...
January 2016: Statistica Sinica
Sunyoung Shin, Jason Fine, Yufeng Liu
In many problems, one has several models of interest that capture key parameters describing the distribution of the data. Partially overlapping models are taken as models in which at least one covariate effect is common to the models. A priori knowledge of such structure enables efficient estimation of all model parameters. However, in practice, this structure may be unknown. We propose adaptive composite M-estimation (ACME) for partially overlapping models using a composite loss function, which is a linear combination of loss functions defining the individual models...
January 2016: Statistica Sinica
Arijit Sinha, Zhiyi Chi, Ming-Hui Chen
Survival data often contain tied event times. Inference without careful treatment of the ties can lead to biased estimates. This paper develops the Bayesian analysis of a stochastic wear process model to fit survival data that might have a large number of ties. Under a general wear process model, we derive the likelihood of parameters. When the wear process is a Gamma process, the likelihood has a semi-closed form that allows posterior sampling to be carried out for the parameters, hence achieving model selection using Bayesian deviance information criterion...
October 2015: Statistica Sinica
Philip S Boonstra, Bhramar Mukherjee, Jeremy M G Taylor
We propose new approaches for choosing the shrinkage parameter in ridge regression, a penalized likelihood method for regularizing linear regression coefficients, when the number of observations is small relative to the number of parameters. Existing methods may lead to extreme choices of this parameter, which will either not shrink the coefficients enough or shrink them by too much. Within this "small-n, large-p" context, we suggest a correction to the common generalized cross-validation (GCV) method that preserves the asymptotic optimality of the original GCV...
July 1, 2015: Statistica Sinica
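For reference, the standard (uncorrected) GCV criterion for ridge regression can be written as follows. The notation is assumed here rather than taken from the abstract: X is the n × p design matrix, y the response vector, and λ the shrinkage parameter. The small-n correction proposed in the paper is not reproduced, since the abstract does not state it.

```latex
\[
\hat{\beta}_\lambda = (X^\top X + \lambda I_p)^{-1} X^\top y,
\qquad
H_\lambda = X (X^\top X + \lambda I_p)^{-1} X^\top,
\]
\[
\mathrm{GCV}(\lambda)
  = \frac{\tfrac{1}{n}\,\lVert (I_n - H_\lambda)\, y \rVert^2}
         {\bigl[\tfrac{1}{n}\,\operatorname{tr}(I_n - H_\lambda)\bigr]^2}.
\]
```

When p is large relative to n, tr(H_λ) approaches n as λ shrinks toward zero, so the denominator approaches zero; this is one way to see why the criterion can behave poorly in the "small-n, large-p" setting.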
R Song, W Wang, D Zeng, M R Kosorok
A dynamic treatment regimen incorporates both accrued information and long-term effects of treatment from specially designed clinical trials. As these trials become more and more popular in conjunction with longitudinal data from clinical studies, the development of statistical inference for optimal dynamic treatment regimens is a high priority. In this paper, we propose a new machine learning framework called penalized Q-learning, under which valid statistical inference is established. We also propose a new statistical procedure: individual selection and corresponding methods for incorporating individual selection within penalized Q-learning...
July 2015: Statistica Sinica
Zhao Chen, Runze Li, Yan Li
The varying coefficient model has been popular in the literature. In this paper, we propose a profile least squares estimation procedure for its regression coefficients when its random error is an auto-regressive (AR) process. We further study the asymptotic properties of the proposed procedure, and establish the asymptotic normality of the resulting estimate. We show that the resulting estimate for the regression coefficients has the same asymptotic bias and variance as the local linear estimate for varying coefficient models with independent and identically distributed observations...
April 2015: Statistica Sinica
Lucia Castellanos, Vincent Q Vu, Sagi Perel, Andrew B Schwartz, Robert E Kass
We propose a Multivariate Gaussian Process Factor Model to estimate low dimensional spatio-temporal patterns of finger motion in repeated reach-to-grasp movements. Our model decomposes and reduces the dimensionality of variation of the multivariate functional data. We first account for time variability through multivariate functional registration, then decompose finger motion into a term that is shared among replications and a term that encodes the variation per replication. We discuss variants of our model and estimation algorithms, and we evaluate its performance in simulations and in data collected from a non-human primate executing a reach-to-grasp task...
January 2015: Statistica Sinica
Xinyu Zhang, Guohua Zou, Raymond J Carroll
This paper proposes a model averaging method based on Kullback-Leibler distance under a homoscedastic normal error term. The resulting model average estimator is proved to be asymptotically optimal. When combining least squares estimators, the model average estimator is shown to have the same large sample properties as the Mallows model average (MMA) estimator developed by Hansen (2007). We show via simulations that, in terms of mean squared prediction error and mean squared parameter estimation error, the proposed model average estimator is more efficient than the MMA estimator and the estimator based on model selection using the corrected Akaike information criterion in small sample situations...
2015: Statistica Sinica
Lei Pang, Wenbin Lu, Huixia Judy Wang
In survival analysis, the accelerated failure time model is a useful alternative to the popular Cox proportional hazards model due to its easy interpretation. Current estimation methods for the accelerated failure time model mostly assume independent and identically distributed random errors, but in many applications the conditional variance of log survival times depends on covariates, exhibiting some form of heteroscedasticity. In this paper, we develop a local Buckley-James estimator for the accelerated failure time model with heteroscedastic errors...
2015: Statistica Sinica
Xia Wang, Ming-Hui Chen, Rita C Kuo, Dipak K Dey
A Bayesian hierarchical model is developed for count data with spatial and temporal correlations as well as excessive zeros, uneven sampling intensities, and inference on missing spots. Our contribution is to develop a model for zero-inflated count data that provides flexibility in modeling spatial patterns in a dynamic manner and also improves computational efficiency via dimension reduction. The proposed methodology is of particular importance for studying species presence and abundance in the field of ecological sciences...
January 2015: Statistica Sinica
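As background on what "zero-inflated" means, the sketch below gives the probability mass function of a basic zero-inflated Poisson distribution. It assumes a Poisson count component, which the abstract does not specify, and it omits the spatial, temporal, and hierarchical structure of the model described above; the name `zip_pmf` and its parameters are hypothetical.

```python
import math

def zip_pmf(k, pi_zero, lam):
    """P(Y = k) for a zero-inflated Poisson: with probability pi_zero
    the count is a structural zero, otherwise it is Poisson(lam)."""
    poisson = math.exp(-lam) * lam ** k / math.factorial(k)
    if k == 0:
        # Zeros arise from both the structural part and the Poisson part.
        return pi_zero + (1 - pi_zero) * poisson
    return (1 - pi_zero) * poisson
```

The extra mass at zero is what lets such a model accommodate more zeros than a plain Poisson distribution with the same rate would predict.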
