Read by QxMD icon Read

Annals of Statistics

Tianqi Zhao, Guang Cheng, Han Liu
We consider a partially linear framework for modelling massive heterogeneous data. The major goal is to extract common features across all sub-populations while exploring heterogeneity of each sub-population. In particular, we propose an aggregation type estimator for the commonality parameter that possesses the (non-asymptotic) minimax optimal bound and asymptotic distribution as if there were no heterogeneity. This oracular result holds when the number of sub-populations does not grow too fast. A plug-in estimator for the heterogeneity parameter is further constructed, and shown to possess the asymptotic distribution as if the commonality information were available...
August 2016: Annals of Statistics
Holger Dette, Kirsten Schorning
We consider the optimal design problem for a comparison of two regression curves, which is used to establish the similarity between the dose response relationships of two groups. An optimal pair of designs minimizes the width of the confidence band for the difference between the two regression functions. Optimal design theory (equivalence theorems, efficiency bounds) is developed for this non standard design problem and for some commonly used dose response models optimal designs are found explicitly. The results are illustrated in several examples modeling dose response relationships...
June 2016: Annals of Statistics
Hongcheng Liu, Tao Yao, Runze Li
This paper is concerned with solving nonconvex learning problems with folded concave penalty. Despite that their global solutions entail desirable statistical properties, there lack optimization techniques that guarantee global optimality in a general setting. In this paper, we show that a class of nonconvex learning problems are equivalent to general quadratic programs. This equivalence facilitates us in developing mixed integer linear programming reformulations, which admit finite algorithms that find a provably global optimal solution...
April 2016: Annals of Statistics
Xianchao Xie, S C Kou, Lawrence Brown
This paper discusses the simultaneous inference of mean parameters in a family of distributions with quadratic variance function. We first introduce a class of semi-parametric/parametric shrinkage estimators and establish their asymptotic optimality properties. Two specific cases, the location-scale family and the natural exponential family with quadratic variance function, are then studied in detail. We conduct a comprehensive simulation study to compare the performance of the proposed methods with existing shrinkage estimators...
March 1, 2016: Annals of Statistics
Holger Dette, Andrey Pepelyshev, Anatoly Zhigljavsky
This paper discusses the problem of determining optimal designs for regression models, when the observations are dependent and taken on an interval. A complete solution of this challenging optimal design problem is given for a broad class of regression models and covariance kernels. We propose a class of estimators which are only slightly more complicated than the ordinary least-squares estimators. We then demonstrate that we can design the experiments, such that asymptotically the new estimators achieve the same precision as the best linear unbiased estimator computed for the whole trajectory of the process...
February 2016: Annals of Statistics
Jianqing Fan, Yuan Liao, Weichen Wang
This paper introduces a Projected Principal Component Analysis (Projected-PCA), which employees principal component analysis to the projected (smoothed) data matrix onto a given linear space spanned by covariates. When it applies to high-dimensional factor analysis, the projection removes noise components. We show that the unobserved latent factors can be more accurately estimated than the conventional PCA if the projection is genuine, or more precisely, when the factor loading matrices are related to the projected linear space...
February 2016: Annals of Statistics
Rui Song, Moulinath Banerjee, Michael R Kosorok
Change-point models are widely used by statisticians to model drastic changes in the pattern of observed data. Least squares/maximum likelihood based estimation of change-points leads to curious asymptotic phenomena. When the change-point model is correctly specified, such estimates generally converge at a fast rate (n) and are asymptotically described by minimizers of a jump process. Under complete mis-specification by a smooth curve, i.e. when a change-point model is fitted to data described by a smooth curve, the rate of convergence slows down to n(1/3) and the limit distribution changes to that of the minimizer of a continuous Gaussian process...
February 2016: Annals of Statistics
Jinyuan Chang, Cheng Yong Tang, Yichao Wu
We consider an independence feature screening technique for identifying explanatory variables that locally contribute to the response variable in high-dimensional regression analysis. Without requiring a specific parametric form of the underlying data model, our approach accommodates a wide spectrum of nonparametric and semiparametric model families. To detect the local contributions of explanatory variables, our approach constructs empirical likelihood locally in conjunction with marginal nonparametric regressions...
2016: Annals of Statistics
Holger Dette, Viatcheslav B Melas, Roman Guchenko
The problem of constructing Bayesian optimal discriminating designs for a class of regression models with respect to the T-optimality criterion introduced by Atkinson and Fedorov (1975a) is considered. It is demonstrated that the discretization of the integral with respect to the prior distribution leads to locally T-optimal discriminating design problems with a large number of model comparisons. Current methodology for the numerical construction of discrimination designs can only deal with a few comparisons, but the discretization of the Bayesian prior easily yields to discrimination design problems for more than 100 competing models...
October 2015: Annals of Statistics
Qi Zheng, Limin Peng, Xuming He
Quantile regression has become a valuable tool to analyze heterogeneous covaraite-response associations that are often encountered in practice. The development of quantile regression methodology for high dimensional covariates primarily focuses on examination of model sparsity at a single or multiple quantile levels, which are typically prespecified ad hoc by the users. The resulting models may be sensitive to the specific choices of the quantile levels, leading to difficulties in interpretation and erosion of confidence in the results...
October 1, 2015: Annals of Statistics
M A Shujie, Raymond J Carroll, Hua Liang, Shizhong Xu
In the low-dimensional case, the generalized additive coefficient model (GACM) proposed by Xue and Yang [Statist. Sinica16 (2006) 1423-1446] has been demonstrated to be a powerful tool for studying nonlinear interaction effects of variables. In this paper, we propose estimation and inference procedures for the GACM when the dimension of the variables is high. Specifically, we propose a groupwise penalization based procedure to distinguish significant covariates for the "large p small n" setting. The procedure is shown to be consistent for model structure identification...
October 2015: Annals of Statistics
Rajarshi Mukherjee, Natesh S Pillai, Xihong Lin
In this paper, we study the detection boundary for minimax hypothesis testing in the context of high-dimensional, sparse binary regression models. Motivated by genetic sequencing association studies for rare variant effects, we investigate the complexity of the hypothesis testing problem when the design matrix is sparse. We observe a new phenomenon in the behavior of detection boundary which does not occur in the case of Gaussian linear regression. We derive the detection boundary as a function of two components: a design matrix sparsity index and signal strength, each of which is a function of the sparsity of the alternative...
February 2015: Annals of Statistics
Jianqing Fan, Philippe Rigollet, Weichen Wang
High-dimensional statistical tests often ignore correlations to gain simplicity and stability leading to null distributions that depend on functionals of correlation matrices such as their Frobenius norm and other ℓ r norms. Motivated by the computation of critical values of such tests, we investigate the difficulty of estimation the functionals of sparse correlation matrices. Specifically, we show that simple plug-in procedures based on thresholded estimators of correlation matrices are sparsity-adaptive and minimax optimal over a large class of correlation matrices...
2015: Annals of Statistics
Jianqing Fan, Zheng Tracy Ke, Han Liu, Lucy Xia
We propose a novel Rayleigh quotient based sparse quadratic dimension reduction method-named QUADRO (Quadratic Dimension Reduction via Rayleigh Optimization)-for analyzing high-dimensional data. Unlike in the linear setting where Rayleigh quotient optimization coincides with classification, these two problems are very different under nonlinear settings. In this paper, we clarify this difference and show that Rayleigh quotient optimization may be of independent scientific interests. One major challenge of Rayleigh quotient optimization is that the variance of quadratic statistics involves all fourth cross-moments of predictors, which are infeasible to compute for high-dimensional applications and may accumulate too many stochastic errors...
2015: Annals of Statistics
Gourab Mukherjee, Iain M Johnstone
We consider estimating the predictive density under Kullback-Leibler loss in an ℓ 0 sparse Gaussian sequence model. Explicit expressions of the first order minimax risk along with its exact constant, asymptotically least favorable priors and optimal predictive density estimates are derived. Compared to the sparse recovery results involving point estimation of the normal mean, new decision theoretic phenomena are seen. Suboptimal performance of the class of plug-in density estimates reflects the predictive nature of the problem and optimal strategies need diversification of the future risk...
2015: Annals of Statistics
Fei Jiang, Yanyuan Ma, Yuanjia Wang
We propose a generalized partially linear functional single index risk score model for repeatedly measured outcomes where the index itself is a function of time. We fuse the nonparametric kernel method and regression spline method, and modify the generalized estimating equation to facilitate estimation and inference. We use local smoothing kernel to estimate the unspecified coefficient functions of time, and use B-splines to estimate the unspecified function of the single index component. The covariance structure is taken into account via a working model, which provides valid estimation and inference procedure whether or not it captures the true covariance...
2015: Annals of Statistics
By Tracy Ke, Jiashun Jin, Jianqing Fan
Consider a linear model Y = X β + z, where X = Xn,p and z ~ N(0, In ). The vector β is unknown and it is of interest to separate its nonzero coordinates from the zero ones (i.e., variable selection). Motivated by examples in long-memory time series (Fan and Yao, 2003) and the change-point problem (Bhattacharya, 1994), we are primarily interested in the case where the Gram matrix G = X'X is non-sparse but sparsifiable by a finite order linear filter. We focus on the regime where signals are both rare and weak so that successful variable selection is very challenging but is still possible...
November 1, 2014: Annals of Statistics
William Fithian, Trevor Hastie
For classification problems with significant class imbalance, subsampling can reduce computational costs at the price of inflated variance in estimating model parameters. We propose a method for subsampling efficiently for logistic regression by adjusting the class balance locally in feature space via an accept-reject scheme. Our method generalizes standard case-control sampling, using a pilot estimate to preferentially select examples whose responses are conditionally rare given their features. The biased subsampling is corrected by a post-hoc analytic adjustment to the parameters...
October 1, 2014: Annals of Statistics
Jianqing Fan, Lingzhou Xue, Hui Zou
Folded concave penalization methods have been shown to enjoy the strong oracle property for high-dimensional sparse estimation. However, a folded concave penalization problem usually has multiple local solutions and the oracle property is established only for one of the unknown local solutions. A challenging fundamental issue still remains that it is not clear whether the local optimum computed by a given optimization algorithm possesses those nice theoretical properties. To close this important theoretical gap in over a decade, we provide a unified theory to show explicitly how to obtain the oracle solution via the local linear approximation algorithm...
June 2014: Annals of Statistics
Jianqing Fan, Yuan Liao
Most papers on high-dimensional statistics are based on the assumption that none of the regressors are correlated with the regression error, namely, they are exogenous. Yet, endogeneity can arise incidentally from a large pool of regressors in a high-dimensional regression. This causes the inconsistency of the penalized least-squares method and possible false scientific discoveries. A necessary condition for model selection consistency of a general class of penalized regression methods is given, which allows us to prove formally the inconsistency claim...
June 1, 2014: Annals of Statistics
Fetch more papers »
Fetching more papers... Fetching...
Read by QxMD. Sign in or create an account to discover new knowledge that matter to you.
Remove bar
Read by QxMD icon Read

Search Tips

Use Boolean operators: AND/OR

diabetic AND foot
diabetes OR diabetic

Exclude a word using the 'minus' sign

Virchow -triad

Use Parentheses

water AND (cup OR glass)

Add an asterisk (*) at end of a word to include word stems

Neuro* will search for Neurology, Neuroscientist, Neurological, and so on

Use quotes to search for an exact phrase

"primary prevention of cancer"
(heart or cardiac or cardio*) AND arrest -"American Heart Association"