Journal of the American Statistical Association

Nilanjan Chatterjee, Yi-Hau Chen, Paige Maas, Raymond J Carroll
Information from various public and private data sources of extremely large sample sizes is now increasingly available for research purposes. Statistical methods are needed for utilizing information from such big data sources while analyzing data from individual studies that may collect more detailed information required for addressing specific hypotheses of interest. In this article, we consider the problem of building regression models based on individual-level data from an "internal" study while utilizing summary-level information, such as information on parameters for reduced models, from an "external" big data source...
March 2016: Journal of the American Statistical Association
Mengjie Chen, Zhao Ren, Hongyu Zhao, Harrison Zhou
A tuning-free procedure is proposed to estimate the covariate-adjusted Gaussian graphical model. For each finite subgraph, this estimator is asymptotically normal and efficient. As a consequence, a confidence interval can be obtained for each edge. The procedure enjoys easy implementation and efficient computation through parallel estimation on subgraphs or edges. We further apply the asymptotic normality result to perform support recovery through edge-wise adaptive thresholding. This support recovery procedure is called ANTAC, standing for Asymptotically Normal estimation with Thresholding after Adjusting Covariates...
March 2016: Journal of the American Statistical Association
Michalis K Titsias, Christopher C Holmes, Christopher Yau
Hidden Markov models (HMMs) are one of the most widely used statistical methods for analyzing sequence data. However, the reporting of output from HMMs has largely been restricted to the presentation of the most-probable (MAP) hidden state sequence, found via the Viterbi algorithm, or the sequence of most probable marginals using the forward-backward algorithm. In this article, we expand the amount of information we could obtain from the posterior distribution of an HMM by introducing linear-time dynamic programming recursions that, conditional on a user-specified constraint in the number of segments, allow us to (i) find MAP sequences, (ii) compute posterior probabilities, and (iii) simulate sample paths...
January 2, 2016: Journal of the American Statistical Association
Ci-Ren Jiang, John A D Aston, Jane-Ling Wang
Positron emission tomography (PET) is an imaging technique which can be used to investigate chemical changes in human biological processes such as cancer development or neurochemical reactions. Most dynamic PET scans are currently analyzed based on the assumption that linear first-order kinetics can be used to adequately describe the system under observation. However, there has recently been strong evidence that this is not the case. To provide an analysis of PET data which is free from this compartmental assumption, we propose a nonparametric deconvolution and analysis model for dynamic PET data based on functional principal component analysis...
January 2, 2016: Journal of the American Statistical Association
Chen Yue, Vadim Zipunnikov, Pierre-Louis Bazin, Dzung Pham, Daniel Reich, Ciprian Crainiceanu, Brian Caffo
In this manuscript, we are concerned with data generated from a diffusion tensor imaging (DTI) experiment. The goal is to parameterize manifold-like white matter tracts, such as the corpus callosum, using principal surfaces. The problem is approached by finding a geometrically motivated surface-based representation of the corpus callosum and visualizing fractional anisotropy (FA) values projected onto the surface. The method also applies to any other diffusion summary. An algorithm is proposed that 1) constructs the principal surface of a corpus callosum; 2) flattens the surface into a parametric 2D map; 3) projects associated FA values on the map...
2016: Journal of the American Statistical Association
Jacob Bien, Florentina Bunea, Luo Xiao
We introduce a new sparse estimator of the covariance matrix for high-dimensional models in which the variables have a known ordering. Our estimator, which is the solution to a convex optimization problem, is equivalently expressed as an estimator which tapers the sample covariance matrix by a Toeplitz, sparsely-banded, data-adaptive matrix. As a result of this adaptivity, the convex banding estimator enjoys theoretical optimality properties not attained by previous banding or tapered estimators. In particular, our convex banding estimator is minimax rate adaptive in Frobenius and operator norms, up to log factors, over commonly-studied classes of covariance matrices, and over more general classes...
2016: Journal of the American Statistical Association
Tianxi Cai, T Tony Cai, Anru Zhang
Matrix completion has attracted significant recent attention in many fields including statistics, applied mathematics and electrical engineering. Current literature on matrix completion focuses primarily on independent sampling models under which the individual observed entries are sampled independently. Motivated by applications in genomic data integration, we propose a new framework of structured matrix completion (SMC) to treat structured missingness by design. Specifically, our proposed method aims at efficient matrix recovery when a subset of the rows and columns of an approximately low-rank matrix are observed...
2016: Journal of the American Statistical Association
Yanxun Xu, Peter Müller, Abdus S Wahed, Peter F Thall
We analyze a dataset arising from a clinical trial involving multi-stage chemotherapy regimes for acute leukemia. The trial design was a 2 × 2 factorial for frontline therapies only. Motivated by the idea that subsequent salvage treatments affect survival time, we model therapy as a dynamic treatment regime (DTR), that is, an alternating sequence of adaptive treatments or other actions and transition times between disease states. These sequences may vary substantially between patients, depending on how the regime plays out...
2016: Journal of the American Statistical Association
Stanislav Minsker, Ying-Qi Zhao, Guang Cheng
Individualized treatment rules (ITRs) tailor treatments according to individual patient characteristics. They can significantly improve patient care and are thus becoming increasingly popular. The data collected during randomized clinical trials are often used to estimate the optimal ITRs. However, these trials are generally expensive to run, and, moreover, they are not designed to efficiently estimate ITRs. In this article, we propose a cost-effective estimation method from an active learning perspective. In particular, our method recruits only the "most informative" patients (in terms of learning the optimal ITRs) from an ongoing clinical trial...
2016: Journal of the American Statistical Association
Lin Zhang, Veerabhadran Baladandayuthapani, Hongxiao Zhu, Keith A Baggerly, Tadeusz Majewski, Bogdan A Czerniak, Jeffrey S Morris
We develop a functional conditional autoregressive (CAR) model for spatially correlated data for which functions are collected on areal units of a lattice. Our model performs functional response regression while accounting for spatial correlations with potentially nonseparable and nonstationary covariance structure, in both the space and functional domains. We show theoretically that our construction leads to a CAR model at each functional location, with spatial covariance parameters varying and borrowing strength across the functional domain...
2016: Journal of the American Statistical Association
Laura Forastiere, Fabrizia Mealli, Tyler J VanderWeele
Exploration of causal mechanisms is often important for researchers and policymakers to understand how an intervention works and how it can be improved. This task can be crucial in clustered encouragement designs (CED). Encouragement design studies arise frequently when the treatment cannot be enforced because of ethical or practical constraints and an encouragement intervention (information campaigns, incentives, etc.) is conceived with the purpose of increasing the uptake of the treatment of interest. By design, encouragements always entail the complication of non-compliance...
2016: Journal of the American Statistical Association
Jingxiang Chen, Yufeng Liu, Donglin Zeng, Rui Song, Yingqi Zhao, Michael R Kosorok
Xu, Müller, Wahed, and Thall proposed a Bayesian model to analyze an acute leukemia study involving multi-stage chemotherapy regimes. We discuss two alternative methods, Q-learning and O-learning, to solve the same problem from the machine learning point of view. The numerical studies show that these methods can be flexible and, in some situations, have advantages in handling treatment heterogeneity while being robust to model misspecification.
2016: Journal of the American Statistical Association
Qian Guan, Eric B Laber, Brian J Reich
No abstract text is available yet for this article.
2016: Journal of the American Statistical Association
Tyler H McCormick, Zehang Richard Li, Clara Calvert, Amelia C Crampin, Kathleen Kahn, Samuel J Clark
In regions without complete-coverage civil registration and vital statistics systems there is uncertainty about even the most basic demographic indicators. In such regions the majority of deaths occur outside hospitals and are not recorded. Worldwide, fewer than one-third of deaths are assigned a cause, with the least information available from the most impoverished nations. In populations like this, verbal autopsy (VA) is a commonly used tool to assess cause of death and estimate cause-specific mortality rates and the distribution of deaths by cause...
2016: Journal of the American Statistical Association
Chiung-Yu Huang, Jing Qin, Huei-Ting Tsai
With the rapidly increasing availability of data in the public domain, combining information from different sources to draw inferences about associations or differences of interest has become an emerging challenge to researchers. This paper presents a novel approach to improve efficiency in estimating the survival time distribution by synthesizing information from the individual-level data with t-year survival probabilities from external sources such as disease registries. While disease registries provide accurate and reliable overall survival statistics for the disease population, critical pieces of information that influence both choice of treatment and clinical outcomes usually are not available in the registry database...
2016: Journal of the American Statistical Association
Bo Zhou, David E Moorman, Sam Behseta, Hernando Ombao, Babak Shahbaba
The goal of this paper is to develop a novel statistical model for studying cross-neuronal spike train interactions during decision making. For an individual to successfully complete the task of decision-making, a number of temporally-organized events must occur: stimuli must be detected, potential outcomes must be evaluated, behaviors must be executed or inhibited, and outcomes (such as reward or no-reward) must be experienced. Due to the complexity of this process, it is likely the case that decision-making is encoded by the temporally-precise interactions between large populations of neurons...
2016: Journal of the American Statistical Association
Chirag J Patel, Francesca Dominici
No abstract text is available yet for this article.
2016: Journal of the American Statistical Association
Lisa M Pham, Luis Carvalho, Scott Schaus, Eric D Kolaczyk
Cellular response to a perturbation is the result of a dynamic system of biological variables linked in a complex network. A major challenge in drug and disease studies is identifying the key factors of a biological network that are essential in determining the cell's fate. Here our goal is the identification of perturbed pathways from high-throughput gene expression data. We develop a three-level hierarchical model, where (i) the first level captures the relationship between gene expression and biological pathways using confirmatory factor analysis, (ii) the second level models the behavior within an underlying network of pathways induced by an unknown perturbation using a conditional autoregressive model, and (iii) the third level is a spike-and-slab prior on the perturbations...
2016: Journal of the American Statistical Association
Aaron Fisher, Brian Caffo, Brian Schwartz, Vadim Zipunnikov
Many have suggested a bootstrap procedure for estimating the sampling variability of principal component analysis (PCA) results. However, when the number of measurements per subject (p) is much larger than the number of subjects (n), calculating and storing the leading principal components from each bootstrap sample can be computationally infeasible. To address this, we outline methods for fast, exact calculation of bootstrap principal components, eigenvalues, and scores. Our methods leverage the fact that all bootstrap samples occupy the same n-dimensional subspace as the original sample...
2016: Journal of the American Statistical Association
Zhiguang Huo, Ying Ding, Silvia Liu, Steffi Oesterreich, George Tseng
Disease phenotyping by omics data has become a popular approach that potentially can lead to better personalized treatment. Identifying disease subtypes via unsupervised machine learning is the first step towards this goal. In this paper, we extend a sparse K-means method towards a meta-analytic framework to identify novel disease subtypes when expression profiles of multiple cohorts are available. The lasso regularization and meta-analysis identify a unique set of gene features for subtype characterization...
2016: Journal of the American Statistical Association