Read by QxMD icon Read

Journal of Computational and Graphical Statistics

Gertraud Malsiner-Walli, Sylvia Frühwirth-Schnatter, Bettina Grün
The use of a finite mixture of normal distributions in model-based clustering allows us to capture non-Gaussian data clusters. However, identifying the clusters from the normal components is challenging and in general either achieved by imposing constraints on the model or by using post-processing procedures. Within the Bayesian framework, we propose a different approach based on sparse finite mixtures to achieve identifiability. We specify a hierarchical prior, where the hyperparameters are carefully selected such that they are reflective of the cluster structure aimed at...
April 3, 2017: Journal of Computational and Graphical Statistics
Yiwen Zhang, Hua Zhou, Jin Zhou, Wei Sun
Data with multivariate count responses frequently occur in modern applications. The commonly used multinomial-logit model is limiting due to its restrictive mean-variance structure. For instance, analyzing count data from the recent RNA-seq technology by the multinomial-logit model leads to serious errors in hypothesis testing. The ubiquity of over-dispersion and complicated correlation structures among multivariate counts calls for more flexible regression models. In this article, we study some generalized linear models that incorporate various correlation structures among the counts...
2017: Journal of Computational and Graphical Statistics
John A Kamm, Jonathan Terhorst, Yun S Song
A wide range of studies in population genetics have employed the sample frequency spectrum (SFS), a summary statistic which describes the distribution of mutant alleles at a polymorphic site in a sample of DNA sequences and provides a highly efficient dimensional reduction of large-scale population genomic variation data. Recently, there has been much interest in analyzing the joint SFS data from multiple populations to infer parameters of complex demographic histories, including variable population sizes, population split times, migration rates, admixture proportions, and so on...
2017: Journal of Computational and Graphical Statistics
Danjie Zhang, Ming-Hui Chen, Joseph G Ibrahim, Mark E Boye, Wei Shen
Joint models for longitudinal and survival data are routinely used in clinical trials or other studies to assess a treatment effect while accounting for longitudinal measures such as patient-reported outcomes (PROs). In the Bayesian framework, the deviance information criterion (DIC) and the logarithm of the pseudo marginal likelihood (LPML) are two well-known Bayesian criteria for comparing joint models. However, these criteria do not provide separate assessments of each component of the joint model. In this paper, we develop a novel decomposition of DIC and LPML to assess the fit of the longitudinal and survival components of the joint model, separately...
2017: Journal of Computational and Graphical Statistics
J T Gaskins, M J Daniels
The estimation of the covariance matrix is a key concern in the analysis of longitudinal data. When data consists of multiple groups, it is often assumed the covariance matrices are either equal across groups or are completely distinct. We seek methodology to allow borrowing of strength across potentially similar groups to improve estimation. To that end, we introduce a covariance partition prior which proposes a partition of the groups at each measurement time. Groups in the same set of the partition share dependence parameters for the distribution of the current measurement given the preceding ones, and the sequence of partitions is modeled as a Markov chain to encourage similar structure at nearby measurement times...
January 2, 2016: Journal of Computational and Graphical Statistics
Chiranjit Mukherjee, Abel Rodriguez
Gaussian graphical models are popular for modeling high-dimensional multivariate data with sparse conditional dependencies. A mixture of Gaussian graphical models extends this model to the more realistic scenario where observations come from a heterogenous population composed of a small number of homogeneous sub-groups. In this paper we present a novel stochastic search algorithm for finding the posterior mode of high-dimensional Dirichlet process mixtures of decomposable Gaussian graphical models. Further, we investigate how to harness the massive thread-parallelization capabilities of graphical processing units to accelerate computation...
2016: Journal of Computational and Graphical Statistics
Ollivier Hyrien, Andrea Baran
Mean-shift is an iterative procedure often used as a nonparametric clustering algorithm that defines clusters based on the modal regions of a density function. The algorithm is conceptually appealing and makes assumptions neither about the shape of the clusters nor about their number. However, with a complexity of O(n(2)) per iteration, it does not scale well to large data sets. We propose a novel algorithm which performs density-based clustering much quicker than mean-shift, yet delivering virtually identical results...
2016: Journal of Computational and Graphical Statistics
Asad Haris, Daniela Witten, Noah Simon
We consider the task of fitting a regression model involving interactions among a potentially large set of covariates, in which we wish to enforce strong heredity. We propose FAMILY, a very general framework for this task. Our proposal is a generalization of several existing methods, such as VANISH [Radchenko and James, 2010], hierNet [Bien et al., 2013], the all-pairs lasso, and the lasso using only main effects. It can be formulated as the solution to a convex optimization problem, which we solve using an efficient alternating directions method of multipliers (ADMM) algorithm...
2016: Journal of Computational and Graphical Statistics
Ashley Petersen, Daniela Witten, Noah Simon
We consider the problem of predicting an outcome variable using p covariates that are measured on n independent observations, in a setting in which additive, flexible, and interpretable fits are desired. We propose the fused lasso additive model (FLAM), in which each additive function is estimated to be piecewise constant with a small number of adaptively-chosen knots. FLAM is the solution to a convex optimization problem, for which a simple algorithm with guaranteed convergence to a global optimum is provided...
2016: Journal of Computational and Graphical Statistics
Tuo Zhao, Han Liu
We propose an accelerated path-following iterative shrinkage thresholding algorithm (APISTA) for solving high dimensional sparse nonconvex learning problems. The main difference between APISTA and the path-following iterative shrinkage thresholding algorithm (PISTA) is that APISTA exploits an additional coordinate descent subroutine to boost the computational performance. Such a modification, though simple, has profound impact: APISTA not only enjoys the same theoretical guarantee as that of PISTA, i.e., APISTA attains a linear rate of convergence to a unique sparse local optimum with good statistical properties, but also significantly outperforms PISTA in empirical benchmarks...
2016: Journal of Computational and Graphical Statistics
Daniel Taylor-Rodriguez, Andrew Womack, Nikolay Bliznyuk
This paper investigates Bayesian variable selection when there is a hierarchical dependence structure on the inclusion of predictors in the model. In particular, we study the type of dependence found in polynomial response surfaces of orders two and higher, whose model spaces are required to satisfy weak or strong heredity conditions. These conditions restrict the inclusion of higher-order terms depending upon the inclusion of lower-order parent terms. We develop classes of priors on the model space, investigate their theoretical and finite sample properties, and provide a Metropolis-Hastings algorithm for searching the space of models...
2016: Journal of Computational and Graphical Statistics
Shangbang Rao, Joseph G Ibrahim, Jian Cheng, Pew-Thian Yap, Hongtu Zhu
High angular resolution diffusion imaging (HARDI) has recently been of great interest in mapping the orientation of intra-voxel crossing fibers, and such orientation information allows one to infer the connectivity patterns prevalent among different brain regions and possible changes in such connectivity over time for various neurodegenerative and neuropsychiatric diseases. The aim of this paper is to propose a penalized multi-scale adaptive regression model (PMARM) framework to spatially and adaptively infer the orientation distribution function (ODF) of water diffusion in regions with complex fiber configurations...
2016: Journal of Computational and Graphical Statistics
Chong Zhang, Yufeng Liu, Junhui Wang, Hongtu Zhu
The Support Vector Machine (SVM) is a very popular classification tool with many successful applications. It was originally designed for binary problems with desirable theoretical properties. Although there exist various Multicategory SVM (MSVM) extensions in the literature, some challenges remain. In particular, most existing MSVMs make use of k classification functions for a k-class problem, and the corresponding optimization problems are typically handled by existing quadratic programming solvers. In this paper, we propose a new group of MSVMs, namely the Reinforced Angle-based MSVMs (RAMSVMs), using an angle-based prediction rule with k - 1 functions directly...
2016: Journal of Computational and Graphical Statistics
Lizhen Xu, Radu V Craiu, Lei Sun, Andrew D Paterson
Motivated by genetic association studies of pleiotropy, we propose a Bayesian latent variable approach to jointly study multiple outcomes. The models studied here can incorporate both continuous and binary responses, and can account for serial and cluster correlations. We consider Bayesian estimation for the model parameters, and we develop a novel MCMC algorithm that builds upon hierarchical centering and parameter expansion techniques to efficiently sample from the posterior distribution. We evaluate the proposed method via extensive simulations and demonstrate its utility with an application to aa association study of various complication outcomes related to type 1 diabetes...
2016: Journal of Computational and Graphical Statistics
Bruce D Bugbee, F Jay Breidt, Mark J van der Woerd
Variational approximations provide fast, deterministic alternatives to Markov Chain Monte Carlo for Bayesian inference on the parameters of complex, hierarchical models. Variational approximations are often limited in practicality in the absence of conjugate posterior distributions. Recent work has focused on the application of variational methods to models with only partial conjugacy, such as in semiparametric regression with heteroskedastic errors. Here, both the mean and log variance functions are modeled as smooth functions of covariates...
2016: Journal of Computational and Graphical Statistics
Leo L Duan, John P Clancy, Rhonda D Szczesniak
We propose a novel "tree-averaging" model that utilizes the ensemble of classification and regression trees (CART). Each constituent tree is estimated with a subset of similar data. We treat this grouping of subsets as Bayesian Ensemble Trees (BET) and model them as a Dirichlet process. We show that BET determines the optimal number of trees by adapting to the data heterogeneity. Compared with the other ensemble methods, BET requires much fewer trees and shows equivalent prediction accuracy using weighted averaging...
2016: Journal of Computational and Graphical Statistics
Hui Jiang, John Chong Mu, Kun Yang, Chao Du, Luo Lu, Wing Hung Wong
Optional Pólya tree (OPT) is a flexible nonparametric Bayesian prior for density estimation. Despite its merits, the computation for OPT inference is challenging. In this paper we present time complexity analysis for OPT inference and propose two algorithmic improvements. The first improvement, named limited-lookahead optional Pólya tree (LL-OPT), aims at accelerating the computation for OPT inference. The second improvement modifies the output of OPT or LL-OPT and produces a continuous piecewise linear density estimate...
2016: Journal of Computational and Graphical Statistics
Hyonho Chun, Xianghua Zhang, Hongyu Zhao
Revealing biological networks is one key objective in systems biology. With microarrays, researchers now routinely measure expression profiles at the genome level under various conditions, and, such data may be utilized to statistically infer gene regulation networks. Gaussian graphical models (GGMs) have proven useful for this purpose by modeling the Markovian dependence among genes. However, a single GGM may not be adequate to describe the potentially differing networks across various conditions, and hence it is more natural to infer multiple GGMs from such data...
October 1, 2015: Journal of Computational and Graphical Statistics
John Hughes
Non-Gaussian spatial data are common in many fields. When fitting regressions for such data, one needs to account for spatial dependence to ensure reliable inference for the regression coefficients. The two most commonly used regression models for spatially aggregated data are the automodel and the areal generalized linear mixed model (GLMM). These models induce spatial dependence in different ways but share the smoothing approach, which is intuitive but problematic. This article develops a new regression model for areal data...
September 16, 2015: Journal of Computational and Graphical Statistics
Wei Xiao, Yichao Wu, Hua Zhou
The least angle regression (LAR) was proposed by Efron, Hastie, Johnstone and Tibshirani (2004) for continuous model selection in linear regression. It is motivated by a geometric argument and tracks a path along which the predictors enter successively and the active predictors always maintain the same absolute correlation (angle) with the residual vector. Although it gains popularity quickly, its extensions seem rare compared to the penalty methods. In this expository article, we show that the powerful geometric idea of LAR can be generalized in a fruitful way...
July 1, 2015: Journal of Computational and Graphical Statistics
Fetch more papers »
Fetching more papers... Fetching...
Read by QxMD. Sign in or create an account to discover new knowledge that matter to you.
Remove bar
Read by QxMD icon Read

Search Tips

Use Boolean operators: AND/OR

diabetic AND foot
diabetes OR diabetic

Exclude a word using the 'minus' sign

Virchow -triad

Use Parentheses

water AND (cup OR glass)

Add an asterisk (*) at end of a word to include word stems

Neuro* will search for Neurology, Neuroscientist, Neurological, and so on

Use quotes to search for an exact phrase

"primary prevention of cancer"
(heart or cardiac or cardio*) AND arrest -"American Heart Association"