Statistics and its Interface

Esra Kürüm, John Hughes, Runze Li, Saul Shiffman
We propose a copula-based joint modeling framework for mixed longitudinal responses. Our approach permits all model parameters to vary with time, and thus will enable researchers to reveal dynamic response-predictor relationships and response-response associations. We call the new class of models TIMECOP because we model dependence using a time-varying copula. We develop a one-step estimation procedure for the TIMECOP parameter vector, and also describe how to estimate standard errors. We investigate the finite sample performance of our procedure via three simulation studies, one of which shows that our procedure performs well under ignorable missingness...
2018: Statistics and its Interface
Guanglei Yu, Liang Zhu, Jianguo Sun, Leslie L Robison
This paper discusses regression analysis of a type of incomplete mixed data arising from event history studies with the proportional rates model. By mixed data, we mean that each study subject may be observed continuously during the whole study period, continuously over some study periods and at some time points, or only at some discrete time points. Therefore, we have combined recurrent event and panel count data. For the problem, we present a multiple imputation-based estimation procedure and one advantage of the proposed marginal model approach is that it can be easily implemented...
2018: Statistics and its Interface
William L Leão, Carlos A Abanto-Valle, Ming-Hui Chen
A stochastic volatility-in-mean model with correlated errors using the generalized hyperbolic skew Student-t (GHST) distribution provides a robust alternative to the parameter estimation for daily stock returns in the absence of normality. An efficient Markov chain Monte Carlo (MCMC) sampling algorithm is developed for parameter estimation. The deviance information, Bayesian predictive information and log-predictive score criteria are used to assess the fit of the proposed model. The proposed method is applied to an analysis of the daily stock return data from the Standard & Poor's 500 index (S&P 500)...
2017: Statistics and its Interface
Christian E Galarza, Victor H Lachos, Dipankar Bandyopadhyay
This paper develops a likelihood-based approach to analyze quantile regression (QR) models for continuous longitudinal data via the asymmetric Laplace distribution (ALD). Compared to the conventional mean regression approach, QR can characterize the entire conditional distribution of the outcome variable and is more robust to the presence of outliers and misspecification of the error distribution. Exploiting the nice hierarchical representation of the ALD, our classical approach follows a Stochastic Approximation of the EM (SAEM) algorithm in deriving exact maximum likelihood estimates of the fixed-effects and variance components...
2017: Statistics and its Interface
Christopher Bryant, Hongtu Zhu, Mihye Ahn, Joseph Ibrahim
The aim of this article is to develop a Bayesian random graph mixture model (RGMM) to detect the latent class network (LCN) structure of brain connectivity networks and estimate the parameters governing this structure. The use of conjugate priors for unknown parameters leads to efficient estimation, and a well-known nonidentifiability issue is avoided by a particular parameterization of the stochastic block model (SBM). Posterior computation proceeds via an efficient Markov Chain Monte Carlo algorithm. Simulations demonstrate that LCN outperforms several other competing methods for community detection in weighted networks, and we apply our RGMM to estimate the latent community structures in the functional resting brain networks of 185 subjects from the ADHD-200 sample...
2017: Statistics and its Interface
Baolin Wu, James S Pankow
More and more large cohort studies have conducted or are conducting genome-wide association studies (GWAS) to reveal the genetic components of many complex human diseases. These large cohort studies often collected a broad array of correlated phenotypes that reflect common physiological processes. By jointly analyzing these correlated traits, we can gain more power by aggregating multiple weak effects and shed light on the mechanisms underlying complex human diseases. The majority of existing multi-trait association test methods are based on jointly modeling the multivariate traits conditional on the genotype as covariate, and can readily accommodate the imputed SNPs by using their imputed dosage as a covariate...
2017: Statistics and its Interface
Thaddeus Tarpey, Eva Petkova, Liangyu Zhu
Understanding heterogeneity in phenotypical characteristics, symptom manifestations and response to treatment of subjects with psychiatric illnesses is a continuing challenge in mental health research. A long-standing goal of medical studies is to identify groups of subjects characterized by a particular trait or quality and to distinguish them from other subjects in a clinically relevant way. This paper develops and illustrates a novel approach to this problem based on a method of optimal-partitioning (clustering) of functional data...
July 1, 2016: Statistics and its Interface
Anastasia Ivanova, Allison M Deal
Many oncology phase II trials are single arm studies designed to screen novel treatments based on efficacy outcome. Efficacy is often assessed as an ordinal variable based on a level of response of solid tumors with four categories: complete response, partial response, stable disease and progression. We describe a two-stage design for a single-arm phase II trial where the primary objective is to test the rate of tumor response defined as complete plus partial response, and the secondary objective is to estimate the rate of disease control defined as tumor response plus stable disease...
2016: Statistics and its Interface
Yang Li, Yanqing Sun
Longitudinal data frequently arise in many fields such as medical follow-up studies focusing on specific longitudinal responses. In such situations, the responses are recorded only at discrete observation times. Most existing approaches for longitudinal data analysis assume that the observation or follow-up times are independent of the underlying response process, either completely or given some known covariates. We present a joint analysis approach in which possible correlations among the responses, observation and follow-up times can be characterized by time-dependent random effects...
2016: Statistics and its Interface
Kun Chen
Reduced-rank methods are very popular in high-dimensional multivariate analysis for conducting simultaneous dimension reduction and model estimation. However, the commonly-used reduced-rank methods are not robust, as the underlying reduced-rank structure can be easily distorted by only a few data outliers. Anomalies are bound to exist in big data problems, and in some applications they themselves could be of the primary interest. While naive residual analysis is often inadequate for outlier detection due to potential masking and swamping, robust reduced-rank estimation approaches could be computationally demanding...
2016: Statistics and its Interface
Chun Wang, Ming-Hui Chen, Elizabeth Schifano, Jing Wu, Jun Yan
Big data are data on a massive scale in terms of volume, intensity, and complexity that exceed the capacity of standard analytic tools. They present opportunities as well as challenges to statisticians. The role of computational statisticians in scientific discovery from big data analyses has been under-recognized even by peer statisticians. This article summarizes recent methodological and software developments in statistics that address the big data challenges. Methodologies are grouped into three classes: subsampling-based, divide and conquer, and online updating for stream data...
2016: Statistics and its Interface
Wan-Min Tsai, Heping Zhang, Eugenia Buta, Stephanie O'Malley, Ralitza Gueorguieva
The tree-based methodology has been widely applied to identify predictors of health outcomes in medical studies. However, the classical tree-based approaches do not pay particular attention to treatment assignment and thus do not consider prediction in the context of treatment received. In recent years, attention has been shifting from average treatment effects to identifying moderators of treatment response, and tree-based approaches to identify subgroups of subjects with enhanced treatment responses are emerging...
2016: Statistics and its Interface
Taoyun Cao, Xueqin Wang, Heping Zhang
This paper introduces Energy Bagging Tree (EBT) for multivariate nonparametric regression problems. The EBT makes use of a measure of dispersion constructed from a generalized Gini's mean difference as node impurity, and the tree split function therefore corresponds to the product of energy distance and descendants' proportions. As a non-parametric extension of the between-sample variation in the analysis of variance, this measure of dispersion serves well for EBT in understanding certain complex data. Extensive simulation studies indicate that EBT is highly competitive with existing regression tree methods...
2016: Statistics and its Interface
Jiwei Zhao, Heping Zhang
The need for analysis of multiple responses arises from many applications. In behavioral science, for example, comorbidity is a common phenomenon where multiple disorders occur in the same person. The advantage of jointly analyzing multiple correlated responses has been examined and documented. Due to the difficulties of modeling multiple responses, nonparametric tests such as generalized Kendall's Tau have been developed to assess the association between multiple responses and risk factors. These procedures have been applied to genome-wide association studies of multiple complex traits...
2016: Statistics and its Interface
Qingrun Zhang, Chris Tyler-Smith, Quan Long
To identify evolutionary events from the footprints left in the patterns of genetic variation in a population, people use many statistical frameworks, including neutrality tests. In datasets from current high-throughput sequencing and genotyping platforms, it is common to have missing data and low-confidence SNP calls at many segregating sites. However, the traditional statistical framework for neutrality tests does not allow for these possibilities; therefore the usual way of treating missing data is to ignore segregating sites with missing/low-confidence calls, regardless of the good SNP calls at these sites in other individuals...
October 1, 2015: Statistics and its Interface
Le Bao, Adrian E Raftery, Amala Reddy
In most countries in the world outside of sub-Saharan Africa, HIV is largely concentrated in sub-populations whose behavior puts them at higher risk of contracting and transmitting HIV, such as people who inject drugs, sex workers and men who have sex with men. Estimating the size of these sub-populations is important for assessing overall HIV prevalence and designing effective interventions. We present a Bayesian hierarchical model for estimating the sizes of local and national HIV key affected populations...
April 1, 2015: Statistics and its Interface
Francesco C Stingo, Michael D Swartz, Marina Vannucci
Complex diseases, such as cancer, arise from complex etiologies consisting of multiple single-nucleotide polymorphisms (SNPs), each contributing a small amount to the overall risk of disease. Thus, many researchers have gone beyond single-SNP analysis methods, focusing instead on groups of SNPs, for example by analysing haplotypes. More recently, pathway-based methods have been proposed that use prior biological knowledge on gene function to achieve a more powerful analysis of genome-wide association studies (GWAS) data...
2015: Statistics and its Interface
Yanming Di
We consider negative binomial (NB) regression models for RNA-Seq read counts and investigate an approach where such NB regression models are fitted to individual genes separately and, in particular, the NB dispersion parameter is estimated from each gene separately without assuming commonalities between genes. This single-gene approach contrasts with the more widely-used dispersion-modeling approach where the NB dispersion is modeled as a simple function of the mean or other measures of read abundance, and then estimated from a large number of genes combined...
2015: Statistics and its Interface
Hui Jiang, Julia Salzman
Ultra high-throughput sequencing of transcriptomes (RNA-Seq) has enabled the accurate estimation of gene expression at individual isoform level. However, systematic biases introduced during the sequencing and mapping processes as well as incompleteness of the transcript annotation databases may cause the estimates of isoform abundances to be unreliable, and in some cases, highly inaccurate. This paper introduces a penalized likelihood approach to detect and correct for such biases in a robust manner. Our model extends those previously proposed by introducing bias parameters for reads...
2015: Statistics and its Interface
Xianbin Zeng, Shuangge Ma, Yichen Qin, Yang Li
In this paper, we consider the variable selection problem in semiparametric additive partially linear models for longitudinal data. Our goal is to identify relevant main effects and corresponding interactions associated with the response variable. Meanwhile, we enforce the strong hierarchical restriction on the model, that is, an interaction can be included in the model only if both of the associated main effects are included. Based on B-spline basis approximation for the nonparametric components, we propose an iterative estimation procedure for the model by penalizing the likelihood with a partial group minimax concave penalty (MCP), and use BIC to select the tuning parameter...
2015: Statistics and its Interface
