Abhishek Kaul, Ori Davidov, Shyamal D Peddada
This paper is motivated by the recent interest in the analysis of high-dimensional microbiome data. A key feature of these data is the presence of "structural zeros" which are microbes missing from an observation vector due to an underlying biological process and not due to error in measurement. Typical notions of missingness are unable to model these structural zeros. We define a general framework which allows for structural zeros in the model and propose methods of estimating sparse high-dimensional covariance and precision matrices under this setup...
January 8, 2017: Biostatistics
Emily J Huang, Ethan X Fang, Daniel F Hanley, Michael Rosenblum
In many randomized controlled trials, the primary analysis focuses on the average treatment effect and does not address whether treatment benefits are widespread or limited to a select few. This problem affects many disease areas, since it stems from how randomized trials, often the gold standard for evaluating treatments, are designed and analyzed. Our goal is to learn about the fraction who benefit from a new treatment using randomized trial data. We consider the case where the outcome is ordinal, with binary outcomes as a special case...
December 26, 2016: Biostatistics
Sebastian Meyer, Leonhard Held
Routine public health surveillance of notifiable infectious diseases gives rise to weekly counts of reported cases, possibly stratified by region and/or age group. We investigate how an age-structured social contact matrix can be incorporated into a spatio-temporal endemic-epidemic model for infectious disease counts. To illustrate the approach, we analyze the spread of norovirus gastroenteritis over six age groups within the 12 districts of Berlin, 2011-2015, using contact data from the POLYMOD study...
December 26, 2016: Biostatistics
Duncan Lee, Sabyasachi Mukhopadhyay, Alastair Rushworth, Sujit K Sahu
In the United Kingdom, air pollution is linked to around 40,000 premature deaths each year, but estimating its health effects is challenging in a spatio-temporal study. The challenges include spatial misalignment between the pollution and disease data; uncertainty in the estimated pollution surface; and complex residual spatio-temporal autocorrelation in the disease data. This article develops a two-stage model that addresses these issues. The first stage is a spatio-temporal fusion model linking modeled and measured pollution data, while the second stage links these predictions to the disease data...
December 26, 2016: Biostatistics
Wei Fu, Jeffrey S Simonoff
Tree methods (recursive partitioning) are a popular class of nonparametric methods for analyzing data. One extension of the basic tree methodology is the survival tree, which applies recursive partitioning to censored survival data. There are several existing survival tree methods in the literature, which are mainly designed for right-censored data. We propose two new survival trees for left-truncated and right-censored (LTRC) data, which can be seen as a generalization of the traditional survival tree for right-censored data...
December 26, 2016: Biostatistics
David Lenis, Cyrus F Ebnesajjad, Elizabeth A Stuart
One of the main limitations of causal inference methods is that they rely on the assumption that all variables are measured without error. A popular approach for handling measurement error is simulation-extrapolation (SIMEX). However, its use for estimating causal effects has been examined only in the context of an additive, non-differential, and homoscedastic classical measurement error structure. In this article we extend the SIMEX methodology, in the context of a mean reverting measurement error structure, to a doubly robust estimator of the average treatment effect when a single covariate is measured with error but the outcome and treatment indicator are not...
December 19, 2016: Biostatistics
January 2017: Biostatistics
Marla Johnson, Elizabeth Purdom
Sequencing of messenger RNA (mRNA) can provide estimates of the levels of individual isoforms within the cell. It remains to adapt many standard statistical methods commonly used for analyzing gene expression levels to take advantage of this additional information. One novel question is whether we can find clusters of samples that are distinguished not by their gene expression but by their isoform usage. We propose a novel approach for clustering mRNA-Seq data that identifies such clusters. We show via simulation that our methods are more sensitive to finding clusters based on isoform usage than standard clustering techniques...
October 25, 2016: Biostatistics
F Towfic, R Kusko, B Zeskind
The article by Nygaard et al. proposes that applying batch correction approaches to microarray data from studies with unbalanced designs may inadvertently exaggerate the differences observed. In seeking to illustrate their point, Nygaard et al. utilized a dataset (GSE61901) from a study we published (Towfic and others, 2014) and showed that one analysis pipeline utilizing the traditional approach to batch correction (ComBat) yielded over 1000 differentially expressed probesets, while an alternative approach proposed by Nygaard et al. (utilizing batch as a fixed effect and averaging technical replicates) recovered 11 differentially expressed probesets...
October 25, 2016: Biostatistics
Matthew Stephens
We introduce a new Empirical Bayes approach for large-scale hypothesis testing, including estimating false discovery rates (FDRs), and effect sizes. This approach has two key differences from existing approaches to FDR analysis. First, it assumes that the distribution of the actual (unobserved) effects is unimodal, with a mode at 0. This "unimodal assumption" (UA), although natural in many contexts, is not usually incorporated into standard FDR analysis, and we demonstrate how incorporating it brings many benefits...
October 17, 2016: Biostatistics
Eugen Pircalabelu, Gerda Claeskens, Lourens J Waldorp
We have developed a method for estimating brain networks from fMRI datasets that have not all been measured using the same set of brain regions. Some of the coarse scale regions have been split into smaller subregions. The proposed penalized estimation procedure selects undirected graphical models with similar structures that combine information from several subjects and several coarseness scales. Both within-scale edges and between-scale edges that identify possible connections between a large region and its subregions are estimated...
October 2016: Biostatistics
Joseph Antonelli, Matthew Cefalu, Luke Bornn
In environmental epidemiology, exposures are not always available at subject locations and must be predicted using monitoring data. The monitor locations are often outside the control of researchers, and previous studies have shown that "preferential sampling" of monitoring locations can adversely affect exposure prediction and subsequent health effect estimation. We adopt a slightly different definition of preferential sampling than is typically seen in the literature, which we call population-based preferential sampling...
October 2016: Biostatistics
Gianluca Frasso, Philippe Lambert
The 2014 Ebola outbreak in Sierra Leone is analyzed using a susceptible-exposed-infectious-removed (SEIR) epidemic compartmental model. The discrete-time stochastic model for the epidemic evolution is coupled to a set of ordinary differential equations describing the dynamics of the expected proportions of subjects in each epidemic state. The unknown parameters are estimated in a Bayesian framework by combining data on the number of new (laboratory confirmed) Ebola cases reported by the Ministry of Health and prior distributions for the transition rates elicited using information collected by the WHO during the follow-up of specific Ebola cases...
October 2016: Biostatistics
Jonathan W Bartlett, Jeremy M G Taylor
Studies often follow individuals until they fail from one of a number of competing failure types. One approach to analyzing such competing risks data involves modeling the cause-specific hazards as functions of baseline covariates. A common issue that arises in this context is missing values in covariates. In this setting, we first establish conditions under which complete case analysis (CCA) is valid. We then consider application of multiple imputation to handle missing covariate values, and extend the recently proposed substantive model compatible version of fully conditional specification (SMC-FCS) imputation to the competing risks setting...
October 2016: Biostatistics
Marie-Karelle Riviere, Sebastian Ueckert, France Mentré
Non-linear mixed effect models (NLMEMs) are widely used for the analysis of longitudinal data. To design these studies, optimal design based on the expected Fisher information matrix (FIM) can be used instead of performing time-consuming clinical trial simulations. In recent years, estimation algorithms for NLMEMs have transitioned from linearization toward more exact higher-order methods. Optimal design, on the other hand, has mainly relied on first-order (FO) linearization to calculate the FIM. Although efficient in general, FO cannot be applied to complex non-linear models and can be applied only with difficulty in studies with discrete data...
October 2016: Biostatistics
Brisa N Sánchez, Meihua Wu, Peter X K Song, Wen Wang
Advances in high throughput technology have accelerated the use of hundreds to millions of biomarkers to construct classifiers that partition patients into different clinical conditions. Prior to classifier development in actual studies, a critical need is to determine the sample size required to reach a specified classification precision. We develop a systematic approach for sample size determination in high-dimensional (large $p$ small $n$) classification analysis. Our method utilizes the probability of correct classification (PCC) as the optimization objective function and incorporates the higher criticism thresholding procedure for classifier development...
October 2016: Biostatistics
Federico Ambrogi, Thomas H Scheike
High-dimensional regression has become an increasingly important topic for many research fields. For example, biomedical research generates an increasing amount of data to characterize patients' bio-profiles (e.g. from a genomic high-throughput assay). The increasing complexity in the characterization of patients' bio-profiles is added to the complexity related to the prolonged follow-up of patients with the registration of the occurrence of possible adverse events. This information may offer useful insight into disease dynamics and help identify subsets of patients with worse prognosis and better response to the therapy...
October 2016: Biostatistics
Jennifer A Sinnott, Tianxi Cai
When a moderate number of potential predictors are available and a survival model is fit with regularization to achieve variable selection, providing accurate inference on the predicted survival can be challenging. We investigate inference on the predicted survival estimated after fitting a Cox model under regularization guaranteeing the oracle property. We demonstrate that existing asymptotic formulas for the standard errors of the coefficients tend to underestimate the variability for some coefficients, while typical resampling such as the bootstrap tends to overestimate it; these approaches can both lead to inaccurate variance estimation for predicted survival functions...
October 2016: Biostatistics
Elisa Sheng, Daniela Witten, Xiao-Hua Zhou
In a multivariate setting, we consider the task of identifying features whose correlations with the other features differ across conditions. Such correlation shifts may occur independently of mean shifts, or differences in the means of the individual features across conditions. Previous approaches for detecting correlation shifts consider features simultaneously, by computing a correlation-based test statistic for each feature. However, since correlations involve two features, such approaches do not lend themselves to identifying which feature is the culprit...
October 2016: Biostatistics
Ziwen Tan, Guoyou Qin, Haibo Zhou
Outcome-dependent sampling (ODS) designs have been well recognized as a cost-effective way to enhance study efficiency in both statistical literature and biomedical and epidemiologic studies. A partially linear additive model (PLAM) is widely applied in real problems because it allows for a flexible specification of the dependence of the response on some covariates in a linear fashion and other covariates in a nonlinear non-parametric fashion. Motivated by an epidemiological study investigating the effect of prenatal polychlorinated biphenyls exposure on children's intelligence quotient (IQ) at age 7 years, we propose a PLAM in this article to investigate a more flexible non-parametric inference on the relationships among the response and covariates under the ODS scheme...
October 2016: Biostatistics