Ernst C Wit, Luigi Augugliaro, Hassan Pazira, Javier González, Fentaw Abegaz
Clinical studies in which patients are routinely screened for many genomic features are becoming more common. In principle, this holds the promise of finding genomic signatures for a particular disease. In particular, cancer survival is thought to be closely linked to the genomic constitution of the tumor. Discovering such signatures will be useful in the diagnosis of the patient, may be used for treatment decisions and, perhaps, even the development of new treatments. However, genomic data are typically noisy and high-dimensional, with the number of features often outstripping the number of patients included in the study...
October 30, 2018: Biostatistics
William J Artman, Inbal Nahum-Shani, Tianshuang Wu, James R Mckay, Ashkan Ertefaie
Sequential, multiple assignment, randomized trial (SMART) designs have become increasingly popular in the field of precision medicine by providing a means for comparing more than two sequences of treatments tailored to the individual patient, i.e., dynamic treatment regimes (DTRs). The construction of evidence-based DTRs promises a replacement for the ad hoc one-size-fits-all decisions pervasive in patient care. However, there are substantial statistical challenges in sizing SMART designs due to the correlation structure between the DTRs embedded in the design (EDTR)...
October 30, 2018: Biostatistics
Daniel Nevo, Tsuyoshi Hamada, Shuji Ogino, Molin Wang
The goals in clinical and cohort studies often include evaluation of the association of a time-dependent binary treatment or exposure with a survival outcome. Recently, several impactful studies targeted the association between initiation of aspirin and survival following colorectal cancer (CRC) diagnosis. The value of this exposure is zero at baseline and may change its value to one at some time point. Estimating this association is complicated by having only intermittent measurements on aspirin-taking. Commonly used methods can lead to substantial bias...
October 30, 2018: Biostatistics
Peijie Hou, Joshua M Tebbs, Dewei Wang, Christopher S McMahan, Christopher R Bilder
Group testing involves pooling individual specimens (e.g., blood, urine, swabs) and testing the pools for the presence of disease. When the proportion of diseased individuals is small, group testing can greatly reduce the number of tests needed to screen a population. Statistical research in group testing has traditionally focused on applications for a single disease. However, blood service organizations and large-scale disease surveillance programs are increasingly moving towards the use of multiplex assays, which measure multiple disease biomarkers at once...
October 26, 2018: Biostatistics
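The claim in the group testing entry above, that pooling saves tests when prevalence is low, can be checked with a small calculation. The sketch below is not the method of that article: it assumes the classical two-stage (Dorfman) scheme for a single disease with a perfect assay, where a pool of k specimens is tested once and every member of a positive pool is retested individually. The function names are hypothetical.

```python
def expected_tests_per_person(p: float, k: int) -> float:
    """Expected number of tests per individual under two-stage pooling.

    Assumptions (not from the source): prevalence p, independent
    individuals, pool size k, and an assay with no testing error.
    """
    if k < 2:
        return 1.0  # a pool of one is just individual testing
    prob_pool_positive = 1.0 - (1.0 - p) ** k
    # One pooled test shared by k people, plus one retest per person
    # whenever the pool is positive.
    return 1.0 / k + prob_pool_positive


def best_pool_size(p: float, max_k: int = 50) -> int:
    """Pool size in 1..max_k that minimizes expected tests per person."""
    return min(range(1, max_k + 1),
               key=lambda k: expected_tests_per_person(p, k))
```

Under these assumptions, a prevalence of 1% with pools of 10 needs roughly 0.2 tests per person instead of 1, and the saving shrinks as prevalence rises.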
Lin Zhang, Dipankar Bandyopadhyay
Epidemiological studies on periodontal disease (PD) collect relevant biomarkers, such as the clinical attachment level (CAL) and the probed pocket depth (PPD), at pre-specified tooth sites clustered within a subject's mouth, along with various other demographic and biological risk factors. Routine cross-sectional evaluations are conducted under a linear mixed model (LMM) framework with underlying normality assumptions on the random terms. However, a careful investigation reveals considerable non-normality manifested in those random terms, in the form of skewness and tail behavior...
October 26, 2018: Biostatistics
Ling-Wan Chen, Idil Yavuz, Yu Cheng, Abdus S Wahed
Recently, dynamic treatment regimens (DTRs) have drawn considerable attention as an effective tool for personalizing medicine. Sequential Multiple Assignment Randomized Trials (SMARTs) are often used to gather data for making inference on DTRs. In this article, we focus on regression analysis of DTRs from a two-stage SMART for competing risk outcomes based on cumulative incidence functions (CIFs). Even though there is extensive work on the regression problem for DTRs, no research has been done on modeling the CIF for SMARTs...
October 26, 2018: Biostatistics
Ian Barnett, Jukka-Pekka Onnela
With increasing availability of smartphones with Global Positioning System (GPS) capabilities, large-scale studies relating individual-level mobility patterns to a wide variety of patient-centered outcomes, from mood disorders to surgical recovery, are becoming a reality. Similar past studies have been small in scale and have provided wearable GPS devices to subjects. These devices typically collect mobility traces continuously without significant gaps in the data, and consequently the problem of data missingness has been safely ignored...
October 26, 2018: Biostatistics
Theresa Stocks, Tom Britton, Michael Höhle
Despite the wide application of dynamic models in infectious disease epidemiology, the particular modeling of variability in the different model components is often subjective rather than the result of a thorough model selection process. This is in part because inference for a stochastic transmission model can be difficult since the likelihood is often intractable due to partial observability. In this work, we address the question of adequate inclusion of variability by demonstrating a systematic approach for model selection and parameter inference for dynamic epidemic models...
September 27, 2018: Biostatistics
Paul R Rosenbaum
In observational studies of treatment effects, it is common to have several outcomes, perhaps of uncertain quality and relevance, each purporting to measure the effect of the treatment. A single planned combination of several outcomes may increase both power and insensitivity to unmeasured bias when the plan is wisely chosen, but it may miss opportunities in other cases. A method is proposed that uses one planned combination with only a mild correction for multiple testing and exhaustive consideration of all possible combinations fully correcting for multiple testing...
September 26, 2018: Biostatistics
Ciprian M Crainiceanu, Adina Crainiceanu
The bootstrap, introduced in Efron (1979. Bootstrap methods: another look at the jackknife. The Annals of Statistics 7, 1-26), is a landmark method for quantifying variability. It uses sampling with replacement with a sample size equal to that of the original data. We propose the upstrap, which samples with replacement either more or fewer samples than the original sample size. We illustrate the upstrap by solving a hard, but common, sample size calculation problem. The data and code used for the analysis in this article are available on GitHub (2018...
September 25, 2018: Biostatistics
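The upstrap described in the entry above differs from the bootstrap only in the size of the resample, so it is easy to sketch. The code below is an illustration and not the authors' GitHub code: the function names, the `ratio` parameter, and the choice of a simple z-type test of "mean equals zero" as the quantity of interest are all assumptions made for the example.

```python
import random
import statistics


def upstrap(data, ratio, rng):
    """Resample with replacement to size ratio * len(data).

    ratio > 1 draws more observations than the original sample,
    ratio < 1 draws fewer, and ratio == 1 is the ordinary bootstrap.
    """
    m = max(2, round(ratio * len(data)))  # at least 2 so a stdev exists
    return rng.choices(data, k=m)


def upstrap_power(data, ratio, n_rep=500, z_crit=1.96, seed=0):
    """Fraction of upstrap samples in which a z-type test rejects mean = 0.

    The test is a stand-in chosen for this sketch; any statistic
    computed on the resample could be used instead.
    """
    rng = random.Random(seed)
    rejections = 0
    for _ in range(n_rep):
        sample = upstrap(data, ratio, rng)
        se = statistics.stdev(sample) / len(sample) ** 0.5
        if se > 0 and abs(statistics.mean(sample)) / se > z_crit:
            rejections += 1
    return rejections / n_rep
```

Evaluating `upstrap_power` over a grid of ratios on pilot data gives a rough picture of how the rejection rate might change with sample size, which is the kind of sample size question the abstract mentions.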
Areti Boulieri, James E Bennett, Marta Blangiardo
Spatial monitoring of trends in health data plays an important part in public health surveillance. Most commonly, it is used to understand the etiology of a public health issue, to assess the impact of an intervention, or to provide detection of unusual behavior. In this article, we present a Bayesian mixture model for public health surveillance, which is able to provide estimates of the disease risk in space and time, and also to detect areas with unusual behavior. The model is designed to deal with a range of spatial and temporal patterns in the data, and with time series of different lengths...
September 25, 2018: Biostatistics
Philip S Boonstra, Ryan P Barbaro
This article considers Bayesian approaches for incorporating information from a historical model into a current analysis when the historical model includes only a subset of covariates currently of interest. The statistical challenge is 2-fold. First, the parameters in the nested historical model are not generally equal to their counterparts in the larger current model, either in value or in interpretation. Second, because the historical information will not be equally informative for all parameters in the current analysis, additional regularization may be required beyond that provided by the historical information...
September 21, 2018: Biostatistics
Huichen Zhu, Gen Li, Eric F Lock
High-dimensional multi-source data are encountered in many fields. Despite recent developments on the integrative dimension reduction of such data, most existing methods cannot easily accommodate data of multiple types (e.g. binary or count-valued). Moreover, multi-source data often have block-wise missing structure, i.e. data in one or more sources may be completely unobserved for a sample. The heterogeneous data types and presence of block-wise missing data pose significant challenges to the integration of multi-source data and further statistical analyses...
September 21, 2018: Biostatistics
Bo Chen, Radu V Craiu, Lei Sun
The X-chromosome is often excluded from so-called "whole-genome" association studies due to the differences it exhibits between males and females. One particular analytical challenge is the unknown status of X-inactivation, where one of the two X-chromosome variants in females may be randomly selected to be silenced. In the absence of biological evidence in favor of one specific model, we consider a Bayesian model averaging framework that offers a principled way to account for the inherent model uncertainty, providing model averaging-based posterior density intervals and Bayes factors...
September 21, 2018: Biostatistics
Erin E Gabriel, Dean A Follmann
Surrogate evaluation is a difficult problem that is made more so by the presence of interference. Our proposed procedure can allow for relatively easy evaluation of surrogates for indirect or spill-over clinical effects at the cluster level. Our definition of surrogacy is based on the causal-association paradigm (Joffe and Greene, 2009. Related causal frameworks for surrogate outcomes. Biometrics 65, 530-538), under which surrogates are evaluated by the strength of the association between a causal treatment effect on the clinical outcome and a causal treatment effect on the candidate surrogate...
September 21, 2018: Biostatistics
Jules L Ellis, Jakub Pecanka, Jelle J Goeman
In this article, we introduce a novel procedure for improving power of multiple testing procedures (MTPs) of interval hypotheses. When testing interval hypotheses, the null hypothesis P-values tend to be stochastically larger than standard uniform if the true parameter is in the interior of the null hypothesis. The new procedure starts with a set of P-values and discards those with values above a certain pre-selected threshold, while the rest are corrected (scaled-up) by the value of the threshold. Subsequently, a chosen family-wise error rate (FWER) or false discovery rate MTP is applied to the set of corrected P-values only...
September 21, 2018: Biostatistics
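The three steps in the entry above (discard, rescale, then test only what is left) can be sketched in a few lines. This is an illustration under stated assumptions, not the authors' implementation: "scaled-up by the value of the threshold" is read here as division by the threshold, Bonferroni is picked as one possible FWER procedure, and the function name and default values are hypothetical.

```python
def conditionalized_bonferroni(p_values, threshold=0.5, alpha=0.05):
    """Indices rejected after thresholding, rescaling, and Bonferroni.

    Step 1: drop every P-value above `threshold`.
    Step 2: divide each remaining P-value by `threshold`.
    Step 3: apply Bonferroni at level `alpha` to the retained set only.
    """
    kept = [(i, p / threshold)
            for i, p in enumerate(p_values)
            if p <= threshold]
    if not kept:
        return []
    cutoff = alpha / len(kept)  # correct only for the retained hypotheses
    return [i for i, p_scaled in kept if p_scaled <= cutoff]
```

With a threshold of 1 nothing is discarded and the function reduces to ordinary Bonferroni. When many P-values are large, as the abstract says happens in the interior of an interval null, the retained set is small, so the multiplicity penalty is lighter even though each retained P-value has been inflated.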
Shuo Chen, Yishi Xing, Jian Kang, Peter Kochunov, L Elliot Hong
Brain connectivity studies often refer to brain areas as graph nodes and connections between nodes as edges, and aim to identify neuropsychiatric phenotype-related connectivity patterns. When performing group-level brain connectivity alteration analyses, it is critical to model the dependence structure between multivariate connectivity edges to achieve accurate and efficient estimates of model parameters. However, specifying and estimating dependencies between connectivity edges presents formidable challenges because (i) the dimensionality of parameters in the covariance matrix is high (of the order of the fourth power of the number of nodes); (ii) the covariance between a pair of edges involves four nodes with spatial location information; and (iii) the dependence structure between edges can be related to unknown network topological structures...
September 10, 2018: Biostatistics
Shirin Golchi, Kristian Thorlund
Response adaptive randomized clinical trials have gained popularity due to their flexibility for adjusting design components, including arm allocation probabilities, at any point in the trial according to the intermediate results. In the Bayesian framework, allocation probabilities to different treatment arms are commonly defined as functionals of the posterior distributions of parameters of the outcome distribution for each treatment. In a non-conjugate model, however, repeated updates of the posterior distribution can be computationally intensive...
September 10, 2018: Biostatistics
Sarah Fletcher Mercaldo, Jeffrey D Blume
Missing data are a common problem for both the construction and implementation of a prediction algorithm. Pattern submodels (PS), a set of submodels for every missing data pattern that are fit using only data from that pattern, are a computationally efficient remedy for handling missing data at both stages. Here, we show that PS (i) retain their predictive accuracy even when the missing data mechanism is not missing at random (MAR) and (ii) yield an algorithm that is the most predictive among all standard missing data strategies...
September 6, 2018: Biostatistics
Luigi Augugliaro, Antonino Abbruzzo, Veronica Vinciotti
Graphical lasso is one of the most used estimators for inferring genetic networks. Despite its widespread use, there are several fields in applied research where the limits of detection of modern measurement technologies make the use of this estimator theoretically unfounded, even when the assumption of a multivariate Gaussian distribution is satisfied. Typical examples are data generated by polymerase chain reactions and flow cytometry. The combination of censoring and high-dimensionality makes inference of the underlying genetic networks from these data very challenging...
September 6, 2018: Biostatistics
