Statistics in Biosciences

Soyeon Kim, Veerabhadran Baladandayuthapani, J Jack Lee
In personalized medicine, biomarkers are used to select therapies with the highest likelihood of success based on an individual patient's biomarker/genomic profile. Two goals are to choose important biomarkers that accurately predict treatment outcomes and to cull unimportant biomarkers to reduce the cost of biological and clinical verifications. These goals are challenging due to the high dimensionality of genomic data. Variable selection methods based on penalized regression (e.g., the lasso and elastic net) have yielded promising results...
June 2017: Statistics in Biosciences
Guangren Yang, Yanqing Sun, Li Qi, Peter B Gilbert
An objective of preventive HIV vaccine efficacy trials is to understand how vaccine-induced immune responses to specific protein sequences of HIV-1 associate with subsequent infection with specific sequences of HIV, where the immune response biomarkers are measured in vaccine recipients via a two-phase sampling design. Motivated by this objective, we investigate the stratified mark-specific proportional hazards model under two-phase biomarker sampling, where the mark is the genetic distance of an infecting HIV-1 sequence to an HIV-1 sequence represented inside the vaccine...
June 2017: Statistics in Biosciences
Wei Vivian Li, Yiling Chen, Jingyi Jessica Li
Comparative transcriptomics has gained increasing popularity in genomic research thanks to the development of high-throughput technologies, including microarray and next-generation RNA sequencing, that have generated large amounts of transcriptomic data. An important question is how to understand the conservation and divergence of biological processes in different species. We propose a testing-based method, TROM (Transcriptome Overlap Measure), for comparing transcriptomes within or between different species, and provide a different perspective, in contrast to traditional correlation analyses, about capturing transcriptomic similarity...
June 2017: Statistics in Biosciences
Sunyoung Shin, Sündüz Keleş
Although genome-wide association studies (GWAS) have been successful at finding thousands of disease-associated genetic variants (GVs), identifying causal variants and elucidating the mechanisms by which genotypes influence phenotypes are critical open questions. A key challenge is that a large percentage of disease-associated GVs are potential regulatory variants located in noncoding regions, making them difficult to interpret. Recent research efforts focus on going beyond annotating GVs by integrating functional annotation data with GWAS to prioritize GVs...
June 2017: Statistics in Biosciences
Luis Alexander Crouch, Cheng Zheng, Ying Qing Chen
For randomized clinical trials where the endpoint of interest is a time-to-event subject to censoring, estimating the treatment effect has mostly focused on the hazard ratio from the Cox proportional hazards model. Since the model's proportional hazards assumption is not always satisfied, a useful alternative, the so-called additive hazards model, may instead be used to estimate a treatment effect on the difference of hazard functions. Still, the hazards difference may be difficult to grasp intuitively, particularly in a clinical setting of, e...
June 2017: Statistics in Biosciences
Peizhou Liao, Hao Wu, Tianwei Yu
The receiver operating characteristic (ROC) curve is an important tool for the evaluation and comparison of predictive models when the outcome is binary. If the class membership of the outcomes is known, an ROC curve can be constructed for a model, and the ROC curve with the greater area under the curve (AUC) indicates better performance. However, in practice, imperfect reference standards often exist, in which the class membership of the data points is not fully determined. This situation is especially prevalent in high-throughput biomedical data because obtaining perfect reference standards for all data points is either too costly or technically impractical...
June 2017: Statistics in Biosciences
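As a point of reference for the fully labeled case described in the abstract above, the following is a minimal sketch of computing the AUC directly from its rank interpretation: the probability that a randomly chosen positive receives a higher score than a randomly chosen negative, with ties counted as one half. The function name `auc_from_scores` and the toy data are illustrative assumptions; this is not the authors' method and does not address imperfect reference standards.

```python
def auc_from_scores(labels, scores):
    """Empirical AUC for binary labels (1 = positive, 0 = negative).

    Computed as the fraction of positive/negative pairs in which the
    positive has the higher score; tied pairs contribute 1/2.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative")

    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy example (hypothetical data): 3 of the 4 positive/negative pairs
# are ranked correctly.
toy_auc = auc_from_scores([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
```

The pairwise loop is quadratic in the sample size, which is acceptable for illustration; a rank-based computation would be preferable for large data sets.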
John D Rice, Jeremy M G Taylor
One common use of binary response regression methods is classification based on an arbitrary probability threshold dictated by the particular application. Since this is given to us a priori, it is sensible to incorporate the threshold into our estimation procedure. Specifically, for the linear logistic model, we solve a set of locally weighted score equations, using a kernel-like weight function centered at the threshold. The bandwidth for the weight function is selected by cross validation of a novel hybrid loss function that combines classification error and a continuous measure of divergence between observed and fitted values; other possible cross-validation functions based on more common binary classification metrics are also examined...
October 2016: Statistics in Biosciences
Zhaohui Qin, Ben Li, Karen N Conneely, Hao Wu, Ming Hu, Deepak Ayyala, Yongseok Park, Victor X Jin, Fangyuan Zhang, Han Zhang, Li Li, Shili Lin
With the rapid development of high-throughput technologies such as array and next-generation sequencing (NGS), genome-wide, nucleotide-resolution epigenomic data are increasingly available. In recent years, there has been particular interest in data on DNA methylation and 3-dimensional (3D) chromosomal organization, which are believed to hold keys to understanding biological mechanisms, such as transcription regulation, that are closely linked to human health and diseases. However, small sample sizes, complicated correlation structures, substantial noise, biases, and uncertainties all present difficulties for performing statistical inference...
October 2016: Statistics in Biosciences
Loni Philip Tabb, Eric J Tchetgen Tchetgen, Greg A Wellenius, Brent A Coull
Count data often exhibit more zeros than predicted by common count distributions like the Poisson or negative binomial. In recent years, there has been considerable interest in methods for analyzing zero-inflated count data in longitudinal or other correlated data settings. A common approach has been to extend zero-inflated Poisson models to include random effects that account for correlation among observations. However, these models have been shown to have a few drawbacks, including interpretability of regression coefficients and numerical instability of fitting algorithms even when the data arise from the assumed model...
October 2016: Statistics in Biosciences
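To make the notion of excess zeros in the abstract above concrete, here is a minimal sketch of the standard zero-inflated Poisson (ZIP) probability mass function, in which a count is a structural zero with probability `pi` and otherwise follows a Poisson distribution with mean `lam`. The names `zip_pmf`, `lam`, and `pi` are illustrative assumptions, and the sketch covers only the basic ZIP distribution, not the random-effects extensions or the alternative models the authors discuss.

```python
import math


def zip_pmf(k, lam, pi):
    """P(Y = k) under a zero-inflated Poisson distribution.

    With probability pi the outcome is a structural zero; with
    probability 1 - pi it is drawn from Poisson(lam).
    """
    if k < 0:
        return 0.0
    poisson = math.exp(-lam) * lam ** k / math.factorial(k)
    structural_zero = pi if k == 0 else 0.0
    return structural_zero + (1.0 - pi) * poisson


# Probability of a zero under ZIP versus a plain Poisson with the same lam
# (hypothetical parameter values).
p_zero_zip = zip_pmf(0, lam=2.0, pi=0.3)
p_zero_poisson = zip_pmf(0, lam=2.0, pi=0.0)
```

With these values the ZIP zero probability is `0.3 + 0.7 * exp(-2)`, which exceeds the plain Poisson value `exp(-2)`, while the mean shrinks to `(1 - pi) * lam`.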
Daowen Zhang, Jie Lena Sun, Karen Pieper
Linear mixed effects models are widely used to analyze a clustered response variable. Motivated by a recent study to examine and compare the hospital length of stay (LOS) between patients undertaking percutaneous coronary intervention (PCI) and coronary artery bypass graft (CABG) from several international clinical trials, we proposed a bivariate linear mixed effects model for the joint modeling of clustered PCI and CABG LOS's where each clinical trial is considered a cluster. Due to the large number of patients in some trials, commonly used commercial statistical software for fitting (bivariate) linear mixed models failed to run since it could not allocate enough memory to invert large dimensional matrices during the optimization process...
October 2016: Statistics in Biosciences
Jesse D Raffa, Elizabeth A Thompson
Correlation between study units in quantitative genetics studies often makes it difficult to compare important inferential aspects of studies. Describing the relatedness between study units is critical to capture features of pedigree studies involving heritability, including power and precision of heritability estimates. Blangero et al. (2012) showed that in pedigree studies the power to detect heritability is a function of the true heritability and the eigenvalues of the kinship matrix. We extend this to a more general setting which allows statements about expected precision of heritability estimates...
October 2016: Statistics in Biosciences
Yanxun Xu, Lorenzo Trippa, Peter Müller, Yuan Ji
Targeted therapies based on biomarker profiling are becoming a mainstream direction of cancer research and treatment. Depending on the expression of specific prognostic biomarkers, targeted therapies assign different cancer drugs to subgroups of patients even if they are diagnosed with the same type of cancer by traditional means, such as tumor location. For example, Herceptin is only indicated for the subgroup of patients with HER2+ breast cancer, but not other types of breast cancer. However, subgroups like HER2+ breast cancer with effective targeted therapies are rare and most cancer drugs are still being applied to large patient populations that include many patients who might not respond or benefit...
June 2016: Statistics in Biosciences
Xuemin Gu, Nan Chen, Caimiao Wei, Suyu Liu, Vassiliki A Papadimitrakopoulou, Roy S Herbst, J Jack Lee
We propose a Bayesian two-stage biomarker-based adaptive randomization (AR) design for the development of targeted agents. The design has three main goals: (1) to test the treatment efficacy, (2) to identify prognostic and predictive markers for the targeted agents, and (3) to provide better treatment for patients enrolled in the trial. To treat patients better, both stages are guided by the Bayesian AR based on the individual patient's biomarker profiles. The AR in the first stage is based on a known marker...
June 2016: Statistics in Biosciences
Jared C Foster, Bin Nan, Lei Shen, Niko Kaciroti, Jeremy M G Taylor
We consider the problem of using permutation-based methods to test for treatment-covariate interactions from randomized clinical trial data. Testing for interactions is common in the field of personalized medicine, as subgroups with enhanced treatment effects arise when treatment-by-covariate interactions exist. Asymptotic tests can often be performed for simple models, but in many cases more complex methods are used to identify subgroups, non-standard test statistics are proposed, and asymptotic results may be difficult to obtain...
June 2016: Statistics in Biosciences
Gilbert S Omenn
Omics-based technology platforms have made new kinds of cancer profiling tests feasible. There are several valuable examples in clinical practice, and many more under development. A concerted, transparent process of discovery with lock-down of candidate assays and classifiers and clear specification of intended clinical use is essential. The Institute of Medicine has now proposed a three-stage scheme of confirming and validating analytical findings, validating performance on clinical specimens, and demonstrating explicit clinical utility for an approvable test (Micheel et al...
June 2016: Statistics in Biosciences
Guogen Shan, Hua Zhang, Tao Jiang, Hanna Peterson, Daniel Young, Changxing Ma
In a one-sided hypothesis testing problem in clinical trials, the monotonic condition of a tail probability function is fundamentally important to guarantee that the actual type I and II error rates occur at the boundary of their associated parameter spaces. Otherwise, one has to search for the actual rates over the complete parameter space, which could be very computationally intensive. This important property has been extensively studied in traditional one-stage study settings (e.g., non-inferiority or superiority between two binomial proportions), but there is very limited research for this property in a two-stage design setting, e...
2016: Statistics in Biosciences
Qiongshi Lu, Chentian Jin, Jiehuan Sun, Russell Bowler, Katerina Kechris, Naftali Kaminski, Hongyu Zhao
Rich collections of genomic and epigenomic annotations, the availability of large population cohorts for genome-wide association studies (GWAS), and advancements in data integration techniques provide an unprecedented opportunity to accelerate discoveries in complex disease studies through integrative analyses. In this paper, we apply a variety of approaches to integrate GWAS summary statistics of chronic obstructive pulmonary disease (COPD) with functional annotations to illustrate how data integration could help researchers understand complex human diseases...
2016: Statistics in Biosciences
Aidan G O'Keeffe, Daniel M Farewell, Brian D M Tom, Vernon T Farewell
In longitudinal randomised trials and observational studies within a medical context, a composite outcome, which is a function of several individual patient-specific outcomes, may be felt to best represent the outcome of interest. As in other contexts, missing data on patient outcome, due to patient drop-out or for other reasons, may pose a problem. Multiple imputation is a widely used method for handling missing data, but its use for composite outcomes has seldom been discussed. Whilst standard multiple imputation methodology can be used directly for the composite outcome, the distribution of a composite outcome may be of a complicated form and perhaps not amenable to statistical modelling...
2016: Statistics in Biosciences
Ying Huang, Eric Laber
For a patient who is facing a treatment decision, the added value of information provided by a biomarker depends on the individual patient's expected response to treatment with and without the biomarker, as well as his/her tolerance of disease and treatment harm. However, individualized estimators of the value of a biomarker are lacking. We propose a new graphical tool named the subject-specific expected benefit curve for quantifying the personalized value of a biomarker in aiding a treatment decision. We develop semiparametric estimators for two general settings: (i) when biomarker data are available from a randomized trial; and (ii) when biomarker data are available from a cohort or a cross-sectional study, together with external information about a multiplicative treatment effect...
2016: Statistics in Biosciences
Zifang Guo, Wenbin Lu, Lexin Li
Despite enormous development in variable selection approaches in recent years, modeling and selection for high dimensional censored regression remains a challenging question. When the number of predictors p far exceeds the number of observational units n and the outcome is censored, computation with existing solutions often becomes difficult, or even infeasible in some situations, while performance frequently deteriorates. In this article, we aim at simultaneous model estimation and variable selection for Cox proportional hazards models with high dimensional covariates...
October 1, 2015: Statistics in Biosciences
