Statistics in Biosciences

Xu Shu, Douglas E Schaubel
In studies featuring a sequence of ordered events, gap times between successive events are often of interest. Despite the rich literature in this area, very few methods for comparing gap times have been developed. We propose methods for estimating a hazard ratio connecting the first and second gap times. Specifically, a two-stage procedure is developed based on estimating equations. At the first stage, a proportional hazards model is fitted for the first gap time. Weighted estimating equations are then solved at the second stage to estimate the hazard ratio between the first and second gap times...
December 2017: Statistics in Biosciences
Hristina Pashova, Michael LeBlanc, Charles Kooperberg
When considering low-dimensional gene-treatment or gene-environment interactions, we might suspect groups of genes to interact with treatment or environment in a similar way. For example, genes associated with related biological processes might interact in a corresponding way with an environmental factor or a clinical treatment in its effect on a phenotype. We use the idea of a structured interaction model together with penalized regression to limit the model complexity in a model in which we believe the interactions might behave in a similar way...
December 2017: Statistics in Biosciences
Liang Li, Sheng Luo, Bo Hu, Tom Greene
In longitudinal studies, prognostic biomarkers are often measured longitudinally. It is of both scientific and clinical interest to predict the risk of clinical events, such as disease progression or death, using these longitudinal biomarkers as well as other time-dependent and time-independent information about the patient. The prediction is dynamic in the sense that it can be made at any time during the follow-up, adapting to the changing at-risk population and incorporating the most recent longitudinal data...
December 2017: Statistics in Biosciences
Wen Wang, Mathieu Bray, Peter X-K Song, John D Kalbfleisch
While there is a growing need for kidney transplants to treat end stage kidney disease, the supply of transplantable kidneys is in serious shortage. Kidney paired donation (KPD) programs serve as platforms for candidates with willing but incompatible donors to assess the possibility of exchanging donors, thus opening up new transplant opportunities for these candidates. In recent years, non-directed (or altruistic) donors (NDDs) have been incorporated into KPD programs beginning chains of transplants that benefit many candidates...
December 2017: Statistics in Biosciences
J Choi, S Ye, K H Eng, K Korthauer, W H Bradley, J S Rader, C Kendziorski
Despite improvements in operative management and therapies, overall survival rates in advanced ovarian cancer have remained largely unchanged over the past three decades. Although it is possible to identify high-risk patients following surgery, the knowledge does not provide information about the genomic aberrations conferring risk, or the implications for treatment. To address these challenges, we developed an integrative pathway-index model and applied it to messenger RNA expression from 458 patients with serous ovarian carcinoma from the Cancer Genome Atlas project...
June 2017: Statistics in Biosciences
Eric Z Chen, Frederic D Bushman, Hongzhe Li
The human microbiome, which includes the collective microbes residing in or on the human body, has a profound influence on human health. DNA sequencing technology has made large-scale human microbiome studies possible through shotgun metagenomic sequencing. One important aspect of analyzing such metagenomic data is quantifying bacterial abundances from the sequencing data. Existing methods almost always quantify such abundances one sample at a time, which ignores certain systematic differences in read coverage along the genomes due to GC content, copy number variation and the bacterial origin of replication...
June 2017: Statistics in Biosciences
Ben Li, Yunxiao Li, Zhaohui S Qin
Modern high-throughput biotechnologies such as microarray and next generation sequencing produce a massive amount of information for each sample assayed. However, in a typical high-throughput experiment, only a limited amount of data is observed for each individual feature, hence the classical 'large p, small n' problem. The Bayesian hierarchical model, capable of borrowing strength across features within the same dataset, has been recognized as an effective tool in analyzing such data. However, the shrinkage effect, the most prominent feature of hierarchical models, can lead to undesirable over-correction for some features...
June 2017: Statistics in Biosciences
Soyeon Kim, Veerabhadran Baladandayuthapani, J Jack Lee
In personalized medicine, biomarkers are used to select therapies with the highest likelihood of success based on an individual patient's biomarker/genomic profile. Two goals are to choose important biomarkers that accurately predict treatment outcomes and to cull unimportant biomarkers to reduce the cost of biological and clinical verifications. These goals are challenging due to the high dimensionality of genomic data. Variable selection methods based on penalized regression (e.g., the lasso and elastic net) have yielded promising results...
June 2017: Statistics in Biosciences
Guangren Yang, Yanqing Sun, Li Qi, Peter B Gilbert
An objective of preventive HIV vaccine efficacy trials is to understand how vaccine-induced immune responses to specific protein sequences of HIV-1 associate with subsequent infection with specific sequences of HIV, where the immune response biomarkers are measured in vaccine recipients via a two-phase sampling design. Motivated by this objective, we investigate the stratified mark-specific proportional hazards model under two-phase biomarker sampling, where the mark is the genetic distance of an infecting HIV-1 sequence to an HIV-1 sequence represented inside the vaccine...
June 2017: Statistics in Biosciences
Wei Vivian Li, Yiling Chen, Jingyi Jessica Li
Comparative transcriptomics has gained increasing popularity in genomic research thanks to the development of high-throughput technologies, including microarray and next-generation RNA sequencing, that have generated large amounts of transcriptomic data. An important question is to understand the conservation and divergence of biological processes in different species. We propose a testing-based method, TROM (Transcriptome Overlap Measure), for comparing transcriptomes within or between different species, and provide a different perspective, in contrast to traditional correlation analyses, on capturing transcriptomic similarity...
June 2017: Statistics in Biosciences
Sunyoung Shin, Sündüz Keleş
Although genome-wide association studies (GWAS) have been successful at finding thousands of disease-associated genetic variants (GVs), identifying causal variants and elucidating the mechanisms by which genotypes influence phenotypes are critical open questions. A key challenge is that a large percentage of disease-associated GVs are potential regulatory variants located in noncoding regions, making them difficult to interpret. Recent research efforts focus on going beyond annotating GVs by integrating functional annotation data with GWAS to prioritize GVs...
June 2017: Statistics in Biosciences
Luis Alexander Crouch, Cheng Zheng, Ying Qing Chen
For randomized clinical trials where the endpoint of interest is a time-to-event subject to censoring, estimating the treatment effect has mostly focused on the hazard ratio from the Cox proportional hazards model. Since the model's proportional hazards assumption is not always satisfied, a useful alternative, the so-called additive hazards model, may instead be used to estimate a treatment effect on the difference of hazard functions. Still, the hazards difference may be difficult to grasp intuitively, particularly in a clinical setting of, e...
June 2017: Statistics in Biosciences
Peizhou Liao, Hao Wu, Tianwei Yu
The receiver operating characteristic (ROC) curve is an important tool for the evaluation and comparison of predictive models when the outcome is binary. If the class membership of the outcomes is known, an ROC curve can be constructed for a model, and the ROC curve with greater area under the curve (AUC) indicates better performance. However, in practice, imperfect reference standards often exist, in which the class membership of every data point is not fully determined. This situation is especially prevalent in high-throughput biomedical data because obtaining perfect reference standards for all data points is either too costly or technically impractical...
June 2017: Statistics in Biosciences
Tailiang Xie, Zhuoxin Yu
The N-of-1 trial is a type of clinical trial that has been applied in chronic recurrent conditions requiring long-term non-curative treatment. In this type of trial, each patient is randomly assigned to one of the treatment sequences and repeatedly crossed over two or more treatments of interest. Through this cross-comparing method (cross-over phase), the investigator can identify an optimal treatment (medicine or therapy) for the patient and treat the patient with the optimal treatment in an extension phase...
2017: Statistics in Biosciences
Mark A Wolters, C B Dean
Remote sensing images from Earth-orbiting satellites are a potentially rich data source for monitoring and cataloguing atmospheric health hazards that cover large geographic regions. A method is proposed for classifying such images into hazard and nonhazard regions using the autologistic regression model, which may be viewed as a spatial extension of logistic regression. The method includes a novel and simple approach to parameter estimation that makes it well suited to handling the large and high-dimensional datasets arising from satellite-borne instruments...
2017: Statistics in Biosciences
Yi Liu, Gavin Shaddick, James V Zidek
Studies of the risks that environmental hazards pose to human health require accurate estimates of the exposures that might be experienced by the populations at risk. Often there will be missing data, and in many epidemiological studies the locations and times of exposure measurements and health data do not match. To a large extent this will be due to the health and exposure data having arisen from completely different data sources and not as the result of a carefully designed study, leading to problems of both 'change of support' and 'misaligned data'...
2017: Statistics in Biosciences
Hein Putter, Hans C van Houwelingen
Time-dependent Cox regression and landmarking are the two most commonly used approaches for the analysis of time-dependent covariates in time-to-event data. The estimated effect of the time-dependent covariate in a landmarking analysis is based on the value of the time-dependent covariate at the landmark time point, after which the time-dependent covariate may change value. In this note we derive expressions for the (time-varying) regression coefficient of the time-dependent covariate in the landmark analysis, in terms of the regression coefficient and baseline hazard of the time-dependent Cox regression...
2017: Statistics in Biosciences
Debashis Ghosh
There is tremendous scientific and medical interest in the use of biomarkers to facilitate medical decision making. In this article, we present a simple framework for assessing the predictive ability of a biomarker. The methodology requires use of techniques from a subfield of survival analysis termed semicompeting risks; results are presented to make the article self-contained. As we show in the article, one natural interpretation of the semicompeting risks model is as a modification of the classical risk set approach to survival analysis that is more germane to medical decision making...
October 2016: Statistics in Biosciences
John D Rice, Jeremy M G Taylor
One common use of binary response regression methods is classification based on an arbitrary probability threshold dictated by the particular application. Since this is given to us a priori, it is sensible to incorporate the threshold into our estimation procedure. Specifically, for the linear logistic model, we solve a set of locally weighted score equations, using a kernel-like weight function centered at the threshold. The bandwidth for the weight function is selected by cross validation of a novel hybrid loss function that combines classification error and a continuous measure of divergence between observed and fitted values; other possible cross-validation functions based on more common binary classification metrics are also examined...
October 2016: Statistics in Biosciences
Zhaohui Qin, Ben Li, Karen N Conneely, Hao Wu, Ming Hu, Deepak Ayyala, Yongseok Park, Victor X Jin, Fangyuan Zhang, Han Zhang, Li Li, Shili Lin
With the rapid development of high-throughput technologies such as array and next generation sequencing (NGS), genome-wide, nucleotide-resolution epigenomic data are increasingly available. In recent years, there has been particular interest in data on DNA methylation and 3-dimensional (3D) chromosomal organization, which are believed to hold keys to understanding biological mechanisms, such as transcription regulation, that are closely linked to human health and disease. However, small sample size, complicated correlation structure, substantial noise, biases, and uncertainties all present difficulties for performing statistical inference...
October 2016: Statistics in Biosciences
