Stephanie T Chen, Luo Xiao, Ana-Maria Staicu
Functional data methods are often applied to longitudinal data as they provide a more flexible way to capture dependence across repeated observations. However, there is no formal testing procedure to determine if functional methods are actually necessary. We propose a goodness-of-fit test for comparing parametric covariance functions against general nonparametric alternatives for both irregularly observed longitudinal data and densely observed functional data. We consider a smoothing-based test statistic and approximate its null distribution using a bootstrap procedure...
November 19, 2018: Biometrics
Tieming Ji
Alterations in DNA methylation have been linked to the development and progression of many diseases. The bisulfite sequencing technique presents methylation profiles at base resolution. Count data on methylated and unmethylated reads provide information on the methylation level at each CpG site. As more bisulfite sequencing data become available, these data are increasingly needed to infer methylation aberrations in diseases. Automated and powerful algorithms also need to be developed to accurately identify differentially methylated regions between treatment groups...
November 15, 2018: Biometrics
Maïlis Amico, Ingrid Van Keilegom, Catherine Legrand
In survival analysis it often happens that a certain fraction of the subjects under study never experience the event of interest, i.e. they are considered 'cured'. In the presence of covariates, a common model for this type of data is the mixture cure model, which assumes that the population consists of two subpopulations, namely the cured and the non-cured ones, and it writes the survival function of the whole population given a set of covariates as a mixture of the survival function of the cured subjects (which equals one), and the survival function of the non-cured ones...
November 14, 2018: Biometrics
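The mixture structure described in the abstract above by Amico, Van Keilegom and Legrand can be written as S_pop(t | x) = (1 - p(x)) * 1 + p(x) * S_u(t | x), where p(x) is the probability of being non-cured. The following is a minimal sketch of that formula only. The logistic form for p(x), the exponential form for S_u, and all function and parameter names are illustrative assumptions, not the authors' estimator.

```python
import math


def uncured_probability(x, intercept, slope):
    """P(non-cured | x); the logistic form is an assumption for illustration."""
    return 1.0 / (1.0 + math.exp(-(intercept + slope * x)))


def uncured_survival(t, rate):
    """Survival function of non-cured subjects; exponential form is assumed."""
    return math.exp(-rate * t)


def population_survival(t, x, intercept=0.0, slope=1.0, rate=0.5):
    """Mixture of the cured survival (identically 1) and the non-cured survival."""
    p = uncured_probability(x, intercept, slope)
    cured_survival = 1.0  # cured subjects never experience the event
    return (1.0 - p) * cured_survival + p * uncured_survival(t, rate)


if __name__ == "__main__":
    # The population survival curve levels off at the cure fraction 1 - p(x).
    for t in (0.0, 1.0, 5.0, 50.0):
        print(t, round(population_survival(t, x=0.0), 4))
```

Under these assumptions the population survival curve starts at one and flattens at the cure fraction 1 - p(x) rather than decaying to zero, which is the feature that distinguishes cure models from standard survival models.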
Ni Zhao, Haoyu Zhang, Jennifer J Clark, Arnab Maity, Michael C Wu
Most common human diseases result from the combined effects of genes, environmental factors, and their interactions, so including gene-environment (GE) interactions can improve power in gene mapping studies. The standard strategy is to test the SNPs one by one, using a regression model that includes both the SNP effect and the GE interaction. However, the SNP-by-SNP approach has serious limitations, such as the inability to model epistatic SNP effects, biased estimation, and reduced power. Thus, in this paper, we develop a kernel machine regression framework to model the overall genetic effect of a SNP-set, considering the possible GE interaction...
November 14, 2018: Biometrics
Minna Genbäck, Xavier de Luna
Causal inference with observational data can be performed under an assumption of no unobserved confounders (unconfoundedness assumption). There is, however, seldom clear subject-matter or empirical evidence for such an assumption. We therefore develop uncertainty intervals for average causal effects based on outcome regression estimators and doubly robust estimators, which provide inference taking into account both sampling variability and uncertainty due to unobserved confounders. In contrast with sampling variation, uncertainty due to unobserved confounding does not decrease with increasing sample size...
November 14, 2018: Biometrics
Cheolwoo Park, Hosik Choi, Chris Delcher, Yanning Wang, Young Joo Yoon
In recent years, there has been increased interest in symbolic data analysis, including for exploratory analysis, supervised and unsupervised learning, time series analysis, etc. Traditional statistical approaches that are designed to analyze single-valued data are not suitable because they cannot incorporate the additional information on data structure available in symbolic data, and thus new techniques have been proposed for symbolic data to bridge this gap. In this paper, we develop a regularized convex clustering approach for grouping histogram-valued data...
November 14, 2018: Biometrics
Haben Michael, Suzanne Thornton, Minge Xie, Lu Tian
We describe an exact, unconditional, non-randomized procedure for producing confidence intervals for the grand mean in a normal-normal random effects meta-analysis. The procedure targets meta-analyses based on too few primary studies, ≤ 7, say, to allow for the conventional asymptotic estimators, e.g., DerSimonian and Laird 1986, or non-parametric resampling-based procedures, e.g., Liu, Lee, and Xie 2007. Meta-analyses with so few studies are common, with one recent sample of 22,453 health-related meta-analyses finding a median of 3 primary studies per meta-analysis (Davey et al...
November 14, 2018: Biometrics
Hung Hung
Identification of differentially expressed genes (DE genes) is commonly conducted in modern biomedical research. However, unwanted variation inevitably arises during the data collection process, which can make the detection results heavily biased. Various methods have been suggested for removing the unwanted variation while keeping the biological variation to ensure a reliable analysis result. Removing Unwanted Variation (RUV) has recently been proposed for this purpose, which works by virtue of negative control genes...
November 14, 2018: Biometrics
Zhen Chen, Beom Seuk Hwang
In application of diagnostic accuracy, it is possible that a priori information may exist regarding the test score distributions, either between different disease populations for a single test or between multiple correlated tests. Few have considered constrained diagnostic accuracy analysis when the true disease status is binary; almost none when the disease status is ordinal. Motivated by a study on diagnosing endometriosis, we propose an approach to estimating diagnostic accuracy measures that can incorporate different stochastic order constraints on the test scores when an ordinal true disease status is in consideration...
November 3, 2018: Biometrics
Xiaoyu Dai, Nan Lin, Daofeng Li, Ting Wang
In the analysis of next-generation sequencing technology, massive discrete data are generated from short read counts with varying biological coverage. Conducting conditional hypothesis testing such as Fisher's Exact Test at every genomic region of interest thus leads to a heterogeneous multiple discrete testing problem. However, most existing multiple testing procedures for controlling the false discovery rate (FDR) assume that test statistics are continuous and become conservative for discrete tests. To overcome the conservativeness, in this article, we propose a novel multiple testing procedure for better FDR control on heterogeneous discrete tests...
November 2, 2018: Biometrics
Stephen Bates, Robert Tibshirani
Positive-valued signal data is common in the biological and medical sciences, due to the prevalence of mass spectrometry and other imaging techniques. With such data, only the relative intensities of the raw measurements are meaningful. It is desirable to consider models consisting of the log-ratios of all pairs of the raw features, since log-ratios are the simplest meaningful derived features. In this case, however, the dimensionality of the predictor space becomes large, and computationally efficient estimation procedures are required...
November 2, 2018: Biometrics
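To illustrate why the predictor space grows quickly in the abstract above by Bates and Tibshirani, the sketch below builds the log-ratios of all pairs of raw features for one observation. With p raw features there are p(p-1)/2 such pairs. The function name and the example values are hypothetical, and this shows only the feature construction, not the authors' estimation procedure.

```python
import itertools
import math


def pairwise_log_ratios(features):
    """Return {(j, k): log(x_j / x_k)} for all index pairs j < k.

    All features must be strictly positive, as for positive-valued signal data.
    """
    if any(value <= 0 for value in features):
        raise ValueError("all features must be strictly positive")
    return {
        (j, k): math.log(features[j] / features[k])
        for j, k in itertools.combinations(range(len(features)), 2)
    }


if __name__ == "__main__":
    raw = [1.0, 2.0, 4.0, 8.0]  # hypothetical intensities
    ratios = pairwise_log_ratios(raw)
    print(len(ratios))  # p(p-1)/2 derived features for p = 4
    # Rescaling every raw intensity by a common factor leaves the ratios unchanged.
    rescaled = pairwise_log_ratios([10.0 * v for v in raw])
    print(all(abs(ratios[key] - rescaled[key]) < 1e-12 for key in ratios))
```

Because each log-ratio is unchanged when all intensities are multiplied by a common factor, these derived features depend only on relative intensities, which matches the abstract's point that only those are meaningful.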
Andrew G Chapple, Peter F Thall
Conventionally, evaluation of a new drug, A, is done in three phases. Phase I is based on toxicity to determine a "maximum tolerable dose" (MTD) of A, phase II is conducted to decide whether A at the MTD is promising in terms of response probability, and if so a large randomized phase III trial is conducted to compare A to a control treatment, C, usually based on survival time or progression free survival time. It is widely recognized that this paradigm has many flaws. A recent approach combines the first two phases by conducting a phase I-II trial, which chooses an optimal dose based on both efficacy and toxicity, and evaluation of A at the selected optimal phase I-II dose then is done in a phase III trial...
October 26, 2018: Biometrics
Xinlei Mi, Fei Zou, Ruoqing Zhu
An ENsemble Deep Learning Optimal Treatment (EndLot) approach is proposed for personalized medicine problems. The statistical framework of the proposed method is based on the outcome weighted learning (OWL) framework which transforms the optimal decision rule problem into a weighted classification problem. We further employ an ensemble of deep neural networks (DNNs) to learn the optimal decision rule. Utilizing the flexibility of DNNs and the stability of bootstrap aggregation, the proposed method achieves a considerable improvement over existing methods...
October 26, 2018: Biometrics
Apurva Bhingare, Debajyoti Sinha, Debdeep Pati, Dipankar Bandyopadhyay, Stuart R Lipsitz
For many real-life studies with skewed multivariate responses, the level of skewness and association structure assumptions are essential for evaluating the covariate effects on the response and its predictive distribution. We present a novel semiparametric multivariate model and associated Bayesian analysis for multivariate skewed responses. Similar to multivariate Gaussian densities, this multivariate model is closed under marginalization, allows a wide class of multivariate associations, and has meaningful physical interpretations of skewness levels and covariate effects on the marginal density...
October 26, 2018: Biometrics
Jared Huling, Menggang Yu, A James O'Malley
Randomized controlled trials are the gold standard for estimating causal effects of treatments or interventions, but in many cases are too costly, too difficult, or even unethical to conduct. Hence, many pressing medical questions can only be investigated using observational studies. However, direct statistical modeling of observational data can result in biased estimates of treatment effects due to unmeasured confounding. In certain cases, instrumental variable based techniques can be used to remove such biases...
October 25, 2018: Biometrics
Christos Thomadakis, Loukia Meligkotsidou, Nikos Pantazis, Giota Touloumi
Missing data are common in longitudinal studies. Likelihood-based methods ignoring the missingness mechanism are unbiased provided missingness is at random (MAR); under not-at-random missingness (MNAR), joint modeling is commonly used, often as part of sensitivity analyses. In our motivating example of modeling CD4 count trajectories during untreated HIV infection, CD4 counts are mainly censored due to treatment initiation, with the nature of this mechanism remaining debatable. Here we evaluate the bias in the disease progression marker's change over time (slope) of a specific class of joint models, termed shared-random-effects models (SREMs), under MAR drop-out and propose an alternative SREM...
October 25, 2018: Biometrics
Christopher R Bilder, Joshua M Tebbs, Christopher S McMahan
Infectious disease testing frequently takes advantage of two tools, group testing and multiplex assays, to make testing timely and cost effective. Until the work of Tebbs et al. (2013) and Hou et al. (2017), there was no research available to understand how best to apply these tools simultaneously. This recent work focused on applications where each individual is considered to be identical in terms of the probability of disease. However, risk-factor information, such as past behavior and presence of symptoms, is very often available on each individual to allow one to estimate individual-specific probabilities...
October 24, 2018: Biometrics
Jessica Gronsbell, Jessica Minnier, Sheng Yu, Katherine Liao, Tianxi Cai
The use of Electronic Health Records (EHR) for translational research can be challenging due to difficulty in extracting accurate disease phenotype data. Historically, EHR algorithms for annotating phenotypes have been either rule-based or trained with billing codes and gold standard labels curated via labor intensive medical chart review. These simplistic algorithms tend to have unpredictable portability across institutions and low accuracy for many disease phenotypes due to imprecise billing codes. Recently, more sophisticated machine learning algorithms have been developed to improve the robustness and accuracy of EHR phenotyping algorithms...
October 24, 2018: Biometrics
Zongliang Hu, Tiejun Tong, Marc G Genton
We propose a likelihood ratio test framework for testing normal mean vectors in high-dimensional data under two common scenarios: the one-sample test and the two-sample test with equal covariance matrices. We derive the test statistics under the assumption that the covariance matrices follow a diagonal matrix structure. In comparison with the diagonal Hotelling's tests, our proposed test statistics display some interesting characteristics. In particular, they are a summation of the log-transformed squared t-statistics rather than a direct summation of those components...
October 16, 2018: Biometrics
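The contrast drawn in the abstract above by Hu, Tong and Genton, a sum of log-transformed squared t-statistics versus a direct sum of squared t-statistics, can be sketched for the one-sample case. The abstract does not give the exact statistic, so the form used here is an assumption: it is the textbook likelihood ratio statistic for a normal mean with unknown variance, n * log(1 + t^2 / (n - 1)), summed over coordinates as a diagonal covariance implies. Function names and data are hypothetical, and no centering or scaling for a null distribution is attempted.

```python
import math
from statistics import mean, variance


def t_statistics(samples):
    """Per-coordinate one-sample t-statistics for H0: mean vector = 0.

    `samples` is a list of n observations, each a list of p coordinates.
    """
    n = len(samples)
    p = len(samples[0])
    stats = []
    for j in range(p):
        column = [row[j] for row in samples]
        stats.append(math.sqrt(n) * mean(column) / math.sqrt(variance(column)))
    return stats


def diagonal_hotelling(samples):
    """Direct summation of the squared t-statistics."""
    return sum(t * t for t in t_statistics(samples))


def diagonal_lrt(samples):
    """Summation of log-transformed squared t-statistics (assumed form)."""
    n = len(samples)
    return n * sum(math.log1p(t * t / (n - 1)) for t in t_statistics(samples))


if __name__ == "__main__":
    data = [[1.0, 2.0], [2.0, 1.0], [3.0, 3.0]]  # hypothetical: n = 3, p = 2
    print(diagonal_hotelling(data), diagonal_lrt(data))
```

One consequence of the logarithm in this assumed form is that a single very large t-statistic contributes less to the log-based sum than to the direct sum of squares.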
Ben C Stevenson, David L Borchers, Rachel M Fewster
Capture-recapture methods for estimating wildlife population sizes almost always require their users to identify every detected animal. Many modern-day wildlife surveys detect animals without physical capture; visual detection by cameras is one such example. However, for every pair of detections, the surveyor faces a decision that is often fraught with uncertainty: are they linked to the same individual? An inability to resolve every such decision to a high degree of certainty prevents the use of standard capture-recapture methods, impeding the estimation of animal density...
October 8, 2018: Biometrics
