Computational Statistics & Data Analysis

David Lenis, Benjamin Ackerman, Elizabeth A Stuart
Model misspecification is a potential problem for any parametric-model based analysis. However, the measurement and consequences of model misspecification have not been well formalized in the context of causal inference. A measure of model misspecification is proposed, and the consequences of model misspecification in non-experimental causal inference methods are investigated. The metric is then used to explore which estimators are more sensitive to misspecification of the outcome and/or treatment assignment model...
December 2018: Computational Statistics & Data Analysis
Dewei Wang, Christopher S McMahan, Joshua M Tebbs, Christopher R Bilder
Screening procedures for infectious diseases, such as HIV, often involve pooling individual specimens together and testing the pools. For diseases with low prevalence, group testing (or pooled testing) can be used to classify individuals as diseased or not while providing considerable cost savings when compared to testing specimens individually. The pooling literature is replete with group testing case identification algorithms including Dorfman testing, higher-stage hierarchical procedures, and array testing...
June 2018: Computational Statistics & Data Analysis
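The cost savings mentioned in the abstract above can be illustrated with the simplest case, two-stage Dorfman testing. The sketch below is not taken from the paper: it assumes a perfect assay and independent infection statuses, and the function names are hypothetical. Under those assumptions, a pool of size k needs one test, plus k retests whenever the pool is positive, so the expected number of tests per person is 1/k + 1 - (1 - p)^k for prevalence p.

```python
def dorfman_expected_tests_per_person(prevalence: float, pool_size: int) -> float:
    """Expected tests per individual under two-stage Dorfman testing.

    Assumption (not from the source): the assay is error-free and
    individuals are infected independently with the same probability.
    """
    if not 0.0 <= prevalence <= 1.0:
        raise ValueError("prevalence must lie in [0, 1]")
    if pool_size < 2:
        raise ValueError("pool_size must be at least 2")
    # Probability that every specimen in the pool is negative.
    prob_pool_negative = (1.0 - prevalence) ** pool_size
    # One pooled test shared by pool_size people, plus one retest per
    # person whenever the pool tests positive.
    return 1.0 / pool_size + (1.0 - prob_pool_negative)


def best_dorfman_pool_size(prevalence: float, max_pool_size: int = 50) -> int:
    """Pool size in [2, max_pool_size] minimising expected tests per person."""
    return min(
        range(2, max_pool_size + 1),
        key=lambda k: dorfman_expected_tests_per_person(prevalence, k),
    )
```

Any value below 1.0 returned by `dorfman_expected_tests_per_person` means pooling is expected to use fewer tests than testing each specimen individually.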
Unkyung Lee, Yanqing Sun, Thomas H Scheike, Peter B Gilbert
The cumulative incidence function quantifies the probability of failure over time due to a specific cause for competing risks data. The generalized semiparametric regression models for the cumulative incidence functions with missing covariates are investigated. The effects of some covariates are modeled as non-parametric functions of time while others are modeled as parametric functions of time. Different link functions can be selected to add flexibility in modeling the cumulative incidence functions. The estimation procedures based on the direct binomial regression and the inverse probability weighting of complete cases are developed...
June 2018: Computational Statistics & Data Analysis
Sebastian J Teran Hidalgo, Michael C Wu, Stephanie M Engel, Michael R Kosorok
Nonparametric regression models do not require the specification of the functional form between the outcome and the covariates. Despite their popularity, the number of diagnostic statistics available for them is small in comparison to their parametric counterparts. We propose a goodness-of-fit test for nonparametric regression models with linear smoother form. In particular, we apply this testing framework to smoothing spline ANOVA models. The test can consider two sources of lack-of-fit: whether covariates that are not currently in the model need to be included, and whether the current model fits the data well...
June 2018: Computational Statistics & Data Analysis
So Young Park, Luo Xiao, Jayson D Willbur, Ana-Maria Staicu, N L'ntshotsholé Jumbe
A joint design for sampling functional data is proposed to achieve optimal prediction of both functional data and a scalar outcome. The motivating application is fetal growth, where the objective is to determine the optimal times to collect ultrasound measurements in order to recover fetal growth trajectories and to predict child birth outcomes. The joint design is formulated using an optimization criterion and implemented in a pilot study. Performance of the proposed design is evaluated via simulation study and application to fetal ultrasound data...
June 2018: Computational Statistics & Data Analysis
Hua Ma, Andriy I Bandos, David Gur
Assessing performance of diagnostic markers is a necessary step for their use in decision making regarding various conditions of interest in diagnostic medicine and other fields. Globally useful markers could, however, have ranges of values that are "diagnostically non-informative". This paper demonstrates that the presence of marker values from diagnostically non-informative ranges could lead to a loss in statistical efficiency during nonparametric evaluation and shows that grouping non-informative values provides a natural resolution to this problem...
January 2018: Computational Statistics & Data Analysis
Sheila Gaynor, Eric Bair
Cluster analysis methods are used to identify homogeneous subgroups in a data set. In biomedical applications, one frequently applies cluster analysis in order to identify biologically interesting subgroups. In particular, one may wish to identify subgroups that are associated with a particular outcome of interest. Conventional clustering methods generally do not identify such subgroups, particularly when there are a large number of high-variance features in the data set. Conventional methods may identify clusters associated with these high-variance features when one wishes to obtain secondary clusters that are more interesting biologically or more strongly associated with a particular outcome of interest...
December 2017: Computational Statistics & Data Analysis
Feipeng Zhang, Qunhua Li
Expectile regression is a useful tool for exploring the relation between the response and the explanatory variables beyond the conditional mean. A continuous threshold expectile regression is developed for modeling data in which the effect of a covariate on the response variable is linear but varies below and above an unknown threshold in a continuous way. The estimators for the threshold and the regression coefficients are obtained using a grid search approach. The asymptotic properties for all the estimators are derived, and the estimator for the threshold is shown to achieve root-n consistency...
December 2017: Computational Statistics & Data Analysis
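As background for the abstract above, the tau-th expectile of a sample minimises an asymmetrically weighted squared loss, with weight tau on observations above the estimate and 1 - tau on those below. The sketch below computes only a sample expectile by iteratively reweighted averaging. It is not the continuous threshold regression or the grid search described in the abstract, and the function name is hypothetical.

```python
def sample_expectile(values, tau, tol=1e-12, max_iter=1000):
    """Return the tau-th expectile of a sample (illustrative helper).

    The expectile m solves
        sum_i w_i * (y_i - m) = 0,
    where w_i = tau if y_i > m and 1 - tau otherwise, so it is a
    weighted mean whose weights depend on m. We iterate to a fixed point.
    """
    if not values:
        raise ValueError("values must be non-empty")
    if not 0.0 < tau < 1.0:
        raise ValueError("tau must lie strictly between 0 and 1")
    # Start from the ordinary mean, which is the expectile for tau = 0.5.
    estimate = sum(values) / len(values)
    for _ in range(max_iter):
        weights = [tau if y > estimate else 1.0 - tau for y in values]
        updated = sum(w * y for w, y in zip(weights, values)) / sum(weights)
        if abs(updated - estimate) <= tol:
            return updated
        estimate = updated
    return estimate
```

With tau = 0.5 all weights are equal and the function returns the sample mean. Larger values of tau pull the estimate toward the upper part of the sample.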
Keunbaik Lee, Changryong Baek, Michael J Daniels
In longitudinal studies, serial dependence of repeated outcomes must be taken into account to make correct inferences on covariate effects. As such, care must be taken in modeling the covariance matrix. However, estimation of the covariance matrix is challenging because there are many parameters in the matrix and the estimated covariance matrix should be positive definite. To overcome these limitations, two Cholesky decomposition approaches have been proposed: modified Cholesky decomposition for autoregressive (AR) structure and moving average Cholesky decomposition for moving average (MA) structure, respectively...
November 2017: Computational Statistics & Data Analysis
Chang Yu, Daniel Zelterman
Microarray studies generate a large number of p-values from many gene expression comparisons. The estimate of the proportion of the p-values sampled from the null hypothesis draws broad interest. The two-component mixture model is often used to estimate this proportion. If the data are generated under the null hypothesis, the p-values follow the uniform distribution. What is the distribution of p-values when data are sampled from the alternative hypothesis? The distribution is derived for the chi-squared test...
October 2017: Computational Statistics & Data Analysis
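The abstract above rests on the fact that p-values are uniform on [0, 1] under the null hypothesis. One common way to use that fact, shown here as a sketch, is a threshold-based estimator of the null proportion usually attributed to Storey. It is not the method derived in the paper, and the function name and the default threshold `lam` are assumptions made for illustration. Because null p-values are uniform, a fraction of about 1 - lam of them should exceed lam, while p-values from the alternative are assumed to fall mostly below it.

```python
def estimate_null_proportion(p_values, lam=0.5):
    """Threshold-based estimate of the proportion of true null hypotheses.

    Illustrative only: counts p-values above `lam` and rescales by the
    share of uniform p-values expected there, capping the result at 1.
    """
    if not p_values:
        raise ValueError("p_values must be non-empty")
    if not 0.0 <= lam < 1.0:
        raise ValueError("lam must lie in [0, 1)")
    total = len(p_values)
    above = sum(1 for p in p_values if p > lam)
    return min(1.0, above / (total * (1.0 - lam)))


# Toy input: ten evenly spread "null" p-values and ten very small ones.
null_like = [0.05, 0.15, 0.25, 0.35, 0.45, 0.55, 0.65, 0.75, 0.85, 0.95]
alternative_like = [0.001] * 10
estimated_pi0 = estimate_null_proportion(null_like + alternative_like)
```

The choice of `lam` trades bias against variance: a larger threshold leaves fewer alternative p-values above it but bases the estimate on fewer observations.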
Zheyu Wang, Krisztian Sebestyen, Sarah E Monsell
A model-based clustering method is proposed to address two research aims in Alzheimer's disease (AD): to evaluate the accuracy of imaging biomarkers in AD prognosis, and to integrate biomarker information and standard clinical test results into the diagnoses. One challenge in such biomarker studies is that it is often desired or necessary to conduct the evaluation without relying on clinical diagnoses or some other standard references. This is because (1) biomarkers may provide prognostic information long before any standard reference can be acquired; (2) these references are often based on or provide unfair advantage to standard tests...
September 2017: Computational Statistics & Data Analysis
S Faye Williamson, Peter Jacko, Sofía S Villar, Thomas Jaki
Development of treatments for rare diseases is challenging due to the limited number of patients available for participation. Learning about treatment effectiveness with a view to treat patients in the larger outside population, as in the traditional fixed randomised design, may not be a plausible goal. An alternative goal is to treat the patients within the trial as effectively as possible. Using the framework of finite-horizon Markov decision processes and dynamic programming (DP), a novel randomised response-adaptive design is proposed which maximises the total number of patient successes in the trial and penalises if a minimum number of patients are not recruited to each treatment arm...
September 2017: Computational Statistics & Data Analysis
Andrew G Chapple, Marina Vannucci, Peter F Thall, Steven Lin
A variable selection procedure is developed for a semi-competing risks regression model with three hazard functions that uses spike-and-slab priors and stochastic search variable selection algorithms for posterior inference. A rule is devised for choosing the threshold on the marginal posterior probability of variable inclusion based on the Deviance Information Criterion (DIC) that is examined in a simulation study. The method is applied to data from esophageal cancer patients from the MD Anderson Cancer Center, Houston, TX, where the most important covariates are selected in each of the hazards of effusion, death before effusion, and death after effusion...
August 2017: Computational Statistics & Data Analysis
Hongxiao Zhu, Jeffrey S Morris, Fengrong Wei, Dennis D Cox
Many scientific studies measure different types of high-dimensional signals or images from the same subject, producing multivariate functional data. These functional measurements carry different types of information about the scientific process, and a joint analysis that integrates information across them may provide new insights into the underlying mechanism for the phenomenon under study. Motivated by fluorescence spectroscopy data in a cervical pre-cancer study, a multivariate functional response regression model is proposed, which treats multivariate functional observations as responses and a common set of covariates as predictors...
July 2017: Computational Statistics & Data Analysis
Hao Hu, Weixin Yao, Yichao Wu
Finite mixture of regression (FMR) models can be reformulated as incomplete data problems and they can be estimated via the expectation-maximization (EM) algorithm. The main drawback is the strong parametric assumption, such as normally distributed residuals in FMR models. The estimation might be biased if the model is misspecified. To relax the parametric assumption about the component error densities, a new method is proposed to estimate the mixture regression parameters by only assuming that the components have log-concave error densities but the specific parametric family is unknown...
July 2017: Computational Statistics & Data Analysis
Himel Mallick, Nengjun Yi
A Bayesian bi-level variable selection method (BAGB: Bayesian Analysis of Group Bridge) is developed for regularized regression and classification. This new development is motivated by grouped data, where generic variables can be divided into multiple groups, with variables in the same group being mechanistically related or statistically correlated. As an alternative to frequentist group variable selection methods, BAGB incorporates structural information among predictors through a group-wise shrinkage prior...
June 2017: Computational Statistics & Data Analysis
Zhongkai Liu, Rui Song, Donglin Zeng, Jiajia Zhang
Marginal screening has been established as a fast and effective method for high-dimensional variable selection. There are some drawbacks associated with marginal screening, since the marginal model can be viewed as a misspecification of the true joint model. A principal components adjusted variable screening method is proposed, which uses top principal components as surrogate covariates to account for the variability of the omitted predictors in generalized linear models. The proposed method demonstrates superior numerical performance compared with competing methods...
June 2017: Computational Statistics & Data Analysis
Shengtong Han, Hongmei Zhang, Wilfried Karmaus, Graham Roberts, Hasan Arshad
Background noise in cluster analyses can potentially mask the true underlying patterns. To tease out patterns unique to certain populations, a Bayesian semi-parametric clustering method is presented. It infers and adjusts for background noise. The method is built upon a mixture of the Dirichlet process and a point mass function. Simulations demonstrate the effectiveness of the proposed method. The method is then applied to analyze a longitudinal data set on allergic sensitization and asthma status.
May 2017: Computational Statistics & Data Analysis
Seongho Kim, Hyejeong Jang, Imhoi Koo, Joohyoung Lee, Xiang Zhang
Compared to other analytical platforms, comprehensive two-dimensional gas chromatography coupled with mass spectrometry (GC×GC-MS) has much greater separation power for the analysis of complex samples and thus is increasingly used in metabolomics for biomarker discovery. However, accurate peak detection remains a bottleneck for wide application of GC×GC-MS. Therefore, the normal-exponential-Bernoulli (NEB) model is generalized using the gamma distribution, and a new peak detection algorithm using the normal-gamma-Bernoulli (NGB) model is developed...
January 2017: Computational Statistics & Data Analysis
Jan Gertheiss, Jeff Goldsmith, Ana-Maria Staicu
Non-Gaussian functional data are considered and modeling through functional principal components analysis (FPCA) is discussed. The direct extension of popular FPCA techniques to the generalized case incorrectly uses a marginal mean estimate for a model that has an inherently conditional interpretation, and thus leads to biased estimates of population and subject-level effects. The methods proposed address this shortcoming by using either a two-stage or joint estimation strategy. The performance of all methods is compared numerically in simulations...
January 2017: Computational Statistics & Data Analysis