Read by QxMD

Computational Statistics & Data Analysis

Seongho Kim, Hyejeong Jang, Imhoi Koo, Joohyoung Lee, Xiang Zhang
Compared with other analytical platforms, comprehensive two-dimensional gas chromatography coupled with mass spectrometry (GC×GC-MS) has much greater separation power for the analysis of complex samples and is therefore increasingly used in metabolomics for biomarker discovery. However, accurate peak detection remains a bottleneck for wider application of GC×GC-MS. To address this, the normal-exponential-Bernoulli (NEB) model is generalized using a gamma distribution, and a new peak detection algorithm based on the normal-gamma-Bernoulli (NGB) model is developed...
January 2017: Computational Statistics & Data Analysis
Jan Gertheiss, Jeff Goldsmith, Ana-Maria Staicu
Non-Gaussian functional data are considered and modeling through functional principal components analysis (FPCA) is discussed. The direct extension of popular FPCA techniques to the generalized case incorrectly uses a marginal mean estimate for a model that has an inherently conditional interpretation, and thus leads to biased estimates of population and subject-level effects. The methods proposed address this shortcoming by using either a two-stage or joint estimation strategy. The performance of all methods is compared numerically in simulations...
January 2017: Computational Statistics & Data Analysis
Ling Chen, Jianguo Sun, Chengjie Xiong
Clustered interval-censored failure time data can occur when the failure time of interest is collected from several clusters and known only within certain time intervals. Regression analysis of clustered interval-censored failure time data is discussed assuming that the data arise from the semiparametric additive hazards model. A multiple imputation approach is proposed for inference. A major advantage of the approach is its simplicity because it avoids estimating the correlation within clusters by implementing a resampling-based method...
November 2016: Computational Statistics & Data Analysis
Hao Hu, Yichao Wu, Weixin Yao
Finite mixture models are useful tools and can be estimated via the EM algorithm. Their main drawback is the strong parametric assumption about the component densities. In this paper, a much more flexible mixture model is considered, in which each component density is assumed to be log-concave. Under fairly general conditions, the log-concave maximum likelihood estimator (LCMLE) exists and is consistent. Numerical examples demonstrate that the LCMLE improves clustering results compared with the traditional MLE for parametric mixture models...
September 2016: Computational Statistics & Data Analysis
Fadlalla G Elfadaly, Paul H Garthwaite, John R Crawford
Mahalanobis distance may be used as a measure of the disparity between an individual's profile of scores and the average profile of a population of controls. The degree to which the individual's profile is unusual can then be equated to the proportion of the population who would have a larger Mahalanobis distance than the individual. Several estimators of this proportion are examined. These include plug-in maximum likelihood estimators, medians, the posterior mean from a Bayesian probability matching prior, an estimator derived from a Taylor expansion, and two forms of polynomial approximation, one based on Bernstein polynomials and one on a quadrature method...
July 2016: Computational Statistics & Data Analysis
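For the Mahalanobis-distance entry above, a minimal plug-in sketch, assuming the control population is multivariate normal on k tests, so that with known mean and covariance the squared distance D² follows a chi-square distribution with k degrees of freedom. The sample (maximum likelihood) mean and covariance are simply substituted for the true values; this is one naive plug-in estimator for illustration, not necessarily the exact estimator studied in the paper. NumPy is assumed to be available, and all function names are hypothetical.

```python
import math

import numpy as np


def chi2_sf(x: float, k: int) -> float:
    """Upper tail P(X > x) for X ~ chi-square with k degrees of freedom.

    Uses the power series for the regularized lower incomplete gamma
    function P(a, z) with a = k/2 and z = x/2.
    """
    if x <= 0:
        return 1.0
    a, z = k / 2.0, x / 2.0
    term = 1.0 / a          # n = 0 term of sum z^n / (a (a+1) ... (a+n))
    total = term
    n = 1
    while term > 1e-15 * total:
        term *= z / (a + n)
        total += term
        n += 1
    lower = math.exp(a * math.log(z) - z - math.lgamma(a)) * total
    return max(0.0, 1.0 - lower)


def mahalanobis_sq(x, mean, cov) -> float:
    """Squared Mahalanobis distance (x - mean)' cov^{-1} (x - mean)."""
    diff = np.asarray(x, dtype=float) - np.asarray(mean, dtype=float)
    return float(diff @ np.linalg.solve(np.asarray(cov, dtype=float), diff))


def plugin_abnormality(x, controls) -> float:
    """Plug-in estimate of the proportion of controls with a larger distance than x.

    Assumes multivariate normal controls and treats the ML estimates of the
    mean and covariance as if they were the true parameters.
    """
    controls = np.asarray(controls, dtype=float)
    mean = controls.mean(axis=0)
    cov = np.cov(controls, rowvar=False, bias=True)  # ML (divide-by-n) covariance
    return chi2_sf(mahalanobis_sq(x, mean, cov), controls.shape[1])
```

With a known mean of zero and identity covariance, the profile (1, 1) has D² = 2, and the chi-square(2) upper tail is exp(-1) ≈ 0.368, so about 37% of controls would have a larger distance. The plug-in version ignores the sampling uncertainty in the estimated mean and covariance.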
Dipankar Bandyopadhyay, M Amalia Jácome
In studies involving nonparametric testing of the equality of two or more survival distributions, the survival curves can exhibit a wide variety of behaviors such as proportional hazards, early/late differences, and crossing hazards. As alternatives to the classical logrank test, weighted Kaplan-Meier (WKM) type statistics and their variations were developed to handle these situations. However, their applicability is limited to cases where population membership is available for all observations, including the right-censored ones...
March 1, 2016: Computational Statistics & Data Analysis
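WKM-type statistics compare groups through a weighted integral of the difference between Kaplan-Meier curves. A minimal sketch, assuming group membership is fully observed (so it does not address the missing-membership setting this entry targets) and a user-supplied weight function; with the default weight of 1 it reduces to the unstandardized area between the two curves up to a horizon tau. No variance or test statistic is computed, and the function names are hypothetical.

```python
from typing import Callable, List, Tuple

Steps = List[Tuple[float, float]]


def kaplan_meier(times: List[float], events: List[int]) -> Steps:
    """Kaplan-Meier curve as (event time, S(t)) steps; events[i] = 1 failure, 0 censored."""
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    surv = 1.0
    steps: Steps = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = removed = 0
        while i < len(data) and data[i][0] == t:  # handle ties at time t together
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            surv *= 1.0 - deaths / n_at_risk
            steps.append((t, surv))
        n_at_risk -= removed
    return steps


def surv_at(steps: Steps, s: float) -> float:
    """Evaluate the right-continuous step function at time s (S = 1 before the first event)."""
    value = 1.0
    for t, v in steps:
        if t > s:
            break
        value = v
    return value


def integrated_difference(steps_a: Steps, steps_b: Steps, tau: float,
                          weight: Callable[[float], float] = lambda t: 1.0) -> float:
    """Approximate integral over [0, tau] of w(t) * (S_a(t) - S_b(t)) dt.

    Both curves are constant between jump times; w is evaluated at the left
    end of each constant piece (exact when w is also piecewise constant there).
    """
    grid = sorted({0.0, tau} | {t for t, _ in steps_a + steps_b if t < tau})
    total = 0.0
    for left, right in zip(grid, grid[1:]):
        diff = surv_at(steps_a, left) - surv_at(steps_b, left)
        total += weight(left) * diff * (right - left)
    return total
```

For times 1, 2, 3, 4 with the observation at time 3 censored, the curve steps down to 0.75, 0.5 and 0 at times 1, 2 and 4, and its area gap to a curve that stays at 1 over [0, 4] is 0.25 + 2 × 0.5 = 1.25.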
Cheng Cheng
In large scale genomic analyses dealing with detecting genotype-phenotype associations, such as genome wide association studies (GWAS), it is desirable to have numerically and statistically robust procedures to test the stochastic independence null hypothesis against certain alternatives. Motivated by a special case in a GWAS, a novel test procedure called correlation profile test (CPT) is developed for testing genomic associations with failure-time phenotypes subject to right censoring and competing risks...
March 1, 2016: Computational Statistics & Data Analysis
Joseph Usset, Ana-Maria Staicu, Arnab Maity
A functional regression model with a scalar response and multiple functional predictors is proposed that accommodates two-way interactions in addition to their main effects. The proposed estimation procedure models the main effects using penalized regression splines, and the interaction effect by a tensor product basis. Extensions to generalized linear models and data observed on sparse grids or with measurement error are presented. A hypothesis testing procedure for the functional interaction effect is described...
February 1, 2016: Computational Statistics & Data Analysis
Adam Ciarleglio, R Todd Ogden
Classical finite mixture regression is useful for modeling the relationship between scalar predictors and scalar responses arising from subpopulations defined by the differing associations between those predictors and responses. The classical finite mixture regression model is extended to incorporate functional predictors by taking a wavelet-based approach in which both the functional predictors and the component-specific coefficient functions are represented in terms of an appropriate wavelet basis. By using the wavelet representation of the model, the coefficients corresponding to the functional covariates become the predictors...
January 1, 2016: Computational Statistics & Data Analysis
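The wavelet step in this entry can be illustrated with the simplest orthonormal wavelet, the Haar basis (chosen here for brevity; the entry only specifies "an appropriate wavelet basis"). If X and β are sampled at 2^J equally spaced points with spacing Δ, then ∫X(t)β(t)dt ≈ Δ Σ_j X(t_j)β(t_j), and because an orthonormal transform preserves inner products this equals Δ Σ_k x_k b_k, where x_k and b_k are the wavelet coefficients of X and β. The coefficients x_k therefore enter as ordinary scalar predictors with regression coefficients b_k. The function name is hypothetical.

```python
import math
from typing import List


def haar_dwt(signal: List[float]) -> List[float]:
    """Full orthonormal Haar decomposition of a signal whose length is a power of 2.

    Output order: [final approximation, coarsest details, ..., finest details].
    """
    n = len(signal)
    if n == 0 or n & (n - 1):
        raise ValueError("signal length must be a power of 2")
    s = 1.0 / math.sqrt(2.0)
    approx = [float(v) for v in signal]
    details: List[float] = []
    while len(approx) > 1:
        pairs = list(zip(approx[0::2], approx[1::2]))
        level_details = [(a - b) * s for a, b in pairs]  # orthonormal difference
        approx = [(a + b) * s for a, b in pairs]         # orthonormal average
        details = level_details + details                # coarser levels go first
    return approx + details
```

For example, with X and β sampled at 8 points, sum(a * b for a, b in zip(x, beta)) and the same sum over haar_dwt(x) and haar_dwt(beta) agree up to rounding error, so a regression on the functional term can be rewritten as a regression on the wavelet coefficients.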
Yanqing Sun, Mei Li, Peter B Gilbert
Motivated by the need to assess HIV vaccine efficacy, previous studies proposed an extension of the discrete competing risks proportional hazards model, in which the cause of failure is replaced by a continuous mark observed only at the failure time. However, the model assumptions may fail in several ways, and no diagnostic testing procedure for this situation has been proposed. A goodness-of-fit test procedure is proposed for the stratified mark-specific proportional hazards model, in which the regression parameters depend nonparametrically on the mark and the baseline hazards depend nonparametrically on both time and the mark...
January 1, 2016: Computational Statistics & Data Analysis
Tong Tong Wu, Kenneth Lange
Matrix completion discriminant analysis (MCDA) is designed for semi-supervised learning where the rate of missingness is high and predictors vastly outnumber cases. MCDA operates by mapping class labels to the vertices of a regular simplex. With c classes, these vertices are arranged on the surface of the unit sphere in (c - 1)-dimensional Euclidean space. Because all pairs of vertices are equidistant, the classes are treated symmetrically. To assign unlabeled cases to classes, the data are entered into a large matrix (cases along rows and predictors along columns) that is augmented by vertex coordinates stored in the last c - 1 columns...
December 2015: Computational Statistics & Data Analysis
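The simplex geometry in the MCDA entry can be sketched in a few lines of NumPy. Assumptions: the SVD-based vertex construction below is one standard way to build a regular simplex and not necessarily the one the authors use; the matrix-completion step itself is omitted; and the nearest-vertex rule is an assumed way to read off class labels once the missing vertex columns have been filled in. All function names are hypothetical.

```python
import numpy as np


def simplex_vertices(c: int) -> np.ndarray:
    """Rows are the c vertices of a regular simplex on the unit sphere in R^(c-1)."""
    # Standard basis vectors of R^c minus their centroid: equidistant, centered at 0.
    centered = np.eye(c) - 1.0 / c
    # These rows span the (c-1)-dim subspace orthogonal to the all-ones vector;
    # the leading c-1 right singular vectors form an orthonormal basis for it.
    _, _, vt = np.linalg.svd(centered)
    coords = centered @ vt[: c - 1].T
    return coords / np.linalg.norm(coords, axis=1, keepdims=True)


def augment(X: np.ndarray, labels, c: int) -> np.ndarray:
    """Append vertex coordinates as the last c-1 columns.

    labels[i] is a class index in 0..c-1, or None for an unlabeled case, whose
    vertex columns are left as NaN for a matrix-completion step to fill in.
    """
    vertices = simplex_vertices(c)
    extra = np.full((X.shape[0], c - 1), np.nan)
    for i, lab in enumerate(labels):
        if lab is not None:
            extra[i] = vertices[lab]
    return np.hstack([np.asarray(X, dtype=float), extra])


def nearest_vertex(vertex_block: np.ndarray, c: int) -> np.ndarray:
    """Hypothetical decision rule: index of the closest vertex for each row of the
    completed last c-1 columns."""
    vertices = simplex_vertices(c)
    sq_dist = ((vertex_block[:, None, :] - vertices[None, :, :]) ** 2).sum(axis=2)
    return sq_dist.argmin(axis=1)
```

For c = 3 the vertices form an equilateral triangle on the unit circle; in general every pair of vertices has inner product -1/(c - 1), so all pairwise distances equal sqrt(2c/(c - 1)), which is what makes the class coding symmetric.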
Bruce J Swihart, Naresh M Punjabi, Ciprian M Crainiceanu
Methods are introduced for the analysis of large sets of sleep study data (hypnograms) using a 5-state, 20-transition-type structure defined by the American Academy of Sleep Medicine. Applying these methods to the hypnograms of 5598 subjects from the Sleep Heart Health Study yields four contributions: the first analysis of sleep hypnogram data of such size and complexity in a community cohort with a range of sleep-disordered breathing severity; a novel approach to compare 5-state (20-transition-type) with 3-state (6-transition-type) sleep structures, assessing the information lost by combining sleep state categories; an extension of current approaches to multivariate survival data analysis to clustered, recurrent-event, discrete-state, discrete-time processes; and scalable solutions for the data analyses required by the case study...
September 2015: Computational Statistics & Data Analysis
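The transition-type counts in the hypnogram entry follow from counting ordered pairs of distinct states: 5 × 4 = 20 and 3 × 2 = 6. A minimal sketch of counting observed transitions and collapsing a 5-state hypnogram to 3 states, assuming stage labels W, N1, N2, N3 and R and assuming the 3-state structure merges N1 to N3 into a single NREM state; the labels, mapping and function names are illustrative, not taken from the paper.

```python
from collections import Counter
from typing import List

# Assumed stage labels: wake, NREM stages N1-N3, and REM.
FIVE_STATES = ["W", "N1", "N2", "N3", "R"]
# Assumed 3-state collapse: all NREM stages merged into one category.
TO_THREE = {"W": "W", "N1": "NREM", "N2": "NREM", "N3": "NREM", "R": "R"}


def n_transition_types(n_states: int) -> int:
    """Ordered pairs of distinct states: each state can move to any other."""
    return n_states * (n_states - 1)


def transition_counts(epochs: List[str]) -> Counter:
    """Count changes of state between consecutive epochs (staying put is not a transition)."""
    return Counter((a, b) for a, b in zip(epochs, epochs[1:]) if a != b)


def collapse(epochs: List[str]) -> List[str]:
    """Map a 5-state hypnogram onto the 3-state structure."""
    return [TO_THREE[s] for s in epochs]
```

For the hypnogram W, N1, N2, N3, N2, R, W the 5-state coding records six transitions, whereas after collapsing only W to NREM, NREM to R and R to W remain: the within-NREM changes N1 to N2, N2 to N3 and N3 to N2 disappear, which is the kind of information loss the 5-state versus 3-state comparison is meant to assess.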
Chen Yue, Shaojie Chen, Haris I Sair, Raag Airan, Brian S Caffo
Data reproducibility is a critical issue in all scientific experiments. In this manuscript, the problem of quantifying the reproducibility of graphical measurements is considered. The image intra-class correlation coefficient (I2C2) is generalized, and the graphical intra-class correlation coefficient (GICC) is proposed for this purpose. The GICC is based on multivariate probit-linear mixed effects models. A Markov chain Monte Carlo EM (MCMC-EM) algorithm is used to estimate the GICC. Simulation results under varied settings are presented, and the method is applied to the KIRBY21 test-retest dataset...
September 2015: Computational Statistics & Data Analysis
Adrian W Bowman, Stanislav Katina, Joanna Smith, Denise Brown
Methods for capturing images in three dimensions are now widely available, with stereo-photogrammetry and laser scanning being two common approaches. In anatomical studies, a number of landmarks are usually identified manually from each of these images and these form the basis of subsequent statistical analysis. However, landmarks express only a very small proportion of the information available from the images. Anatomically defined curves have the advantage of providing a much richer expression of shape. This is explored in the context of identifying the boundary of breasts from an image of the female torso and the boundary of the lips from a facial image...
June 2015: Computational Statistics & Data Analysis
Hong Zhu, Bo Lu
This article considers the practical problem, arising in clinical and observational studies, of comparing multiple treatment or prognostic groups when the observed survival data are subject to right censoring. Two possible formulations of multiple comparisons are suggested. Multiple Comparisons with a Control (MCC) compare every other group to a control group with respect to survival outcomes, to determine which groups are associated with lower risk than the control. Multiple Comparisons with the Best (MCB) compare each group to the truly minimum-risk group and identify the groups that have either the minimum risk or a practically minimum risk...
June 1, 2015: Computational Statistics & Data Analysis
Kean Ming Tan, Daniela Witten, Ali Shojaie
The task of estimating a Gaussian graphical model in the high-dimensional setting is considered. The graphical lasso, which involves maximizing the Gaussian log likelihood subject to a lasso penalty, is a well-studied approach for this task. A surprising connection between the graphical lasso and hierarchical clustering is introduced: the graphical lasso in effect performs a two-step procedure, in which (1) single linkage hierarchical clustering is performed on the variables in order to identify connected components, and then (2) a penalized log likelihood is maximized on the subset of variables within each connected component...
May 2015: Computational Statistics & Data Analysis
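Step (1) in the graphical lasso entry does not require fitting the model: for tuning parameter λ, the connected components of the graphical lasso estimate coincide with those of the graph that joins variables i and j whenever |S_ij| > λ, where S is the sample covariance matrix. This is the same as cutting a single-linkage dendrogram built on the similarities |S_ij| at level λ. A minimal pure-Python sketch using union-find (the function name is hypothetical); step (2), the penalized fit within each component, is not shown.

```python
from typing import Dict, List, Sequence


def threshold_components(S: Sequence[Sequence[float]], lam: float) -> List[List[int]]:
    """Connected components of the graph with an edge i-j whenever |S[i][j]| > lam."""
    p = len(S)
    parent = list(range(p))

    def find(i: int) -> int:
        # Follow parent links to the root, halving the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(p):
        for j in range(i + 1, p):
            if abs(S[i][j]) > lam:
                parent[find(i)] = find(j)  # merge the two components

    groups: Dict[int, List[int]] = {}
    for i in range(p):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())
```

Because the estimate is block diagonal over these components, each block can be fit separately, so the cost of step (2) is driven largely by the size of the largest component rather than by the total number of variables.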
Maiying Kong, Sheng Xu, Steven M Levy, Somnath Datta
Zero-inflated count data models are commonly used in applications where the number of zero counts exceeds that predicted by a traditional count data model such as the Poisson or negative binomial. When count data exhibiting inflated zero counts are correlated among subjects, a natural approach is to fit a marginal model using generalized estimating equations (GEE), which can incorporate subject-to-subject correlations. A GEE-based zero-inflated negative binomial (ZINB) model is proposed to fit clustered counts with excessive zeros...
May 1, 2015: Computational Statistics & Data Analysis
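The zero-inflated negative binomial distribution in this entry mixes a point mass at zero (probability π) with a negative binomial count. A minimal sketch of its probability mass function, assuming the common mean/size parametrization NB(μ, k) with variance μ + μ²/k; the GEE estimation and working-correlation structure of the proposed method are not shown, and the function names are hypothetical.

```python
import math


def nb_pmf(y: int, mu: float, k: float) -> float:
    """Negative binomial pmf with mean mu > 0 and size k > 0 (Var = mu + mu**2 / k)."""
    log_p = (math.lgamma(y + k) - math.lgamma(k) - math.lgamma(y + 1)
             + k * math.log(k / (k + mu)) + y * math.log(mu / (k + mu)))
    return math.exp(log_p)


def zinb_pmf(y: int, pi: float, mu: float, k: float) -> float:
    """Zero-inflated NB: a structural zero with probability pi, otherwise an NB(mu, k) draw."""
    base = (1.0 - pi) * nb_pmf(y, mu, k)
    return pi + base if y == 0 else base
```

Under this form E[Y] = (1 - π)μ, and P(Y = 0) = π + (1 - π)(k/(k + μ))^k exceeds the plain negative binomial zero probability whenever π > 0, which is how the model absorbs excess zeros.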
Paul W Bernhardt, Daowen Zhang, Huixia Judy Wang
Joint modeling techniques have become a popular strategy for studying the association between a response and one or more longitudinal covariates. Motivated by the GenIMS study, where it is of interest to model the event of survival using censored longitudinal biomarkers, a joint model is proposed for describing the relationship between a binary outcome and multiple longitudinal covariates subject to detection limits. A fast, approximate EM algorithm is developed that reduces the dimension of integration in the E-step of the algorithm to one, regardless of the number of random effects in the joint model...
May 1, 2015: Computational Statistics & Data Analysis
Dehan Kong, Howard Bondell, Yichao Wu
In this article, we consider the varying coefficient model, which allows the relationship between the predictors and response to vary across the domain of interest, such as time. In applications, it is possible that certain predictors only affect the response in particular regions and not everywhere. This corresponds to identifying the domain where the varying coefficient is nonzero. Towards this goal, local polynomial smoothing and penalized regression are incorporated into one framework. Asymptotic properties of our penalized estimators are provided...
March 1, 2015: Computational Statistics & Data Analysis
Victor De Oliveira, Bazoumana Kone
Methodology is proposed for the construction of prediction intervals for integrals of Gaussian random fields over bounded regions (called block averages in the geostatistical literature) based on observations at a finite set of sampling locations. Two bootstrap calibration algorithms are proposed, termed indirect and direct, aimed at improving upon plug-in prediction intervals in terms of coverage probability. A simulation study is carried out that illustrates the effectiveness of both procedures, and these procedures are applied to estimate block averages of chromium traces in a potentially contaminated region in Switzerland...
March 1, 2015: Computational Statistics & Data Analysis