Read by QxMD


J E Johndrow, K Lum, D B Dunson
There has been substantial recent interest in record linkage, where one attempts to group the records pertaining to the same entities from one or more large databases that lack unique identifiers. This can be viewed as a type of microclustering, with few observations per cluster and a very large number of clusters. We show that the problem is fundamentally hard from a theoretical perspective and, even in idealized cases, accurate entity resolution is effectively impossible unless the number of entities is small relative to the number of records and/or the separation between records from different entities is extremely large...
June 2018: Biometrika
Raymond K W Wong, Kwun Chuen Gary Chan
Covariate balance is often advocated for objective causal inference since it mimics randomization in observational data. Unlike methods that balance specific moments of covariates, our proposal attains uniform approximate balance for covariate functions in a reproducing-kernel Hilbert space. The corresponding infinite-dimensional optimization problem is shown to have a finite-dimensional representation in terms of an eigenvalue optimization problem. Large-sample results are studied, and numerical examples show that the proposed method achieves better balance with smaller sampling variability than existing methods...
March 2018: Biometrika
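One standard way to measure imbalance uniformly over all functions in the unit ball of a reproducing-kernel Hilbert space is a weighted maximum mean discrepancy. The largest difference between the weighted treated and control means of any such function equals the RKHS norm of the difference between the weighted kernel mean embeddings. The sketch below evaluates that quantity for a given set of weights. The Gaussian kernel, the bandwidth and the function names are assumptions chosen for illustration. It does not implement the authors' weight-estimation procedure or its eigenvalue formulation.

```python
import math


def gaussian_kernel(a, b, bandwidth=1.0):
    """Gaussian (RBF) kernel between two covariate vectors (assumed kernel)."""
    sq = sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return math.exp(-sq / (2.0 * bandwidth ** 2))


def kernel_imbalance(treated, control, w_t=None, w_c=None, bandwidth=1.0):
    """Worst-case weighted mean difference over the RKHS unit ball.

    Equals || sum_i w_t[i] k(t_i, .) - sum_j w_c[j] k(c_j, .) ||_H,
    i.e. a weighted maximum mean discrepancy (MMD). Covariates are given
    as lists of float vectors; weights default to uniform.
    """
    if w_t is None:
        w_t = [1.0 / len(treated)] * len(treated)
    if w_c is None:
        w_c = [1.0 / len(control)] * len(control)

    def cross(xs, ws, ys, vs):
        # Weighted double sum of kernel evaluations between two samples.
        return sum(wi * vj * gaussian_kernel(x, y, bandwidth)
                   for x, wi in zip(xs, ws) for y, vj in zip(ys, vs))

    sq = (cross(treated, w_t, treated, w_t)
          - 2.0 * cross(treated, w_t, control, w_c)
          + cross(control, w_c, control, w_c))
    return math.sqrt(max(sq, 0.0))  # clip tiny negative rounding error
```

Reweighting the controls toward the treated covariate distribution lowers this quantity. For example, with one treated unit at 0 and controls at 0 and 5, putting all control weight on the unit at 0 drives the imbalance to zero.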
Yunro Chung, Anastasia Ivanova, Michael G Hudgens, Jason P Fine
We consider the estimation of the semiparametric proportional hazards model with an unspecified baseline hazard function where the effect of a continuous covariate is assumed to be monotone. Previous work on nonparametric maximum likelihood estimation for isotonic proportional hazard regression with right-censored data is computationally intensive, lacks theoretical justification, and may be prohibitive in large samples. In this paper, partial likelihood estimation is studied. An iterative quadratic programming method is considered, which has performed well with likelihoods for isotonic parametric regression models...
March 1, 2018: Biometrika
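The monotone covariate effect in this abstract is an isotonic constraint. The classic algorithm for the simplest problem of this kind, a weighted least-squares fit constrained to be non-decreasing, is the pool-adjacent-violators algorithm (PAVA). The sketch below implements it only to show what a monotone fit looks like. It is not the authors' partial-likelihood estimator or their iterative quadratic programming method, and the function name is illustrative.

```python
def pava(y, w=None):
    """Non-decreasing weighted least-squares fit via pool-adjacent-violators."""
    if w is None:
        w = [1.0] * len(y)
    # Each block holds [weighted mean, total weight, number of points].
    blocks = []
    for yi, wi in zip(y, w):
        blocks.append([yi, wi, 1])
        # Pool the last two blocks while they violate monotonicity.
        while len(blocks) > 1 and blocks[-2][0] > blocks[-1][0]:
            m2, w2, n2 = blocks.pop()
            m1, w1, n1 = blocks.pop()
            total = w1 + w2
            blocks.append([(m1 * w1 + m2 * w2) / total, total, n1 + n2])
    # Expand each block's mean back to one fitted value per observation.
    fit = []
    for mean, _, count in blocks:
        fit.extend([mean] * count)
    return fit
```

For instance, pava([1.0, 3.0, 2.0, 4.0]) pools the violating middle pair and returns [1.0, 2.5, 2.5, 4.0].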
K Schorning, H Dette, K Kettelhake, W K Wong, F Bretz
We derive optimal designs to estimate efficacy and toxicity in active controlled dose-finding trials when the bivariate continuous outcomes are described using nonlinear regression models. We determine upper bounds on the required number of different doses and provide conditions under which the boundary points of the design space are included in the optimal design. We provide an analytical description of minimally supported optimal designs and show that they do not depend on the correlation between the bivariate outcomes...
December 2017: Biometrika
Byeong Yeob Choi, Jason P Fine, M Alan Brookhart
Two-stage least squares estimation is popular for structural equation models with unmeasured confounders. In such models, both the outcome and the exposure are assumed to follow linear models conditional on the measured confounders and an instrumental variable, which is related to the outcome only via its relation with the exposure. We consider data where both the outcome and the exposure may be incompletely observed, with particular attention to the case where both are censored event times. A general class of two-stage minimum distance estimators is proposed that separately fits linear models for the outcome and exposure and then uses a minimum distance criterion based on the reduced-form model for the outcome to estimate the regression parameters of interest...
December 2017: Biometrika
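As a point of reference, the sketch below computes the classical two-stage least squares estimate in the simplest just-identified case: one instrument, one exposure, fully observed data and no measured confounders. Stage one regresses the exposure on the instrument; stage two regresses the outcome on the stage-one fitted values. It does not handle censoring and is not the authors' minimum distance estimator. The helper names are illustrative.

```python
from statistics import fmean


def ols_simple(x, y):
    """Intercept and slope of the least-squares line y ~ a + b * x."""
    mx, my = fmean(x), fmean(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    slope = sxy / sxx
    return my - slope * mx, slope


def two_stage_least_squares(z, x, y):
    """Just-identified 2SLS: instrument z, exposure x, outcome y."""
    # Stage 1: regress the exposure on the instrument, keep fitted values.
    a1, b1 = ols_simple(z, x)
    x_hat = [a1 + b1 * zi for zi in z]
    # Stage 2: regress the outcome on the fitted exposure.
    _, beta = ols_simple(x_hat, y)
    return beta
```

With a single instrument the result equals cov(z, y) / cov(z, x). When an unmeasured confounder drives both x and y but is uncorrelated with z, this ratio recovers the causal slope, whereas the naive regression of y on x does not.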
D Benkeser, M Carone, M J Van Der Laan, P B Gilbert
Doubly robust estimators are widely used to draw inference about the average effect of a treatment. Such estimators are consistent for the effect of interest if either one of two nuisance parameters is consistently estimated. However, if flexible, data-adaptive estimators of these nuisance parameters are used, double robustness does not readily extend to inference. We present a general theoretical study of the behaviour of doubly robust estimators of an average treatment effect when one of the nuisance parameters is inconsistently estimated...
December 2017: Biometrika
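A standard doubly robust estimator of the average treatment effect is the augmented inverse-probability-weighted (AIPW) estimator. The minimal sketch below assumes the propensity scores and the two outcome regressions have already been fitted by some method; the argument names are hypothetical. It illustrates only the point estimate. It does not address the inference problem the abstract studies or the authors' proposed remedy.

```python
from statistics import fmean


def aipw_ate(y, a, pi_hat, m1_hat, m0_hat):
    """AIPW estimate of E[Y(1)] - E[Y(0)].

    y:      observed outcomes
    a:      treatment indicators (1 = treated, 0 = control)
    pi_hat: fitted propensity scores P(A = 1 | X), one per subject
    m1_hat: fitted outcome regression E[Y | A = 1, X], one per subject
    m0_hat: fitted outcome regression E[Y | A = 0, X], one per subject
    """
    terms = []
    for yi, ai, p, m1, m0 in zip(y, a, pi_hat, m1_hat, m0_hat):
        # Outcome-model prediction plus inverse-weighted residual correction.
        treated = m1 + ai * (yi - m1) / p
        control = m0 + (1 - ai) * (yi - m0) / (1 - p)
        terms.append(treated - control)
    return fmean(terms)
```

If the outcome regressions are correct, the weighted residual terms have mean zero. If the propensity scores are correct, those terms correct a misspecified outcome model. This is the sense in which the estimator is consistent when either nuisance parameter is consistently estimated.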
Liping Zhu, Kai Xu, Runze Li, Wei Zhong
We propose the use of projection correlation to characterize dependence between two random vectors. Projection correlation has several appealing properties. It equals zero if and only if the two random vectors are independent, it is not sensitive to the dimensions of the two random vectors, it is invariant with respect to the group of orthogonal transformations, and its estimation is free of tuning parameters and does not require moment conditions on the random vectors. We show that the sample estimate of the projection correlation is n-consistent if the two random vectors are independent and root-n-consistent otherwise...
December 2017: Biometrika
Fang Han, Shizhe Chen, Han Liu
We consider the testing of mutual independence among all entries in a [Formula: see text]-dimensional random vector based on [Formula: see text] independent observations. We study two families of distribution-free test statistics, which include Kendall's tau and Spearman's rho as important examples. We show that under the null hypothesis the test statistics of these two families converge weakly to Gumbel distributions, and we propose tests that control the Type I error in the high-dimensional setting where [Formula: see text]...
December 2017: Biometrika
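Kendall's tau is one of the rank statistics named in this abstract. The sketch below computes it for every pair of variables and returns the largest squared value, each divided by the known null variance 2(2n+5) / (9n(n-1)) of tau under independence with no ties. It shows only the kind of maximum-type statistic involved. The Gumbel calibration, including the dimension-dependent centering, is not reproduced, and the function names are hypothetical.

```python
from itertools import combinations


def kendall_tau(x, y):
    """Kendall's tau-a for two equal-length samples (assumes no ties)."""
    n = len(x)
    score = 0
    for i, j in combinations(range(n), 2):
        prod = (x[i] - x[j]) * (y[i] - y[j])
        score += (prod > 0) - (prod < 0)  # +1 concordant, -1 discordant
    return 2.0 * score / (n * (n - 1))


def max_standardized_tau_sq(columns):
    """Largest squared pairwise Kendall's tau, each divided by its null
    variance 2(2n+5) / (9n(n-1)) under independence.

    columns: d lists, one per variable, each holding n observations.
    """
    n = len(columns[0])
    null_var = 2.0 * (2 * n + 5) / (9.0 * n * (n - 1))
    return max(kendall_tau(columns[j], columns[k]) ** 2 / null_var
               for j, k in combinations(range(len(columns)), 2))
```

The pairwise loop costs O(n^2) per pair of variables, which is adequate for illustration. Faster O(n log n) algorithms for tau exist.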
Odile Stalder, Alex Asher, Liang Liang, Raymond J Carroll, Yanyuan Ma, Nilanjan Chatterjee
Many methods have recently been proposed for efficient analysis of case-control studies of gene-environment interactions using a retrospective likelihood framework that exploits the natural assumption of gene-environment independence in the underlying population. However, for polygenic modelling of gene-environment interactions, which is a topic of increasing scientific interest, applications of retrospective methods have been limited due to a requirement in the literature for parametric modelling of the distribution of the genetic factors...
December 2017: Biometrika
M W Wheeler, D B Dunson, A H Herring
We consider shape-restricted nonparametric regression on a closed set [Formula: see text], where it is reasonable to assume the function has no more than H local extrema interior to [Formula: see text]. Following a Bayesian approach, we develop a nonparametric prior over a novel class of local extremum splines. This approach is shown to be consistent when modeling any continuously differentiable function within the class considered, and is used to develop methods for testing hypotheses on the shape of the curve...
December 2017: Biometrika
Tom M W Nye, Xiaoxian Tang, Grady Weyenberg, Ruriko Yoshida
Evolutionary relationships are represented by phylogenetic trees, and a phylogenetic analysis of gene sequences typically produces a collection of these trees, one for each gene in the analysis. Analysis of samples of trees is difficult due to the multi-dimensionality of the space of possible trees. In Euclidean spaces, principal component analysis is a popular method of reducing high-dimensional data to a low-dimensional representation that preserves much of the sample's structure. However, the space of all phylogenetic trees on a fixed set of species does not form a Euclidean vector space, and methods adapted to tree space are needed...
December 2017: Biometrika
S Jadhav, H L Koul, Q Lu
This paper considers testing for no effect of functional covariates on response variables in multivariate regression. We use generalized estimating equations to determine the underlying parameters and establish their joint asymptotic normality. This is then used to test the significance of the effect of predictors on the vector of response variables. Simulations demonstrate the importance of considering existing correlation structures in the data. To explore the effect of treating genetic data as a function, we perform a simulation study using gene sequencing data and find that the performance of our test is comparable to that of another popular method used in sequencing studies...
December 2017: Biometrika
Jian Kang, Hyokyoung G Hong, Yi Li
Traditional variable selection methods are compromised by overlooking useful information on covariates with similar functionality or spatial proximity, and by treating each covariate independently. Leveraging prior grouping information on covariates, we propose partition-based screening methods for ultrahigh-dimensional variables in the framework of generalized linear models. We show that partition-based screening exhibits the sure screening property with a vanishing false selection rate, and we propose a data-driven partition screening framework for settings where prior knowledge of covariate grouping is unavailable or unreliable, and investigate its theoretical properties...
November 2017: Biometrika
Sanvesh Srivastava, Barbara E Engelhardt, David B Dunson
Bayesian sparse factor models have proven useful for characterizing dependence in multivariate data, but scaling computation to large numbers of samples and dimensions is problematic. We propose expandable factor analysis for scalable inference in factor models when the number of factors is unknown. The method relies on a continuous shrinkage prior for efficient maximum a posteriori estimation of a low-rank and sparse loadings matrix. The structure of the prior leads to an estimation algorithm that accommodates uncertainty in the number of factors...
September 2017: Biometrika
Y She, K Chen
In high-dimensional multivariate regression problems, enforcing low rank in the coefficient matrix offers effective dimension reduction, which greatly facilitates parameter estimation and model interpretation. However, commonly used reduced-rank methods are sensitive to data corruption, as the low-rank dependence structure between response variables and predictors is easily distorted by outliers. We propose a robust reduced-rank regression approach for joint modelling and outlier detection. The problem is formulated as a regularized multivariate regression with a sparse mean-shift parameterization, which generalizes and unifies some popular robust multivariate methods...
September 2017: Biometrika
Linbo Wang, Xiao-Hua Zhou, Thomas S Richardson
It is common in medical studies that the outcome of interest is truncated by death, meaning that a subject has died before the outcome could be measured. In this case, restricted analysis among survivors may be subject to selection bias. Hence, it is of interest to estimate the survivor average causal effect, defined as the average causal effect among the subgroup consisting of subjects who would survive under either exposure. In this paper, we consider the identification and estimation problems of the survivor average causal effect...
September 2017: Biometrika
Ming-Yueh Huang, Kwun Chuen Gary Chan
The estimation of treatment effects based on observational data usually involves multiple confounders, and dimension reduction is often desirable and sometimes inevitable. We first clarify the definition of a central subspace that is relevant for the efficient estimation of average treatment effects. A criterion is then proposed to simultaneously estimate the structural dimension, the basis matrix of the joint central subspace, and the optimal bandwidth for estimating the conditional treatment effects. The method can easily be implemented by forward selection...
September 2017: Biometrika
J Molina, A Rotnitzky, M Sued, J M Robins
We consider inference under a nonparametric or semiparametric model with likelihood that factorizes as the product of two or more variation-independent factors. We are interested in a finite-dimensional parameter that depends on only one of the likelihood factors and whose estimation requires the auxiliary estimation of one or several nuisance functions. We investigate general structures conducive to the construction of so-called multiply robust estimating functions, whose computation requires postulating several dimension-reducing models but which have mean zero at the true parameter value provided one of these models is correct...
September 2017: Biometrika
Donglin Zeng, Fei Gao, D Y Lin
Interval-censored multivariate failure time data arise when there are multiple types of failure or there is clustering of study subjects and each failure time is known only to lie in a certain interval. We investigate the effects of possibly time-dependent covariates on multivariate failure times by considering a broad class of semiparametric transformation models with random effects, and we study nonparametric maximum likelihood estimation under general interval-censoring schemes. We show that the proposed estimators for the finite-dimensional parameters are consistent and asymptotically normal, with a limiting covariance matrix that attains the semiparametric efficiency bound and can be consistently estimated through profile likelihood...
September 2017: Biometrika
O Papaspiliopoulos, D Rossell
We propose a scalable algorithmic framework for exact Bayesian variable selection and model averaging in linear models under the assumption that the Gram matrix is block-diagonal, and as a heuristic for exploring the model space for general designs. In block-diagonal designs our approach returns the most probable model of any given size without resorting to numerical integration. The algorithm also provides a novel and efficient solution to the frequentist best subset selection problem for block-diagonal designs...
June 2017: Biometrika
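The simplest block-diagonal case is an orthonormal design (X^T X = I, all blocks of size one), where best subset selection has a closed form. The least-squares coefficient of column j is b_j = x_j^T y, and the residual sum of squares of a subset S is ||y||^2 minus the sum of b_j^2 over S. The best subset of size k therefore keeps the k largest |b_j|. The sketch below covers only this special case and only the frequentist criterion. It is not the authors' algorithm for general blocks or their Bayesian model averaging, and the function name is hypothetical.

```python
def best_subset_orthonormal(columns, y, k):
    """Best size-k subset and its RSS, assuming orthonormal design columns."""
    # Least-squares coefficients are inner products when X^T X = I.
    b = [sum(xi * yi for xi, yi in zip(col, y)) for col in columns]
    # Each column reduces the RSS by b_j^2, independently of the others.
    ranked = sorted(range(len(columns)), key=lambda j: b[j] ** 2, reverse=True)
    chosen = sorted(ranked[:k])
    rss = sum(yi ** 2 for yi in y) - sum(b[j] ** 2 for j in chosen)
    return chosen, rss
```

Because each column's contribution to the fit does not depend on which other columns are included, the search over all subsets collapses to a single sort.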