Statistical Applications in Genetics and Molecular Biology

Berit Lindum Waltoft, Asger Hobolth
The change in population size over time is a useful quantity for understanding the evolutionary history of a species. Genetic variation within a species can be summarized by the site frequency spectrum (SFS). For a sample of size n, the SFS is a vector of length n - 1 where entry i is the number of sites where the mutant base appears i times and the ancestral base appears n - i times. We present a new method, CubSFS, for estimating the changes in population size of a panmictic population from an observed SFS. First, we provide a straightforward proof for the expression of the expected site frequency spectrum depending only on the population size...
June 11, 2018: Statistical Applications in Genetics and Molecular Biology
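The SFS definition in the abstract above can be made concrete with a small sketch. It assumes the sample is given as n haplotypes coded 0 (ancestral base) and 1 (mutant base); the function name and input layout are illustrative assumptions and are not taken from CubSFS.

```python
def site_frequency_spectrum(haplotypes):
    """Return the SFS as a list of length n - 1.

    haplotypes: list of n equal-length sequences of 0 (ancestral) / 1 (mutant).
    Entry i - 1 of the result counts the sites where the mutant base
    appears exactly i times, for i = 1, ..., n - 1.
    """
    n = len(haplotypes)
    num_sites = len(haplotypes[0])
    sfs = [0] * (n - 1)
    for site in range(num_sites):
        mutant_count = sum(h[site] for h in haplotypes)
        # Sites with 0 or n mutant copies are not segregating and are skipped.
        if 1 <= mutant_count <= n - 1:
            sfs[mutant_count - 1] += 1
    return sfs


# Toy sample: n = 4 haplotypes, 4 sites.
sample = [
    [0, 1, 1, 0],
    [0, 1, 0, 0],
    [1, 1, 0, 0],
    [0, 1, 1, 0],
]
print(site_frequency_spectrum(sample))  # [1, 1, 0]
```

In the toy sample, the second site is fixed for the mutant base and the fourth carries no mutant, so neither contributes to the spectrum.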
Jeffrey J Gory, Radu Herbei, Laura S Kubatko
The increasing availability of population-level allele frequency data across one or more related populations necessitates the development of methods that can efficiently estimate population genetics parameters, such as the strength of selection acting on the population(s), from such data. Existing methods for this problem in the setting of the Wright-Fisher diffusion model are primarily likelihood-based, and rely on numerical approximation for likelihood computation and on bootstrapping for assessment of variability in the resulting estimates, requiring extensive computation...
June 6, 2018: Statistical Applications in Genetics and Molecular Biology
Gustavo H Esteves, Luiz F L Reis
MOTIVATION: Gene expression data analysis is of great importance for modern molecular biology, given our ability to measure the expression profiles of thousands of genes, which enables studies rooted in systems biology. In this work, we propose a simple statistical model for measuring the activation of gene regulatory networks, instead of the traditional gene co-expression networks. RESULTS: We present the mathematical construction of a statistical procedure for testing hypotheses regarding gene regulatory network activation...
June 13, 2018: Statistical Applications in Genetics and Molecular Biology
Jere Koskela
We introduce a low dimensional function of the site frequency spectrum that is tailor-made for distinguishing coalescent models with multiple mergers from Kingman coalescent models with population growth, and use this function to construct a hypothesis test between these model classes. The null and alternative sampling distributions of the statistic are intractable, but its low dimensionality renders them amenable to Monte Carlo estimation. We construct kernel density estimates of the sampling distributions based on simulated data, and show that the resulting hypothesis test dramatically improves on the statistical power of a current state-of-the-art method...
June 13, 2018: Statistical Applications in Genetics and Molecular Biology
Nele Cosemans, Peter Claes, Nathalie Brison, Joris Robert Vermeesch, Hilde Peeters
Arrays based on single nucleotide polymorphisms (SNPs) have been successful for the large-scale discovery of copy number variants (CNVs). However, current CNV calling algorithms still have limitations in detecting CNVs with high specificity and sensitivity, especially in the case of small (<100 kb) CNVs. Therefore, this study presents a simple statistical analysis to evaluate CNV calls from SNP arrays in order to improve the noise-robustness of existing CNV calling algorithms. The proposed approach estimates local noise of log R ratios and returns the probability that a certain observation is different from this log R ratio noise level...
April 28, 2018: Statistical Applications in Genetics and Molecular Biology
Jialin Zhang, Chen Chen
Zhang, Z. and Zheng, L. (2015): "A mutual information estimator with exponentially decaying bias," Stat. Appl. Genet. Mol. Biol., 14, 243-252, proposed a nonparametric estimator of mutual information developed from an entropic perspective, and demonstrated that it has much smaller bias than the plugin estimator yet has the same asymptotic normality under certain conditions. However, it is incorrectly suggested in their article that the asymptotic normality could be used for testing independence between two random elements on a joint alphabet...
March 30, 2018: Statistical Applications in Genetics and Molecular Biology
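For readers unfamiliar with the plugin estimator that the abstract above uses as its point of comparison, here is a minimal sketch. It computes mutual information (in nats) by substituting empirical frequencies into the definition; it is the baseline estimator only, not the Zhang and Zheng estimator, and the function name is an assumption made for illustration.

```python
import math
from collections import Counter


def plugin_mutual_information(pairs):
    """Plugin estimate of I(X; Y) in nats from a list of (x, y) observations.

    Empirical joint and marginal frequencies replace the true probabilities
    in sum_{x,y} p(x, y) * log(p(x, y) / (p(x) * p(y))).
    """
    n = len(pairs)
    joint = Counter(pairs)
    count_x = Counter(x for x, _ in pairs)
    count_y = Counter(y for _, y in pairs)
    mi = 0.0
    for (x, y), c in joint.items():
        # p(x, y) / (p(x) p(y)) = (c / n) / ((count_x / n) * (count_y / n))
        mi += (c / n) * math.log(c * n / (count_x[x] * count_y[y]))
    return mi


# Perfectly dependent symbols versus a balanced independent design.
dependent = [(0, 0), (1, 1)] * 5
independent = [(0, 0), (0, 1), (1, 0), (1, 1)]
print(plugin_mutual_information(dependent))
print(plugin_mutual_information(independent))
```

The first call recovers log 2, the entropy of a fair binary symbol, and the second returns zero because the empirical joint distribution factorizes exactly.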
Jean-Eudes Dazard, Hemant Ishwaran, Rajeev Mehlotra, Aaron Weinberg, Peter Zimmerman
Unraveling interactions among variables such as genetic, clinical, demographic and environmental factors is essential to understanding the development of common and complex diseases. To increase the power to detect such variable interactions associated with clinical time-to-event outcomes, we borrowed established concepts from random survival forest (RSF) models. We introduce a novel RSF-based pairwise interaction estimator and derive a randomization method with bootstrap confidence intervals for inferring interaction significance...
February 17, 2018: Statistical Applications in Genetics and Molecular Biology
Cen Wu, Ping-Shou Zhong, Yuehua Cui
Gene-environment (G×E) interaction plays a pivotal role in understanding the genetic basis of complex disease. When environmental factors are measured continuously, one can assess the genetic sensitivity over different environmental conditions on a disease trait. Motivated by the increasing awareness of gene set based association analysis over single variant based approaches, we propose an additive varying-coefficient model to jointly model variants in a genetic system. The model allows us to examine how variants in a gene set are moderated by an environmental factor to affect a disease phenotype...
February 8, 2018: Statistical Applications in Genetics and Molecular Biology
Jiehuan Sun, Jose D Herazo-Maya, Xiu Huang, Naftali Kaminski, Hongyu Zhao
Longitudinal gene expression profiles of subjects are collected in some clinical studies to monitor disease progression and understand disease etiology. The identification of gene sets that have coordinated changes with relevant clinical outcomes over time from these data could provide significant insights into the molecular basis of disease progression and lead to better treatments. In this article, we propose a Distance-Correlation based Gene Set Analysis (dcGSA) method for longitudinal gene expression data...
February 5, 2018: Statistical Applications in Genetics and Molecular Biology
Marco Marozzi
In biomedical research, multiple endpoints are commonly analyzed in "omics" fields like genomics, proteomics and metabolomics. Traditional methods designed for low-dimensional data either perform poorly or are not applicable when analyzing high-dimensional data whose dimension is generally similar to, or even much larger than, the number of subjects. The complex biochemical interplay between hundreds (or thousands) of endpoints is reflected by complex dependence relations. The aim of the paper is to propose tests that are very suitable for analyzing omics data because they do not require the normality assumption, are powerful also for small sample sizes, in the presence of complex dependence relations among endpoints, and when the number of endpoints is much larger than the number of subjects...
January 30, 2018: Statistical Applications in Genetics and Molecular Biology
Colleen Nooney, Stuart Barber, Arief Gusnanto, Walter R Gilks
We introduce a new method to test efficiently for cospeciation in tritrophic systems. Our method utilises an analogy with electrical circuit theory to reduce higher order systems into bitrophic data sets that retain the information of the original system. We use a sophisticated permutation scheme that weights interactions between two trophic layers based on their connection to the third layer in the system. Our method has several advantages compared to the method of Mramba et al. [Mramba, L. K., S. Barber, K...
November 27, 2017: Statistical Applications in Genetics and Molecular Biology
Christopher McMahan, James Baurley, William Bridges, Chase Joyner, Muhamad Fitra Kacamarga, Robert Lund, Carissa Pardamean, Bens Pardamean
Genomic studies of plants often seek to identify genetic factors associated with desirable traits. The process of evaluating genetic markers one by one (i.e. a marginal analysis) may not identify important polygenic and environmental effects. Further, confounding due to growing conditions/factors and genetic similarities among plant varieties may influence conclusions. When developing new plant varieties to optimize yield or thrive in future adverse conditions (e.g. flood, drought), scientists seek a complete understanding of how the factors influence desirable traits...
November 27, 2017: Statistical Applications in Genetics and Molecular Biology
Johanna Bertl, Gregory Ewing, Carolin Kosiol, Andreas Futschik
In many population genetic problems, parameter estimation is obstructed by an intractable likelihood function. Therefore, approximate estimation methods have been developed, and with growing computational power, sampling-based methods have become popular. However, these methods, such as Approximate Bayesian Computation (ABC), can be inefficient in high-dimensional problems. This led to the development of more sophisticated iterative estimation methods like particle filters. Here, we propose an alternative approach that is based on stochastic approximation...
November 27, 2017: Statistical Applications in Genetics and Molecular Biology
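To illustrate the kind of sampling-based method the abstract above refers to, the following is a minimal sketch of plain rejection ABC. It is not the authors' stochastic approximation approach; the binomial toy model, the uniform prior, the tolerance, and all function names are hypothetical choices made only for this example.

```python
import random


def abc_rejection(observed, simulate, prior_sample, distance, tolerance, num_draws, rng):
    """Plain rejection ABC: keep prior draws whose simulated data fall close to the observation."""
    accepted = []
    for _ in range(num_draws):
        theta = prior_sample(rng)
        simulated = simulate(theta, rng)
        if distance(simulated, observed) <= tolerance:
            accepted.append(theta)
    return accepted


def simulate_successes(theta, rng, trials=100):
    """Toy model (assumption): number of successes in `trials` Bernoulli(theta) draws."""
    return sum(rng.random() < theta for _ in range(trials))


rng = random.Random(0)  # fixed seed so the run is reproducible
posterior_draws = abc_rejection(
    observed=30,                      # hypothetical data: 30 successes out of 100
    simulate=simulate_successes,
    prior_sample=lambda r: r.random(),  # uniform prior on [0, 1)
    distance=lambda a, b: abs(a - b),
    tolerance=2,
    num_draws=5000,
    rng=rng,
)
print(len(posterior_draws), sum(posterior_draws) / len(posterior_draws))
```

Every proposal is drawn blindly from the prior, so most simulations are discarded. The waste grows quickly as the number of parameters or summary statistics increases, which is the inefficiency the abstract points to.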
Panagiotis Papastamoulis, Magnus Rattray
Next generation sequencing allows the identification of genes consisting of differentially expressed transcripts, a term which usually refers to changes in the overall expression level. A specific type of differential expression is differential transcript usage (DTU), which targets changes in the relative within-gene expression of a transcript. The contribution of this paper is to: (a) extend cjBitSeq, a previously introduced Bayesian model originally designed for identifying changes in overall expression levels, to the DTU context and (b) propose a Bayesian version of DRIMSeq, a frequentist model for inferring DTU...
November 27, 2017: Statistical Applications in Genetics and Molecular Biology
Elena Szefer, Donghuan Lu, Farouk Nathoo, Mirza Faisal Beg, Jinko Graham
Using publicly-available data from the Alzheimer's Disease Neuroimaging Initiative, we investigate the joint association between single-nucleotide polymorphisms (SNPs) in previously established linkage regions for Alzheimer's disease (AD) and rates of decline in brain structure. In an initial, discovery stage of analysis, we applied a weighted RV test to assess the association between 75,845 SNPs in the Alzgene linkage regions and rates of change in structural MRI measurements for 56 brain regions affected by AD, in 632 subjects...
November 27, 2017: Statistical Applications in Genetics and Molecular Biology
Ekua Kotoka, Megan Orr
RNA-Seq is a developing technology for generating gene expression data by directly sequencing mRNA molecules in a sample. RNA-Seq data consist of counts of reads recorded to a particular gene that are often used to identify differentially expressed (DE) genes. A common statistical method used to analyze RNA-Seq data is Significance Analysis of Microarray with emphasis on RNA-Seq data (SAMseq). SAMseq is a nonparametric method that uses a resampling technique to account for differences in sequencing depths when identifying DE genes...
November 27, 2017: Statistical Applications in Genetics and Molecular Biology
Lajmi Lakhal-Chaieb, Celia M T Greenwood, Mohamed Ouhourane, Kaiqiong Zhao, Belkacem Abdous, Karim Oualkacha
We consider the assessment of DNA methylation profiles for sequencing-derived data from a single cell type or from cell lines. We derive a kernel smoothed EM-algorithm capable of analyzing an entire chromosome at once while simultaneously correcting for experimental errors arising from either the pre-treatment steps or the sequencing stage and taking into account spatial correlations between DNA methylation profiles at neighbouring CpG sites. The outcomes of our algorithm are then used to (i) call the true methylation status at each CpG site, (ii) provide accurate smoothed estimates of DNA methylation levels, and (iii) detect differentially methylated regions...
November 27, 2017: Statistical Applications in Genetics and Molecular Biology
Ingrid M Lönnstedt, Sven Nelander
The systematic study of transcriptional responses to genetic and chemical perturbations in human cells is still in its early stages. The largest available dataset to date is the newly released L1000 compendium. With its 1.3 million gene expression profiles of treated human cells it offers many opportunities for biomedical data mining, but also data normalization challenges of new dimensions. We developed a novel and practical approach to obtain accurate estimates of fold change response profiles from L1000, based on the RUV (Remove Unwanted Variation) statistical framework...
September 26, 2017: Statistical Applications in Genetics and Molecular Biology
Chiara Sacco, Cinzia Viroli, Mario Falchi
Genomic imprinting is an epigenetic mechanism that leads to differential contributions of maternal and paternal alleles to offspring gene expression in a parent-of-origin manner. We propose a novel test for detecting the parent-of-origin effects (POEs) in genome-wide genotype data from related individuals (twins) when the parental origin cannot be inferred. The proposed method exploits a finite mixture of linear mixed models: the key idea is that in the case of POEs the population can be clustered in two different groups in which the reference allele is inherited from a different parent...
September 26, 2017: Statistical Applications in Genetics and Molecular Biology
Nasim Ejlali, Mohammad Reza Faghihi, Mehdi Sadeghi
An important topic in bioinformatics is protein structure alignment. Some statistical methods have been proposed for this problem, but most of them align two protein structures based on the global geometric information without considering the effect of neighbourhood in the structures. In this paper, we provide a Bayesian model to align protein structures by considering the effect of both local and global geometric information of protein structures. Local geometric information is incorporated into the model through the partial Procrustes distance of small substructures...
September 26, 2017: Statistical Applications in Genetics and Molecular Biology