Statistical Applications in Genetics and Molecular Biology

Alexia Kakourou, Werner Vach, Simone Nicolardi, Yuri van der Burgt, Bart Mertens
Mass spectrometry-based clinical proteomics has emerged as a powerful tool for high-throughput protein profiling and biomarker discovery. Recent improvements in mass spectrometry technology have boosted the potential of proteomic studies in biomedical research. However, the complexity of proteomic expression introduces new statistical challenges in summarizing and analyzing the acquired data. Statistical methods for optimally processing proteomic data are currently a growing field of research. In this paper we present simple, yet appropriate methods to preprocess, summarize and analyze high-throughput MALDI-FTICR mass spectrometry data, collected in a case-control fashion, while dealing with the statistical challenges that accompany such data...
October 1, 2016: Statistical Applications in Genetics and Molecular Biology
Colin S Gillespie, Andrew Golightly
Solving the chemical master equation exactly is typically not possible, so instead we must rely on simulation-based methods. Unfortunately, drawing exact realisations requires simulating every reaction that occurs. This precludes the use of exact simulators for models of any realistic size, and so approximate algorithms become important. In this paper we describe a general framework for assessing the accuracy of the linear noise and two moment approximations. By constructing an efficient space-filling design over the parameter region of interest, we present a number of useful diagnostic tools that aid modellers in assessing whether the approximation is suitable...
October 1, 2016: Statistical Applications in Genetics and Molecular Biology
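As a rough illustration of why exact simulation becomes expensive, the following minimal sketch runs Gillespie's direct method on a toy birth-death model. The model, the rate values, and the function name `gillespie_birth_death` are assumptions chosen for illustration and are not taken from the paper above; the point is only that every single reaction event costs one loop iteration.

```python
import random


def gillespie_birth_death(x0, birth_rate, death_rate, t_end, seed=0):
    """Exact stochastic simulation (Gillespie's direct method) of a toy model.

    Hypothetical reactions, chosen only for illustration:
      birth: 0 -> X   with hazard birth_rate
      death: X -> 0   with hazard death_rate * x
    Returns a list of (time, count) pairs: the initial state plus one
    entry per simulated reaction event.
    """
    rng = random.Random(seed)
    t, x = 0.0, x0
    path = [(t, x)]
    while True:
        h_birth = birth_rate
        h_death = death_rate * x
        h_total = h_birth + h_death
        if h_total == 0.0:
            break  # no reaction can fire any more
        # Waiting time to the next reaction is exponential with rate h_total.
        t += rng.expovariate(h_total)
        if t > t_end:
            break
        # Pick which reaction fires, with probability proportional to its hazard.
        if rng.random() * h_total < h_birth:
            x += 1
        else:
            x -= 1
        path.append((t, x))
    return path


path = gillespie_birth_death(x0=0, birth_rate=10.0, death_rate=0.1, t_end=50.0)
print(len(path) - 1, "reaction events simulated; final count:", path[-1][1])
```

The number of loop iterations equals the number of reaction events, so the cost grows with both the reaction rates and the simulated time span, which is the motivation for approximate algorithms.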
Jochen Kruppa, Frank Kramer, Tim Beißbarth, Klaus Jung
As part of the data processing of high-throughput sequencing experiments, count data are produced representing the number of reads that map to specific genomic regions. Count data also arise in mass spectrometric experiments for the detection of protein-protein interactions. Evaluating new computational methods for the analysis of sequencing count data or spectral count data from proteomics experiments thus requires artificial count data. Although some methods for the generation of artificial sequencing count data have been proposed, all of them either simulate single sequencing runs, thus omitting the correlation structure between the individual genomic features, or are limited to specific structures...
October 1, 2016: Statistical Applications in Genetics and Molecular Biology
Valentina Pugacheva, Alexander Korotkov, Eugene Korotkov
The aim of this study was to show that amino acid sequences have a latent periodicity with insertions and deletions of amino acids at unknown positions in the analyzed sequence. A genetic algorithm, dynamic programming and random weight matrices were used to develop a new mathematical algorithm for latent periodicity search. A multiple alignment of periods was calculated by direct optimization of the position-weight matrix, without using pairwise alignments. The developed algorithm was applied to analyze amino acid sequences of a small number of proteins...
October 1, 2016: Statistical Applications in Genetics and Molecular Biology
Yulan Liang, Arpad Kelemen
Construction of gene-gene interaction networks and potential pathways is a challenging and important problem in genomic research for complex diseases, and estimating the dynamic changes of the temporal correlations and non-stationarity is key to this process. In this paper, we develop dynamic state space models with hierarchical Bayesian settings to tackle this challenge for inferring the dynamic profiles and genetic networks associated with disease treatments. We treat both the stochastic transition matrix and the observation matrix as time-variant and include temporal correlation structures in the covariance matrix estimations in the multivariate Bayesian state space models...
August 1, 2016: Statistical Applications in Genetics and Molecular Biology
Carsten Wiuf, Jonatan Schaumburg-Müller Pallesen, Leslie Foldager, Jakob Grove
In many areas of science it is customary to perform many, potentially millions, of tests simultaneously. To gain statistical power it is common to group tests based on a priori criteria such as predefined regions or by sliding windows. However, it is not straightforward to choose grouping criteria and the results might depend on the chosen criteria. Methods that summarize, or aggregate, test statistics or p-values, without relying on a priori criteria, are therefore desirable. We present a simple method to aggregate a sequence of stochastic variables, such as test statistics or p-values, into fewer variables without assuming a priori defined groups...
August 1, 2016: Statistical Applications in Genetics and Molecular Biology
Nolen Perualila-Tan, Adetayo Kasim, Willem Talloen, Bie Verbist, Hinrich W H Göhlmann, Ziv Shkedy
The modern drug discovery process involves multiple sources of high-dimensional data. This imposes the challenge of data integration. A typical example is the integration of chemical structure (fingerprint features), phenotypic bioactivity (bioassay read-outs) data for targets of interest, and transcriptomic (gene expression) data in early drug discovery to better understand the chemical and biological mechanisms of candidate drugs, and to facilitate early detection of safety issues prior to later and expensive phases of drug development cycles...
August 1, 2016: Statistical Applications in Genetics and Molecular Biology
Charles Laurin, Dorret Boomsma, Gitta Lubke
The Lasso is a shrinkage regression method that is widely used for variable selection in statistical genetics. Commonly, K-fold cross-validation is used to fit a Lasso model. This is sometimes followed by using bootstrap confidence intervals to improve precision in the resulting variable selections. Nesting cross-validation within bootstrapping could provide further improvements in precision, but this has not been investigated systematically. We performed simulation studies of Lasso variable selection precision (VSP) with and without nesting cross-validation within bootstrapping...
August 1, 2016: Statistical Applications in Genetics and Molecular Biology
Chamont Wang, Jana L Gevertz
Modern biological experiments often involve high-dimensional data with thousands or more variables. A challenging problem is to identify the key variables that are related to a specific disease. Confounding this task is the vast number of statistical methods available for variable selection. For this reason, we set out to develop a framework to investigate the variable selection capability of statistical methods that are commonly applied to analyze high-dimensional biological datasets. Specifically, we designed six simulated cancers (based on benchmark colon and prostate cancer data) where we know precisely which genes cause a dataset to be classified as cancerous or normal - we call these causative genes...
August 1, 2016: Statistical Applications in Genetics and Molecular Biology
Christopher D Steele, Matthew Greenhalgh, David J Balding
In recent years statistical models for the analysis of complex (low-template and/or mixed) DNA profiles have moved from using only presence/absence information about allelic peaks in an electropherogram, to quantitative use of peak heights. This is challenging because peak heights are very variable and affected by a number of factors. We present a new peak-height model with important novel features, including over- and double-stutter, and a new approach to dropin. Our model is incorporated in open-source R code likeLTD...
July 14, 2016: Statistical Applications in Genetics and Molecular Biology
Yuna Blum, Magalie Houée-Bigot, David Causeur
Inference on gene regulatory networks from high-throughput expression data is one of the main current challenges in systems biology. Such networks can give deep insight into interactions between genes. Because gene-gene interactions are often viewed as joint contributions to known biological mechanisms, inference on the dependence among gene expressions is expected to be consistent to some extent with the functional characterization of genes, which can be derived from ontologies (GO, KEGG, …)...
June 1, 2016: Statistical Applications in Genetics and Molecular Biology
Veronica Vinciotti, Luigi Augugliaro, Antonino Abbruzzo, Ernst C Wit
Factorial Gaussian graphical Models (fGGMs) have recently been proposed for inferring dynamic gene regulatory networks from genomic high-throughput data. In the search for true regulatory relationships amongst the vast space of possible networks, these models allow the imposition of certain restrictions on the dynamic nature of these relationships, such as Markov dependencies of low order - some entries of the precision matrix are a priori zeros - or equal dependency strengths across time lags - some entries of the precision matrix are assumed to be equal...
June 1, 2016: Statistical Applications in Genetics and Molecular Biology
Elizabeth Sweeney, Ciprian Crainiceanu, Jan Gertheiss
When testing for differentially expressed genes between more than two groups, the groups are often defined by dose levels in dose-response experiments or ordinal phenotypes, such as disease stages. We discuss the potential of a new approach that uses the levels' ordering without making any structural assumptions, such as monotonicity, by testing for zero variance components in a mixed models framework. Since the mixed effects model approach borrows strength across doses/levels, the test proposed can also be applied when the number of dose levels/phenotypes is large and/or the number of subjects per group is small...
June 1, 2016: Statistical Applications in Genetics and Molecular Biology
Duchwan Ryu, Hongyan Xu, Varghese George, Shaoyong Su, Xiaoling Wang, Huidong Shi, Robert H Podolsky
Differential methylation of regulatory elements is critical in epigenetic research and can be statistically tested. We developed a new statistical test, the generalized integrated functional test (GIFT), that tests for regional differences in methylation based on the methylation percent at each CpG site within a genomic region. The GIFT uses estimated subject-specific profiles with smoothing methods, specifically wavelet smoothing, and calculates an ANOVA-like test to compare the average profiles of groups...
June 1, 2016: Statistical Applications in Genetics and Molecular Biology
Justina Žurauskienė, Paul D W Kirk, Michael P H Stumpf
The rapid development of high throughput experimental techniques has resulted in a growing diversity of genomic datasets being produced and requiring analysis. Therefore, it is increasingly being recognized that we can gain deeper understanding about underlying biology by combining the insights obtained from multiple, diverse datasets. Thus we propose a novel scalable computational approach to unsupervised data fusion. Our technique exploits network representations of the data to identify similarities among the datasets...
April 2016: Statistical Applications in Genetics and Molecular Biology
Jacob Coleman, Joseph Replogle, Gabriel Chandler, Johanna Hardin
Canonical correlation analysis (CCA) is a multivariate technique that takes two datasets and forms the most highly correlated possible pairs of linear combinations between them. Each subsequent pair of linear combinations is orthogonal to the preceding pair, meaning that new information is gleaned from each pair. By looking at the magnitude of coefficient values, we can find out which variables can be grouped together, thus better understanding multiple interactions that are otherwise difficult to compute or grasp intuitively...
April 2016: Statistical Applications in Genetics and Molecular Biology
Zhixiang Lin, Mingfeng Li, Nenad Sestan, Hongyu Zhao
The statistical methodology developed in this study was motivated by our interest in studying neurodevelopment using the mouse brain RNA-Seq data set, where gene expression levels were measured in multiple layers in the somatosensory cortex across time in both female and male samples. We aim to identify differentially expressed genes between adjacent time points, which may provide insights into the dynamics of brain development. Because of the extremely small sample size (one male and one female at each time point), simple marginal analysis may be underpowered...
April 2016: Statistical Applications in Genetics and Molecular Biology
Shiqi Cui, Tieming Ji, Jilong Li, Jianlin Cheng, Jing Qiu
Identifying differentially expressed (DE) genes between different conditions is one of the main goals of RNA-seq data analysis. Although a large amount of RNA-seq data was produced for two-group comparisons with small sample sizes at an early stage, more and more RNA-seq data are being produced in the setting of complex experimental designs such as split-plot designs and repeated measures designs. Data arising from such experiments are traditionally analyzed by mixed-effects models. Therefore an appropriate statistical approach for analyzing RNA-seq data from such designs should be generalized linear mixed models (GLMM) or similar approaches that allow for random effects...
April 2016: Statistical Applications in Genetics and Molecular Biology
Mathieu Emily
Among the large number of statistical methods that have been proposed to identify gene-gene interactions in case-control genome-wide association studies (GWAS), gene-based methods have recently grown in popularity as they confer advantages in both statistical power and biological interpretation. All of the gene-based methods jointly model the distribution of single nucleotide polymorphism (SNP) sets prior to the statistical test, leading to a limited power to detect sums of SNP-SNP signals. In this paper, we instead propose a gene-based method that first performs SNP-SNP interaction tests before aggregating the obtained p-values into a test at the gene level...
April 2016: Statistical Applications in Genetics and Molecular Biology
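To make the idea of aggregating several p-values into one test concrete, here is a minimal sketch using Fisher's combination method. This is a swapped-in textbook technique, not the aggregation procedure of the paper above (which the truncated abstract does not specify), and it assumes the p-values are independent, which SNP-SNP tests within a gene generally are not. The function name `fisher_combined_pvalue` is hypothetical.

```python
import math


def fisher_combined_pvalue(pvalues):
    """Combine independent p-values with Fisher's method.

    The statistic X = -2 * sum(log p_i) follows a chi-square distribution
    with 2k degrees of freedom under the global null (k = number of tests).
    For even degrees of freedom the survival function has the closed form
    exp(-x/2) * sum_{i=0}^{k-1} (x/2)^i / i!, so no external library is needed.
    """
    k = len(pvalues)
    if k == 0:
        raise ValueError("need at least one p-value")
    if any(not (0.0 < p <= 1.0) for p in pvalues):
        raise ValueError("p-values must lie in (0, 1]")
    half_stat = -sum(math.log(p) for p in pvalues)  # this is X / 2
    term, total = 1.0, 1.0
    for i in range(1, k):
        term *= half_stat / i  # (x/2)^i / i!, built up incrementally
        total += term
    return math.exp(-half_stat) * total


# Hypothetical p-values standing in for SNP-SNP interaction tests in one gene pair.
snp_pair_pvalues = [0.04, 0.20, 0.01, 0.65]
print(fisher_combined_pvalue(snp_pair_pvalues))
```

With correlated tests, as is typical for SNPs in linkage disequilibrium, the chi-square reference distribution no longer holds, so this sketch would need a permutation or correlation-adjusted calibration before being used on real data.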
Xiaoqing Yu, Shuying Sun
We present a comprehensive comparative analysis of five differential methylation (DM) identification methods: methylKit, BSmooth, BiSeq, HMM-DM, and HMM-Fisher, which were developed for bisulfite sequencing (BS) data. We summarize the features of these methods from several analytical aspects and compare their performance using both simulated and real BS datasets. Our comparison results are summarized below. First, parameter settings may largely affect the accuracy of DM identification. Compared with default settings, modified parameter settings yield higher sensitivity and/or lower false positive rates...
April 2016: Statistical Applications in Genetics and Molecular Biology