Statistical Applications in Genetics and Molecular Biology

https://www.readbyqxmd.com/read/27875324/adaptive-input-data-transformation-for-improved-network-reconstruction-with-information-theoretic-algorithms
#1
Venkateshan Kannan, Jesper Tegner
We propose a novel systematic procedure of non-linear data transformation for an adaptive algorithm in the context of network reverse-engineering using information-theoretic methods. Our methodology is rooted in elucidating and correcting for the specific biases in the estimation techniques for mutual information (MI) given a finite sample of data. These are, in turn, tied to the lack of well-defined bounds for numerical estimation of MI for continuous probability distributions from finite data. The nature and properties of the inevitable bias are described, complemented by several examples illustrating their form and variation...
December 1, 2016: Statistical Applications in Genetics and Molecular Biology
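To make the finite-sample bias concrete, here is a minimal Python sketch. It assumes the simple histogram (plug-in) MI estimator on data rescaled to [0, 1), which is not the authors' estimator or their transformation procedure. For two independent variables the true MI is zero, yet the plug-in estimate is positive on finite samples and shrinks as the sample grows. The function names `plugin_mi` and `mean_bias` and all parameter values are hypothetical.

```python
import math
import random
from collections import Counter


def plugin_mi(xs, ys, bins):
    """Histogram (plug-in) MI estimate in nats for values in [0, 1)."""
    n = len(xs)
    # Assign each value to one of `bins` equal-width bins on [0, 1).
    bx = [min(int(x * bins), bins - 1) for x in xs]
    by = [min(int(y * bins), bins - 1) for y in ys]
    joint = Counter(zip(bx, by))
    px = Counter(bx)
    py = Counter(by)
    mi = 0.0
    for (i, j), c in joint.items():
        # p(i,j) * log(p(i,j) / (p(i) p(j))), written in terms of counts
        mi += (c / n) * math.log(c * n / (px[i] * py[j]))
    return mi


def mean_bias(n, bins=5, repeats=50, seed=0):
    """Average plug-in MI over repeated samples of independent X and Y (true MI = 0)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(repeats):
        xs = [rng.random() for _ in range(n)]
        ys = [rng.random() for _ in range(n)]  # drawn independently of xs
        total += plugin_mi(xs, ys, bins)
    return total / repeats


if __name__ == "__main__":
    # Compare the empirical bias at two sample sizes with a first-order approximation.
    print(mean_bias(200), mean_bias(2000), (5 - 1) ** 2 / (2 * 200))
```

`mean_bias(200)` should exceed `mean_bias(2000)`. To first order (a Miller-Madow-style approximation), the bias of this estimator is roughly (B_x - 1)(B_y - 1)/(2N) nats for a B_x × B_y grid and N samples, which is the third number printed above.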
https://www.readbyqxmd.com/read/27875323/estimating-intrinsic-and-extrinsic-noise-from-single-cell-gene-expression-measurements
#2
Audrey Qiuyan Fu, Lior Pachter
Gene expression is stochastic and displays variation ("noise") both within and between cells. Intracellular (intrinsic) variance can be distinguished from extracellular (extrinsic) variance by applying the law of total variance to data from two-reporter assays that probe expression of identically regulated gene pairs in single cells. We examine established formulas [Elowitz, M. B., A. J. Levine, E. D. Siggia and P. S. Swain (2002): "Stochastic gene expression in a single cell," Science, 297, 1183-1186.] for the estimation of intrinsic and extrinsic noise and provide interpretations of them in terms of a hierarchical model...
December 1, 2016: Statistical Applications in Genetics and Molecular Biology
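The established two-reporter estimators cited above are commonly written as follows. For per-cell expression values c and y of the two reporters, normalized to equal means, η_int² = ⟨(c − y)²⟩ / (2⟨c⟩⟨y⟩) and η_ext² = (⟨cy⟩ − ⟨c⟩⟨y⟩) / (⟨c⟩⟨y⟩), and the two sum to the total noise. The Python sketch below computes these plug-in versions on a toy simulation in which a hypothetical cell-level factor is shared by both reporters. It does not implement the authors' hierarchical-model interpretation, and the function names and parameters are illustrative assumptions.

```python
import random


def noise_components(c, y):
    """Plug-in two-reporter noise estimates (intrinsic, extrinsic, total)."""
    n = len(c)
    mc = sum(c) / n
    my = sum(y) / n
    norm = mc * my  # <c><y>
    intrinsic = sum((ci - yi) ** 2 for ci, yi in zip(c, y)) / (2 * n * norm)
    extrinsic = (sum(ci * yi for ci, yi in zip(c, y)) / n - norm) / norm
    total = (sum(ci * ci + yi * yi for ci, yi in zip(c, y)) / (2 * n) - norm) / norm
    return intrinsic, extrinsic, total


def simulate_two_reporters(n_cells=5000, ext_sd=0.2, int_sd=0.1, seed=1):
    """Toy model: a cell-level factor scales both reporters; each also has its own noise."""
    rng = random.Random(seed)
    c, y = [], []
    for _ in range(n_cells):
        shared = 1.0 + rng.gauss(0.0, ext_sd)  # extrinsic: common to both reporters
        c.append(shared * (1.0 + rng.gauss(0.0, int_sd)))  # intrinsic noise of reporter 1
        y.append(shared * (1.0 + rng.gauss(0.0, int_sd)))  # intrinsic noise of reporter 2
    return c, y


if __name__ == "__main__":
    c, y = simulate_two_reporters()
    print(noise_components(c, y))
```

Because ext_sd exceeds int_sd in this toy setting, the extrinsic estimate should come out larger than the intrinsic one. When the two reporters are identical (c == y), the intrinsic term is exactly zero.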
https://www.readbyqxmd.com/read/27875322/tree-based-quantitative-trait-mapping-in-the-presence-of-external-covariates
#3
Katherine L Thompson, Catherine R Linnen, Laura Kubatko
A central goal in biological and biomedical sciences is to identify the molecular basis of variation in morphological and behavioral traits. Over the last decade, improvements in sequencing technologies coupled with the active development of association mapping methods have made it possible to link single nucleotide polymorphisms (SNPs) and quantitative traits. However, a major limitation of existing methods is that they are often unable to consider complex, but biologically realistic, scenarios. Previous work showed that association mapping method performance can be improved by using the evolutionary history within each SNP to estimate the covariance structure among randomly-sampled individuals...
December 1, 2016: Statistical Applications in Genetics and Molecular Biology
https://www.readbyqxmd.com/read/27866174/sample-size-calculation-based-on-generalized-linear-models-for-differential-expression-analysis-in-rna-seq-data
#4
Chung-I Li, Yu Shyr
As RNA-seq rapidly develops and costs continually decrease, the quantity and frequency of samples being sequenced will grow exponentially. With proteomic investigations becoming more multivariate and quantitative, determining a study's optimal sample size is now a vital step in experimental design. Current methods for calculating a study's required sample size are mostly based on the hypothesis testing framework, which assumes each gene count can be modeled through Poisson or negative binomial distributions; however, these methods are limited when it comes to accommodating covariates...
December 1, 2016: Statistical Applications in Genetics and Molecular Biology
https://www.readbyqxmd.com/read/27682715/accounting-for-isotopic-clustering-in-fourier-transform-mass-spectrometry-data-analysis-for-clinical-diagnostic-studies
#5
Alexia Kakourou, Werner Vach, Simone Nicolardi, Yuri van der Burgt, Bart Mertens
Mass spectrometry-based clinical proteomics has emerged as a powerful tool for high-throughput protein profiling and biomarker discovery. Recent improvements in mass spectrometry technology have boosted the potential of proteomic studies in biomedical research. However, the complexity of the proteomic expression introduces new statistical challenges in summarizing and analyzing the acquired data. Statistical methods for optimally processing proteomic data are currently a growing field of research. In this paper we present simple yet appropriate methods to preprocess, summarize and analyze high-throughput MALDI-FTICR mass spectrometry data, collected in a case-control fashion, while dealing with the statistical challenges that accompany such data...
October 1, 2016: Statistical Applications in Genetics and Molecular Biology
https://www.readbyqxmd.com/read/27682714/diagnostics-for-assessing-the-linear-noise-and-moment-closure-approximations
#6
Colin S Gillespie, Andrew Golightly
Solving the chemical master equation exactly is typically not possible, so instead we must rely on simulation-based methods. Unfortunately, drawing exact realisations requires simulating every reaction that occurs. This precludes the use of exact simulators for models of any realistic size, and so approximate algorithms become important. In this paper we describe a general framework for assessing the accuracy of the linear noise and two-moment approximations. By constructing an efficient space-filling design over the parameter region of interest, we present a number of useful diagnostic tools that aid modellers in assessing whether the approximation is suitable...
October 1, 2016: Statistical Applications in Genetics and Molecular Biology
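The cost of exact simulation is easy to see in Gillespie's direct method, which draws one exponential waiting time and picks one reaction per event. The Python sketch below applies it to a hypothetical birth-death model (0 → X at rate k, X → 0 at rate g·x), chosen only because its stationary mean k/g is known. It is not one of the paper's models or diagnostics, and the function name and parameter values are assumptions.

```python
import random


def gillespie_birth_death(k, g, x0, t_end, seed=0):
    """Gillespie direct method for 0 -> X (propensity k) and X -> 0 (propensity g*x).

    Returns the final copy number, the number of reaction events simulated,
    and the time-averaged copy number over [0, t_end].
    """
    rng = random.Random(seed)
    t, x = 0.0, x0
    n_events = 0
    area = 0.0  # running integral of x(t) dt
    while True:
        a_birth = k
        a_death = g * x
        a0 = a_birth + a_death  # total propensity (k > 0, so a0 > 0)
        tau = rng.expovariate(a0)  # exponential waiting time to the next event
        if t + tau > t_end:
            area += x * (t_end - t)
            break
        area += x * tau
        t += tau
        # Pick which reaction fires, with probability proportional to its propensity.
        if rng.random() * a0 < a_birth:
            x += 1
        else:
            x -= 1
        n_events += 1
    return x, n_events, area / t_end


if __name__ == "__main__":
    final_x, events, mean_x = gillespie_birth_death(k=10.0, g=1.0, x0=0, t_end=2000.0)
    print(final_x, events, mean_x)  # mean_x should be close to k / g = 10
```

With k = 10, g = 1 and t_end = 2000, the event counter runs into the tens of thousands for a single species. This per-reaction scaling is what motivates the linear noise and moment-closure approximations.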
https://www.readbyqxmd.com/read/27655448/a-simulation-framework-for-correlated-count-data-of-features-subsets-in-high-throughput-sequencing-or-proteomics-experiments
#7
Jochen Kruppa, Frank Kramer, Tim Beißbarth, Klaus Jung
As part of the data processing of high-throughput sequencing experiments, count data are produced that represent the number of reads mapping to specific genomic regions. Count data also arise in mass spectrometric experiments for the detection of protein-protein interactions. Evaluating new computational methods for the analysis of sequencing count data or spectral count data from proteomics experiments therefore requires artificial count data. Although some methods for the generation of artificial sequencing count data have been proposed, they either simulate single sequencing runs, thus omitting the correlation structure between the individual genomic features, or are limited to specific structures...
October 1, 2016: Statistical Applications in Genetics and Molecular Biology
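One generic way to simulate correlated counts is a Gaussian copula. It is shown here only for illustration and is not necessarily the approach of this paper. The method draws correlated standard normals, maps them to uniforms with the normal CDF, and pushes those through Poisson quantile functions. The Python sketch below does this for two hypothetical features; the function names, Poisson marginals and parameters are assumptions.

```python
import math
import random
from statistics import NormalDist


def poisson_quantile(u, lam):
    """Smallest k such that the Poisson(lam) CDF at k is >= u (summing the pmf)."""
    k = 0
    p = math.exp(-lam)
    cdf = p
    while cdf < u and p > 0.0:  # p > 0 guards against rounding when u is near 1
        k += 1
        p *= lam / k
        cdf += p
    return k


def correlated_counts(n, lam1, lam2, rho, seed=0):
    """Gaussian-copula sketch: two Poisson features with latent normal correlation rho."""
    rng = random.Random(seed)
    std_normal = NormalDist()
    pairs = []
    for _ in range(n):
        z1 = rng.gauss(0.0, 1.0)
        # Second latent normal with correlation rho to z1 (2x2 Cholesky factor).
        z2 = rho * z1 + math.sqrt(1.0 - rho * rho) * rng.gauss(0.0, 1.0)
        pairs.append((poisson_quantile(std_normal.cdf(z1), lam1),
                      poisson_quantile(std_normal.cdf(z2), lam2)))
    return pairs


def pearson(pairs):
    """Sample Pearson correlation of a list of (a, b) pairs."""
    n = len(pairs)
    ma = sum(a for a, _ in pairs) / n
    mb = sum(b for _, b in pairs) / n
    cov = sum((a - ma) * (b - mb) for a, b in pairs)
    va = sum((a - ma) ** 2 for a, _ in pairs)
    vb = sum((b - mb) ** 2 for _, b in pairs)
    return cov / math.sqrt(va * vb)


if __name__ == "__main__":
    pairs = correlated_counts(5000, 20.0, 20.0, 0.8)
    print(pearson(pairs))
```

Extending this to many features would replace the 2x2 construction with a full Cholesky factor of the chosen latent correlation matrix.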
https://www.readbyqxmd.com/read/27337743/search-of-latent-periodicity-in-amino-acid-sequences-by-means-of-genetic-algorithm-and-dynamic-programming
#8
Valentina Pugacheva, Alexander Korotkov, Eugene Korotkov
The aim of this study was to show that amino acid sequences have a latent periodicity with insertions and deletions of amino acids in unknown positions of the analyzed sequence. A genetic algorithm, dynamic programming and random weight matrices were used to develop a new mathematical algorithm for latent periodicity search. A multiple alignment of periods was calculated with the help of direct optimization of the position-weight matrix without using pairwise alignments. The developed algorithm was applied to analyze amino acid sequences of a small number of proteins...
October 1, 2016: Statistical Applications in Genetics and Molecular Biology
https://www.readbyqxmd.com/read/27343475/bayesian-state-space-models-for-dynamic-genetic-network-construction-across-multiple-tissues
#9
Yulan Liang, Arpad Kelemen
Construction of gene-gene interaction networks and potential pathways is a challenging and important problem in genomic research for complex diseases, and estimating the dynamic changes in temporal correlations and non-stationarity is key to this process. In this paper, we develop dynamic state space models with hierarchical Bayesian settings to tackle this challenge for inferring the dynamic profiles and genetic networks associated with disease treatments. We treat both the stochastic transition matrix and the observation matrix as time-variant and include temporal correlation structures in the covariance matrix estimations in the multivariate Bayesian state space models...
August 1, 2016: Statistical Applications in Genetics and Molecular Biology
https://www.readbyqxmd.com/read/27269897/landscape-a-simple-method-to-aggregate-p-values-and-other-stochastic-variables-without-a-priori-grouping
#10
Carsten Wiuf, Jonatan Schaumburg-Müller Pallesen, Leslie Foldager, Jakob Grove
In many areas of science it is customary to perform many (potentially millions of) tests simultaneously. To gain statistical power it is common to group tests based on a priori criteria such as predefined regions or sliding windows. However, it is not straightforward to choose grouping criteria and the results might depend on the chosen criteria. Methods that summarize, or aggregate, test statistics or p-values, without relying on a priori criteria, are therefore desirable. We present a simple method to aggregate a sequence of stochastic variables, such as test statistics or p-values, into fewer variables without assuming a priori defined groups...
August 1, 2016: Statistical Applications in Genetics and Molecular Biology
https://www.readbyqxmd.com/read/27269248/a-joint-modeling-approach-for-uncovering-associations-between-gene-expression-bioactivity-and-chemical-structure-in-early-drug-discovery-to-guide-lead-selection-and-genomic-biomarker-development
#11
Nolen Perualila-Tan, Adetayo Kasim, Willem Talloen, Bie Verbist, Hinrich W H Göhlmann, Ziv Shkedy
The modern drug discovery process involves multiple sources of high-dimensional data. This imposes the challenge of data integration. A typical example is the integration of chemical structure (fingerprint features), phenotypic bioactivity (bioassay read-outs) data for targets of interest, and transcriptomic (gene expression) data in early drug discovery to better understand the chemical and biological mechanisms of candidate drugs, and to facilitate early detection of safety issues prior to later and expensive phases of drug development cycles...
August 1, 2016: Statistical Applications in Genetics and Molecular Biology
https://www.readbyqxmd.com/read/27248122/the-use-of-vector-bootstrapping-to-improve-variable-selection-precision-in-lasso-models
#12
Charles Laurin, Dorret Boomsma, Gitta Lubke
The Lasso is a shrinkage regression method that is widely used for variable selection in statistical genetics. Commonly, K-fold cross-validation is used to fit a Lasso model. This is sometimes followed by using bootstrap confidence intervals to improve precision in the resulting variable selections. Nesting cross-validation within bootstrapping could provide further improvements in precision, but this has not been investigated systematically. We performed simulation studies of Lasso variable selection precision (VSP) with and without nesting cross-validation within bootstrapping...
August 1, 2016: Statistical Applications in Genetics and Molecular Biology
https://www.readbyqxmd.com/read/27226102/finding-causative-genes-from-high-dimensional-data-an-appraisal-of-statistical-and-machine-learning-approaches
#13
Chamont Wang, Jana L Gevertz
Modern biological experiments often involve high-dimensional data with thousands or more variables. A challenging problem is to identify the key variables that are related to a specific disease. Confounding this task is the vast number of statistical methods available for variable selection. For this reason, we set out to develop a framework to investigate the variable selection capability of statistical methods that are commonly applied to analyze high-dimensional biological datasets. Specifically, we designed six simulated cancers (based on benchmark colon and prostate cancer data) where we know precisely which genes cause a dataset to be classified as cancerous or normal - we call these causative genes...
August 1, 2016: Statistical Applications in Genetics and Molecular Biology
https://www.readbyqxmd.com/read/27416618/evaluation-of-low-template-dna-profiles-using-peak-heights
#14
Christopher D Steele, Matthew Greenhalgh, David J Balding
In recent years statistical models for the analysis of complex (low-template and/or mixed) DNA profiles have moved from using only presence/absence information about allelic peaks in an electropherogram, to quantitative use of peak heights. This is challenging because peak heights are very variable and affected by a number of factors. We present a new peak-height model with important novel features, including over- and double-stutter, and a new approach to dropin. Our model is incorporated in open-source R code likeLTD...
July 14, 2016: Statistical Applications in Genetics and Molecular Biology
https://www.readbyqxmd.com/read/27166726/sparse-factor-model-for-co-expression-networks-with-an-application-using-prior-biological-knowledge
#15
Yuna Blum, Magalie Houée-Bigot, David Causeur
Inference on gene regulatory networks from high-throughput expression data turns out to be one of the main current challenges in systems biology. Such networks can be very insightful for the deep understanding of interactions between genes. Because gene-gene interactions are often viewed as joint contributions to known biological mechanisms, inference on the dependence among gene expressions is expected to be consistent, to some extent, with the functional characterization of genes, which can be derived from ontologies (GO, KEGG, …)...
June 1, 2016: Statistical Applications in Genetics and Molecular Biology
https://www.readbyqxmd.com/read/27023322/model-selection-for-factorial-gaussian-graphical-models-with-an-application-to-dynamic-regulatory-networks
#16
Veronica Vinciotti, Luigi Augugliaro, Antonino Abbruzzo, Ernst C Wit
Factorial Gaussian graphical models (fGGMs) have recently been proposed for inferring dynamic gene regulatory networks from genomic high-throughput data. In the search for true regulatory relationships amongst the vast space of possible networks, these models allow the imposition of certain restrictions on the dynamic nature of these relationships, such as Markov dependencies of low order (some entries of the precision matrix are a priori zero) or equal dependency strengths across time lags (some entries of the precision matrix are assumed to be equal)...
June 1, 2016: Statistical Applications in Genetics and Molecular Biology
https://www.readbyqxmd.com/read/26992202/testing-differentially-expressed-genes-in-dose-response-studies-and-with-ordinal-phenotypes
#17
Elizabeth Sweeney, Ciprian Crainiceanu, Jan Gertheiss
When testing for differentially expressed genes between more than two groups, the groups are often defined by dose levels in dose-response experiments or ordinal phenotypes, such as disease stages. We discuss the potential of a new approach that uses the levels' ordering without making any structural assumptions, such as monotonicity, by testing for zero variance components in a mixed models framework. Since the mixed effects model approach borrows strength across doses/levels, the test proposed can also be applied when the number of dose levels/phenotypes is large and/or the number of subjects per group is small...
June 1, 2016: Statistical Applications in Genetics and Molecular Biology
https://www.readbyqxmd.com/read/26982617/differential-methylation-tests-of-regulatory-regions
#18
Duchwan Ryu, Hongyan Xu, Varghese George, Shaoyong Su, Xiaoling Wang, Huidong Shi, Robert H Podolsky
Differential methylation of regulatory elements is critical in epigenetic research and can be statistically tested. We developed a new statistical test, the generalized integrated functional test (GIFT), that tests for regional differences in methylation based on the methylation percent at each CpG site within a genomic region. The GIFT uses estimated subject-specific profiles with smoothing methods, specifically wavelet smoothing, and computes an ANOVA-like test statistic to compare the average profile of groups...
June 1, 2016: Statistical Applications in Genetics and Molecular Biology
https://www.readbyqxmd.com/read/26992203/a-graph-theoretical-approach-to-data-fusion
#19
Justina Žurauskienė, Paul D W Kirk, Michael P H Stumpf
The rapid development of high-throughput experimental techniques has resulted in a growing diversity of genomic datasets being produced and requiring analysis. It is therefore increasingly recognized that we can gain a deeper understanding of the underlying biology by combining the insights obtained from multiple, diverse datasets. Thus we propose a novel scalable computational approach to unsupervised data fusion. Our technique exploits network representations of the data to identify similarities among the datasets...
April 2016: Statistical Applications in Genetics and Molecular Biology
https://www.readbyqxmd.com/read/26963062/resistant-multiple-sparse-canonical-correlation
#20
Jacob Coleman, Joseph Replogle, Gabriel Chandler, Johanna Hardin
Canonical correlation analysis (CCA) is a multivariate technique that takes two datasets and forms the most highly correlated possible pairs of linear combinations between them. Each subsequent pair of linear combinations is orthogonal to the preceding pair, meaning that new information is gleaned from each pair. By looking at the magnitude of coefficient values, we can find out which variables can be grouped together, thus better understanding multiple interactions that are otherwise difficult to compute or grasp intuitively...
April 2016: Statistical Applications in Genetics and Molecular Biology