Adam McDermaid, Brandon Monier, Jing Zhao, Bingqiang Liu, Qin Ma
Differential gene expression (DGE) analysis is one of the most common applications of RNA-sequencing (RNA-seq) data. This process allows for the elucidation of differentially expressed genes across two or more conditions and is widely used in many applications of RNA-seq data analysis. Interpretation of the DGE results can be nonintuitive and time consuming due to the variety of formats based on the tool of choice and the numerous pieces of information provided in these results files. Here we reviewed DGE results analysis from a functional point of view for various visualizations...
August 6, 2018: Briefings in Bioinformatics
Luyi Tian, Shian Su, Xueyi Dong, Daniela Amann-Zalcenstein, Christine Biben, Azadeh Seidi, Douglas J Hilton, Shalin H Naik, Matthew E Ritchie
Single-cell RNA sequencing (scRNA-seq) technology allows researchers to profile the transcriptomes of thousands of cells simultaneously. Protocols that incorporate both designed and random barcodes have greatly increased the throughput of scRNA-seq, but give rise to a more complex data structure. There is a need for new tools that can handle the various barcoding strategies used by different protocols and exploit this information for quality assessment at the sample-level and provide effective visualization of these results in preparation for higher-level analyses...
August 10, 2018: PLoS Computational Biology
Adithya Murali, Aniruddha Bhargava, Erik S Wright
BACKGROUND: Microbiome studies often involve sequencing a marker gene to identify the microorganisms in samples of interest. Sequence classification is a critical component of this process, whereby sequences are assigned to a reference taxonomy containing known sequence representatives of many microbial groups. Previous studies have shown that existing classification programs often assign sequences to reference groups even if they belong to novel taxonomic groups that are absent from the reference taxonomy...
August 9, 2018: Microbiome
Jack M Fu, Elizabeth J Leslie, Alan F Scott, Jeffrey C Murray, Mary L Marazita, Terri H Beaty, Robert B Scharpf, Ingo Ruczinski
Motivation: De novo copy number deletions have been implicated in many diseases, but there is no formal method to date that identifies de novo deletions in parent-offspring trios from capture-based sequencing platforms. Results: We developed Minimum Distance for Targeted Sequencing (MDTS) to fill this void. MDTS has similar sensitivity (recall), but a much lower false positive rate compared to less specific CNV callers, resulting in a much higher positive predictive value (precision)...
August 2, 2018: Bioinformatics
Xinyuan Zhao, Minlu Liang, Xiaona Li, Xiaoling Qiu, Li Cui
Adipose stem cells (ASCs) are considered a great alternative source of mesenchymal stem cells (MSCs) and have shown great promise in tissue engineering and regenerative medicine applications, including bone repair. However, the underlying mechanisms regulating the osteogenic differentiation of ASCs remain poorly understood. Gene expression profiles of GSE63754 and GSE37329 were downloaded from the Gene Expression Omnibus database. R software and Bioconductor packages were used to compare and identify the differentially expressed genes (DEGs) before and after ASC osteogenic differentiation...
August 5, 2018: Journal of Cellular Physiology
Zicheng Hu, Chethan Jujjavarapu, Jacob J Hughey, Sandra Andorf, Hao-Chih Lee, Pier Federico Gherardini, Matthew H Spitzer, Cristel G Thomas, John Campbell, Patrick Dunn, Jeff Wiser, Brian A Kidd, Joel T Dudley, Garry P Nolan, Sanchita Bhattacharya, Atul J Butte
While meta-analysis has demonstrated increased statistical power and more robust estimations in studies, the application of this commonly accepted methodology to cytometry data has been challenging. Different cytometry studies often involve diverse sets of markers. Moreover, the detected values of the same marker are inconsistent between studies due to different experimental designs and cytometer configurations. As a result, the cell subsets identified by existing auto-gating methods cannot be directly compared across studies...
July 31, 2018: Cell Reports
John C Stansfield, Kellen G Cresswell, Vladimir I Vladimirov, Mikhail G Dozmorov
BACKGROUND: Changes in spatial chromatin interactions are now emerging as a unifying mechanism orchestrating the regulation of gene expression. Hi-C sequencing technology allows insight into chromatin interactions on a genome-wide scale. However, Hi-C data contains many DNA sequence- and technology-driven biases. These biases prevent effective comparison of chromatin interactions aimed at identifying genomic regions differentially interacting between, e.g., disease-normal states or different cell types...
July 31, 2018: BMC Bioinformatics
Andrea Rodriguez-Martinez, Rafael Ayala, Joram M Posma, Marc-Emmanuel Dumas
MetaboSignal is an R/Bioconductor package designed to explore the relationships between genes and metabolites, using the Kyoto Encyclopedia of Genes and Genomes (KEGG) as its primary database. It is a network-based approach that allows overlaying metabolic and signaling pathways and exploring the topological relationship between genes (signaling or metabolic genes) and metabolites. MetaboSignal is ideally suited to identify candidate genes in metabolome genome-wide association studies (mGWAS), particularly in the case of trans-acting associations...
March 2018: Current Protocols in Bioinformatics
Crystina L Kriss, Emily Gregory-Lott, Aaron J Storey, Alan J Tackett, Wayne P Wahls, Stanley M Stevens
BACKGROUND: Epigenetic dysregulation through ethanol-induced changes in DNA methylation and histone modifications has been implicated in several alcohol-related disorders such as alcoholic liver disease (ALD). Ethanol metabolism in the liver results in the formation of acetate, a metabolite that can be converted to acetyl-CoA, which can then be used by histone acetyltransferases to acetylate lysine residues. Ethanol metabolism in the liver can also indirectly influence lysine acetylation through NAD+-dependent sirtuin activity that is altered due to increases in NADH as a result of ethanol metabolism...
July 21, 2018: Alcoholism, Clinical and Experimental Research
Alberto Valdeolivas, Laurent Tichit, Claire Navarro, Sophie Perrin, Gaëlle Odelin, Nicolas Levy, Pierre Cau, Elisabeth Remy, Anaïs Baudot
Motivation: Recent years have witnessed an exponential growth in the number of identified interactions between biological molecules. These interactions are usually represented as large and complex networks, calling for the development of appropriate tools to exploit the functional information they contain. Random walk with restart is the state-of-the-art guilt-by-association approach. It explores the network vicinity of gene/protein seeds to study their functions, based on the premise that nodes related to similar functions tend to lie close to each other in the networks...
July 18, 2018: Bioinformatics
Patrick K Kimes, Alejandro Reyes
Summary: Benchmark studies are widely used to compare and evaluate tools developed for answering various biological questions. Despite the popularity of these comparisons, the implementation is often ad hoc, with little consistency across studies. To address this problem, we developed SummarizedBenchmark, an R package and framework for organizing and structuring benchmark comparisons. SummarizedBenchmark defines a general grammar for benchmarking and allows for easier setup and execution of benchmark comparisons, while improving the reproducibility and replicability of such comparisons...
July 17, 2018: Bioinformatics
Christian X Weichenberger, Johannes Rainer, Cristian Pattaro, Peter P Pramstaller, Francisco S Domingues
Motivation: Familial aggregation analysis is an important early step for characterizing the genetic determinants of phenotypes in epidemiological studies. To facilitate this analysis, a collection of methods to detect familial aggregation in large pedigrees has been made available recently. However, efficacy of these methods in real world scenarios remains largely unknown. Here, we assess the performance of five aggregation methods to identify individuals or groups of related individuals affected by a Mendelian trait within a large set of decoys...
July 13, 2018: Bioinformatics
Kevin Rue-Albrecht, Federico Marini, Charlotte Soneson, Aaron T L Lun
Data exploration is critical to the comprehension of large biological data sets generated by high-throughput assays such as sequencing. However, most existing tools for interactive visualisation are limited to specific assays or analyses. Here, we present the iSEE (Interactive SummarizedExperiment Explorer) software package, which provides a general visual interface for exploring data in a SummarizedExperiment object. iSEE is directly compatible with many existing R/Bioconductor packages for analysing high-throughput biological data, and provides useful features such as simultaneous examination of (meta)data and analysis results, dynamic linking between plots and code tracking for reproducibility...
2018: F1000Research
Marco Catoni, Jonathan M F Tsang, Alessandro P Greco, Nicolae Radu Zabet
DNA methylation has been associated with transcriptional repression and detection of differential methylation is important in understanding the underlying causes of differential gene expression. Bisulfite-converted genomic DNA sequencing is the current gold standard in the field for building genome-wide maps at a base pair resolution of DNA methylation. Here we systematically investigate the underlying features of detecting differential DNA methylation in CpG and non-CpG contexts, considering both the case of mammalian systems and plants...
July 9, 2018: Nucleic Acids Research
Ram Krishna Shrestha, Pingtao Ding, Jonathan D G Jones, Dan MacLean
Background: Assay for Transposase-Accessible Chromatin (ATAC)-cap-seq is a high-throughput sequencing method that combines ATAC-seq with targeted nucleic acid enrichment of precipitated DNA fragments. There are increased analytical difficulties arising from working with a set of regions of interest that may be small in number and biologically dependent. Common statistical pipelines for RNA sequencing might be assumed to apply but can give misleading results on ATAC-cap-seq data. A tool is needed to allow a nonspecialist user to quickly and easily summarize data and apply sensible and effective normalization and analysis...
July 1, 2018: GigaScience
Georg Stricker, Mathilde Galinier, Julien Gagneur
BACKGROUND: GenoGAM (Genome-wide generalized additive models) is a powerful statistical modeling tool for the analysis of ChIP-Seq data with flexible factorial design experiments. However, the large runtime and memory requirements of its current implementation prohibit its application to gigabase-scale genomes such as mammalian genomes. RESULTS: Here we present GenoGAM 2.0, a scalable and efficient implementation that is 2 to 3 orders of magnitude faster than the previous version...
June 27, 2018: BMC Bioinformatics
Patryk Orzechowski, Artur Panszczyk, Xiuzhen Huang, Jason H Moore
Motivation: Biclustering is an unsupervised technique for simultaneously clustering the rows and columns of an input matrix. Among the many biclustering algorithms proposed, UniBic remains one of the most accurate methods developed so far. Results: In this paper we introduce a Bioconductor package called runibic with a parallel implementation of UniBic. For convenience, the algorithm was reimplemented, parallelized, and wrapped within an R package. The package includes: (1) a parallel version several times faster than the original sequential algorithm, (2) much more efficient memory management, (3) modularity that allows new methods to be built on top of the provided one, and (4) integration with modern Bioconductor packages such as SummarizedExperiment, ExpressionSet and biclust...
June 23, 2018: Bioinformatics
Laurent Jacob, Florence Combes, Thomas Burger
We propose a new hypothesis test for the differential abundance of proteins in mass-spectrometry based relative quantification. An important feature of this type of high-throughput analyses is that it involves an enzymatic digestion of the sample proteins into peptides prior to identification and quantification. Due to numerous homology sequences, different proteins can lead to peptides with identical amino acid chains, so that their parent protein is ambiguous. These so-called shared peptides make the protein-level statistical analysis a challenge and are often not accounted for...
June 18, 2018: Biostatistics
Gustavo H Esteves, Luiz F L Reis
MOTIVATION: Gene expression data analysis is of great importance for modern molecular biology, given our ability to measure the expression profiles of thousands of genes, which enables studies rooted in systems biology. In this work, we propose a simple statistical model for measuring the activation of gene regulatory networks, instead of the traditional gene co-expression networks. RESULTS: We present the mathematical construction of a statistical procedure for testing hypotheses regarding gene regulatory network activation...
June 13, 2018: Statistical Applications in Genetics and Molecular Biology
Aaron Taudt, David Roquis, Amaryllis Vidalis, René Wardenaar, Frank Johannes, Maria Colome-Tatché
BACKGROUND: Whole-genome bisulfite sequencing (WGBS) has become the standard method for interrogating plant methylomes at base resolution. However, deep WGBS measurements remain cost prohibitive for large, complex genomes and for population-level studies. As a result, most published plant methylomes are sequenced far below saturation, with a large proportion of cytosines having either missing data or insufficient coverage. RESULTS: Here we present METHimpute, a Hidden Markov Model (HMM) based imputation algorithm for the analysis of WGBS data...
June 7, 2018: BMC Genomics
