Diogo Almeida, Ida Skov, Artur Silva, Fabio Vandin, Qihua Tan, Richard Röttger, Jan Baumbach
MOTIVATION: Epigenome-wide association studies (EWAS) generate big epidemiological data sets. They aim to detect differentially methylated DNA regions that are likely to influence transcriptional gene activity and, thus, the regulation of metabolic processes. By far the most widely used technology is the Illumina Methylation BeadChip, which measures the methylation levels of 450 (850) thousand cytosines in the CpG dinucleotide context in a set of patients compared to a control group...
October 29, 2016: Bioinformatics
Blake L Joyce, Asher Haug-Baltzell, Sean Davey, Matthew Bomhoff, James C Schnable, Eric Lyons
Following polyploidy events, genomes undergo massive reduction in gene content through a process known as fractionation. Importantly, the fractionation process is not always random, and a bias as to which homeologous chromosome retains or loses more genes can be observed in some species. The process of characterizing whole genome fractionation requires identifying syntenic regions across genomes followed by post-processing of those syntenic datasets to identify and plot gene retention patterns...
October 29, 2016: Bioinformatics
Gabriel Renaud, Kristian Hanghøj, Eske Willerslev, Ludovic Orlando
Ancient DNA has emerged as a remarkable tool to infer the history of extinct species and past populations. However, many of its characteristics, such as extensive fragmentation, damage and contamination, can influence downstream analyses. To help investigators measure how these could impact their analyses in silico, we have developed gargammel, a package that simulates ancient DNA fragments given a set of known reference genomes. Our package simulates the entire molecular process from post-mortem DNA fragmentation and DNA damage to experimental sequencing errors, and reproduces the most common biases observed in ancient DNA datasets...
October 29, 2016: Bioinformatics
Martin S Lindner, Benjamin Strauch, Jakob M Schulze, Simon Tausch, Piotr W Dabrowski, Andreas Nitsche, Bernhard Y Renard
MOTIVATION: Next Generation Sequencing is increasingly used in time-critical clinical applications. While read mapping algorithms have always been optimized for speed, they follow a sequential paradigm and only start after the sequencing run and the conversion of files have finished. Since Illumina machines write intermediate output results, HiLive performs read mapping while still sequencing and thereby drastically reduces crucial overall sample analysis time, e.g. in precision medicine...
October 29, 2016: Bioinformatics
Tolutola Oyetunde, Muhan Zhang, Yixin Chen, Yinjie Tang, Cynthia Lo
MOTIVATION: Metabolic network reconstructions are often incomplete. Constraint-based and pattern-based methodologies have been used for automated gap filling of these networks, each with its own strengths and weaknesses. Moreover, since validation of hypotheses made by gap filling tools requires experimentation, it is challenging to benchmark performance and make improvements other than those related to speed and scalability. RESULTS: We present BoostGAPFILL, an open source tool that leverages both constraint-based and machine learning methodologies for hypotheses generation in gap filling and metabolic model refinement...
October 26, 2016: Bioinformatics
Hannes Klarner, Adam Streck, Heike Siebert
MOTIVATION: The goal of this project is to provide a simple interface to working with Boolean networks. Emphasis is put on easy access to a large number of common tasks including the generation and manipulation of networks, attractor and basin computation, model checking and trap space computation, execution of established graph algorithms as well as graph drawing and layouts. RESULTS: PyBoolNet is a Python package for working with Boolean networks that supports simple access to model checking via NuSMV, standard graph algorithms via NetworkX and visualisation via dot...
October 26, 2016: Bioinformatics
Mark C Hiner, Curtis T Rueden, Kevin W Eliceiri
ImageJ-MATLAB is a lightweight Java library facilitating bi-directional interoperability between MATLAB and ImageJ. By defining a standard for translation between matrix and image data structures, researchers are empowered to select the best tool for their image-analysis tasks. AVAILABILITY: Freely available extension to ImageJ2 ( Installation and use instructions available at Tested with ImageJ 2...
October 26, 2016: Bioinformatics
Dayne L Filer, Parth Kothiya, R Woodrow Setzer, Richard S Judson, Matthew T Martin
MOTIVATION: Large high-throughput screening (HTS) efforts are widely used in drug development and chemical toxicity screening. Wide use and integration of these data can benefit from an efficient, transparent, and reproducible data pipeline. SUMMARY: The tcpl R package and its associated MySQL database provide a generalized platform for efficiently storing, normalizing, and dose-response modeling of large high-throughput and high-content chemical screening data...
October 26, 2016: Bioinformatics
Nick A Watts, Frank A Feltus
MOTIVATION: The ability to centralize and store data for long periods on an end user's computational resources is increasingly difficult for many scientific disciplines. For example, genomics data is increasingly large and distributed, and the data needs to be moved into workflow execution sites ranging from lab workstations to the cloud. However, the typical user is not always informed on emerging network technology or the most efficient methods to move and share data. Thus, the user defaults to using inefficient methods for transfer across the commercial internet...
October 26, 2016: Bioinformatics
Carlos García-Pérez, Rafael Peláez, Roberto Therón, José Luis López-Pérez
MOTIVATION: AutoDock is a very popular software package for docking and virtual screening. However, currently it is hard work to visualize more than one result from the virtual screening at a time. To overcome this limitation we have designed JADOPPT, a tool for automatically preparing and processing multiple ligand-protein docked poses obtained from AutoDock. It allows the simultaneous visual assessment and comparison of multiple poses through clustering methods. Moreover, it permits the representation of reference ligands with known binding modes, binding site residues, highly scoring regions for the ligand, and the calculated binding energy of the best ranked results...
October 26, 2016: Bioinformatics
Frederik Gwinner, Gwénola Boulday, Claire Vandiedonck, Minh Arnould, Cécile Cardoso, Iryna Nikolayeva, Oriol Guitart-Pla, Cécile V Denis, Olivier D Christophe, Johann Beghain, Elisabeth Tournier-Lasserve, Benno Schwikowski
MOTIVATION: Most computational approaches for the analysis of omics data in the context of interaction networks have very long running times, provide single or partial, often heuristic, solutions, and/or contain user-tuneable parameters. RESULTS: We introduce local enrichment analysis (LEAN) for the identification of dysregulated subnetworks from genome-wide omics data sets. By substituting the common subnetwork model with a simpler local subnetwork model, LEAN allows exact, parameter-free, efficient, and exhaustive identification of local subnetworks that are statistically dysregulated, and directly implicates single genes for follow-up experiments...
October 26, 2016: Bioinformatics
Emanuel Diego S Penha, Egiebade Iriabho, Alex Dussaq, Diana Magalhães de Oliveira, Jonas S Almeida
The move of computational genomics workflows to Cloud Computing platforms is associated with a new level of integration and interoperability that challenges existing data representation formats. The Variant Calling Format (VCF) is in a particularly sensitive position in that regard, with both clinical and consumer-facing analysis tools relying on this self-contained description of genomic variation in Next Generation Sequencing (NGS) results. In this report we identify an isomorphic map between VCF and the reference Resource Description Framework...
October 25, 2016: Bioinformatics
Jin Zhang, Elaine R Mardis, Christopher A Maher
MOTIVATION: While high-throughput sequencing (HTS) has been used successfully to discover tumor-specific mutant peptides (neoantigens) from somatic missense mutations, the field currently lacks a method for identifying which gene fusions may generate neoantigens. RESULTS: We demonstrate the application of our gene fusion neoantigen discovery pipeline, called INTEGRATE-Neo, by identifying gene fusions in prostate cancers that may produce neoantigens. AVAILABILITY: INTEGRATE-Neo is implemented in C++ and Python...
October 24, 2016: Bioinformatics
Yifan Yang, Lei Xu, Zheyun Feng, Jeffrey A Cruz, Linda J Savage, David M Kramer, Jin Chen
MOTIVATION: Phenomics is essential for understanding the mechanisms that regulate or influence growth, fitness, and development. Techniques have been developed to conduct high-throughput large-scale phenotyping on animals, plants and humans, aiming to bridge the gap between genomics, gene functions and traits. While new developments in phenotyping techniques are exciting, we are limited by the tools to analyze fully the massive phenotype data, especially the dynamic relationships between phenotypes and environments...
October 24, 2016: Bioinformatics
Toby Dylan Hocking, Patricia Goerner-Potvin, Andreanne Morin, Xiaojian Shao, Tomi Pastinen, Guillaume Bourque
MOTIVATION: Many peak detection algorithms have been proposed for ChIP-seq data analysis, but it is not obvious which algorithm and what parameters are optimal for any given data set. In contrast, regions with and without obvious peaks can be easily labeled by visual inspection of aligned read counts in a genome browser. We propose a supervised machine learning approach for ChIP-seq data analysis, using labels that encode qualitative judgments about which genomic regions contain or do not contain peaks...
October 24, 2016: Bioinformatics
Bhusan K Kuntal, Sharmila S Mande
MOTIVATION: The majority of data generated routinely from various experiments are essentially multivariate, often categorized with multiple experimental metadata. Analyzing such results with interactive visualizations often yields interesting and intuitive results which otherwise remain undisclosed. RESULTS: In this paper, we present Web-Igloo - a GUI-based interactive 'feature decomposition independent' multivariate data visualization platform. Web-Igloo is likely to be a valuable contribution in the field of visual data mining, especially for researchers working with but not limited to multi-omics data...
October 22, 2016: Bioinformatics
Tong Wang, Yuedong Yang, Yaoqi Zhou, Haipeng Gong
MOTIVATION: The quality of the fragment library determines the efficiency of fragment assembly, an approach that is widely used in most de novo protein-structure prediction algorithms. Conventional fragment libraries are constructed mainly based on the identities of amino acids, sometimes facilitated by predicted information including dihedral angles and secondary structures. However, it remains challenging to identify near-native fragment structures with low sequence homology. RESULTS: We introduce a novel fragment-library-construction algorithm, LRFragLib, to improve the detection of near-native low-homology fragments of 7-10 residues, using a multi-stage, flexible selection protocol...
October 22, 2016: Bioinformatics
Jil Sander, Joachim L Schultze, Nir Yosef
Perturbations in the environment lead to distinctive gene expression changes within a cell. Observed over time, those variations can be characterized by single impulse-like progression patterns. ImpulseDE is an R package suited to capture these patterns in high throughput time series datasets. By fitting a representative impulse model to each gene, it reports differentially expressed genes across time points from a single or between two time courses from two experiments. To optimize running time, the code uses clustering and multi-threading...
October 22, 2016: Bioinformatics
Weizhuang Zhou, Lichy Han, Russ B Altman
Microarray measurements of gene expression constitute a large fraction of publicly shared biological data, and are available in the Gene Expression Omnibus (GEO). Many studies use GEO data to shape hypotheses and improve statistical power. Within GEO, the Affymetrix HG-U133A and HG-U133 Plus 2.0 are the two most commonly used microarray platforms for human samples; the HG-U133 Plus 2.0 platform contains 54,220 probes and the HG-U133A array contains a proper subset (21,722 probes). When different platforms are involved, the subset of common genes is most easily compared...
October 22, 2016: Bioinformatics
Daniel Mapleson, Gonzalo Garcia Accinelli, George Kettleborough, Jonathan Wright, Bernardo J Clavijo
MOTIVATION: De novo assembly of whole genome shotgun (WGS) next-generation sequencing (NGS) data benefits from high-quality input with high coverage. However, in practice, determining the quality and quantity of useful reads quickly and in a reference-free manner is not trivial. Gaining a better understanding of the WGS data, and how that data is utilised by assemblers, provides useful insights that can inform the assembly process and result in better assemblies. RESULTS: We present the K-mer Analysis Toolkit (KAT): a multi-purpose software toolkit for reference-free quality control (QC) of WGS reads and de novo genome assemblies, primarily via their k-mer frequencies and GC composition...
October 22, 2016: Bioinformatics

