Read by QxMD icon Read

Bioinformatics python

Philipp N Spahn, Tyler Bath, Ryan J Weiss, Jihoon Kim, Jeffrey D Esko, Nathan E Lewis, Olivier Harismendy
Large-scale genetic screens using CRISPR/Cas9 technology have emerged as a major tool for functional genomics. With the increasing popularity of these screens, experimental biologists frequently acquire large sequencing datasets for which they often do not have an easy analysis option. While a few bioinformatic tools have been developed for this purpose, their utility is still hindered by either limited functionality or the requirement of bioinformatic expertise. To make sequencing data analysis of CRISPR/Cas9 screens more accessible to a wide range of scientists, we developed a Platform-independent Analysis of Pooled Screens using Python (PinAPL-Py), which is operated as an intuitive web-service...
November 20, 2017: Scientific Reports
Žiga Avsec, Mohammadamin Barekatain, Jun Cheng, Julien Gagneur
Motivation: Regulatory sequences are not solely defined by their nucleic acid sequence but also by their relative distances to genomic landmarks such as transcription start site, exon boundaries, or polyadenylation site. Deep learning has become the approach of choice for modeling regulatory sequences because of its strength to learn complex sequence features. However, modeling relative distances to genomic landmarks in deep neural networks has not been addressed. Results: Here we developed spline transformation, a neural network module based on splines to flexibly and robustly model distances...
November 16, 2017: Bioinformatics
Minjeong Kim, Jai-Hoon Kim, Kangseok Kim, Sunshin Kim
Motivation: With the discovery of cell-free fetal DNA in maternal blood, the demand for non-invasive prenatal testing (NIPT) has been increasing. To obtain reliable NIPT results, it is important to accurately estimate the fetal fraction. In this study, we propose an accurate and cost-effective method for measuring fetal fractions using single-nucleotide polymorphisms (SNPs). Results: A total of 84 samples were sequenced via semiconductor sequencing using a 0.3x sequencing coverage...
November 8, 2017: Bioinformatics
Akshay Kumar Avvaru, Divya Tej Sowpati, Rakesh Kumar Mishra
Motivation: Microsatellites or Simple Sequence Repeats (SSRs) are short tandem repeats of DNA motifs present in all genomes. They have long been used for a variety of purposes in the areas of population genetics, genotyping, marker-assisted selection, and forensics. Numerous studies have highlighted their functional roles in genome organization and gene regulation. Though several tools are currently available to identify SSRs from genomic sequences, they have significant limitations. Results: We present a novel algorithm called PERF for extremely fast and comprehensive identification of microsatellites from DNA sequences of any size...
November 7, 2017: Bioinformatics
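To make the task concrete, here is a minimal Python sketch of perfect-microsatellite detection using a regular expression. It is not the PERF algorithm, whose details are not given in the excerpt above. The function name `find_perfect_ssrs`, the motif size range of 1 to 6 bp, and the 12 bp minimum length are illustrative assumptions.

```python
import re


def find_perfect_ssrs(sequence, min_motif=1, max_motif=6, min_length=12):
    """Return (start, end, motif, copies) tuples for perfect tandem repeats.

    Coordinates are 0-based and end-exclusive. Only whole motif copies count.
    All thresholds are assumed defaults, not values from any published tool.
    """
    sequence = sequence.upper()
    hits = []
    for size in range(min_motif, max_motif + 1):
        # Smallest number of whole copies that reaches min_length (ceiling division).
        min_copies = max(2, -(-min_length // size))
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (size, min_copies - 1))
        for match in pattern.finditer(sequence):
            motif = match.group(1)
            # Skip motifs that are repeats of a shorter unit, e.g. "ATAT" = "AT" x 2;
            # those repeats are already reported at the shorter motif size.
            if any(
                motif == motif[:k] * (size // k)
                for k in range(1, size)
                if size % k == 0
            ):
                continue
            copies = (match.end() - match.start()) // size
            hits.append((match.start(), match.end(), motif, copies))
    return sorted(hits)


example = "GGG" + "AT" * 8 + "CCGT"
print(find_perfect_ssrs(example))  # [(3, 19, 'AT', 8)]
```

A regex scan like this is easy to read but reports only non-overlapping matches per motif size and handles only perfect repeats, so it should be read as a baseline rather than a substitute for a dedicated tool.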
Yasuhiro Tanizawa, Takatomo Fujisawa, Yasukazu Nakamura
Summary: We developed a prokaryotic genome annotation pipeline, DFAST, that also supports genome submission to public sequence databases. DFAST was originally started as an on-line annotation server, and to date, over 7,000 jobs have been processed since its first launch in 2016. Here, we present a newly implemented background annotation engine for DFAST, which is also available as a standalone command-line program. The new engine can annotate a typical-sized bacterial genome within 10 minutes, with rich information such as pseudogenes, translation exceptions, and orthologous gene assignment between given reference genomes...
November 2, 2017: Bioinformatics
Clinton L Cario, John S Witte
Motivation: As whole-genome tumor sequence and biological annotation datasets grow in size, number, and content, there is an increasing basic science and clinical need for efficient and accurate data management and analysis software. With the emergence of increasingly sophisticated data stores, execution environments, and machine learning algorithms, there is also a need for the integration of functionality across frameworks. Results: We present orchid, a python based software package for the management, annotation, and machine learning of cancer mutations...
November 2, 2017: Bioinformatics
Zhiqun Xie, Haixu Tang
Motivation: The insertion sequence (IS) elements are the smallest but most abundant autonomous transposable elements in prokaryotic genomes, which play a key role in prokaryotic genome organization and evolution. With the fast growing genomic data, it is becoming increasingly critical for biology researchers to be able to accurately and automatically annotate ISs in prokaryotic genome sequences. The available automatic IS annotation systems are either providing only incomplete IS annotation or relying on the availability of existing genome annotations...
November 1, 2017: Bioinformatics
Lianming Du, Chi Zhang, Qin Liu, Xiuyue Zhang, Bisong Yue
Summary: Microsatellites are associated with various diseases and are widely used as genetic markers in population genetics. However, it remains a challenge to identify microsatellites in large genomes and to screen them for primer design from a huge result dataset. Here, we present Krait, a robust and flexible tool for fast investigation of microsatellites in DNA sequences. Krait is designed to identify all types of perfect or imperfect microsatellites on a whole genomic sequence, and is also applicable to identification of compound microsatellites...
October 18, 2017: Bioinformatics
Charles Tapley Hoyt, Andrej Konotopez, Christian Ebeling
Summary: Biological Expression Language (BEL) assembles knowledge networks from biological relations across multiple modes and scales. Here, we present PyBEL, a software package for parsing, validating, converting, storing, querying, and visualizing networks encoded in BEL. Availability: PyBEL is implemented in platform-independent, universal Python code. Its source is distributed under the Apache 2.0 License at Contact: charles...
October 18, 2017: Bioinformatics
Ignacio Faustino, S J Marrink
Summary: We introduce cgHeliParm, a Python program for the conformational analysis of Martini-based coarse-grained double-stranded DNA molecules. The software calculates helical parameters such as base, base pair and base pair step parameters. cgHeliParm can be used to analyze coarse-grained Martini molecular dynamics trajectories without transformation into atomistic models. Availability and implementation: This package works with Python 2.7 on MacOS and Linux...
July 13, 2017: Bioinformatics
Jun Ding, Ziv Bar-Joseph
Motivation: Profiling of genome wide DNA methylation is now routinely performed when studying development, cancer and several other biological processes. Although Whole genome Bisulfite Sequencing provides high-quality methylation measurements at the resolution of nucleotides, it is relatively costly and so several studies have used alternative methods for such profiling. One of the most widely used low cost alternatives is MeDIP-Seq. However, MeDIP-Seq is biased for CpG enriched regions and thus its results need to be corrected in order to determine accurate methylation levels...
November 1, 2017: Bioinformatics
Siu H J Chan, Jingyi Cai, Lin Wang, Margaret N Simons-Senftle, Costas D Maranas
Motivation: In a genome-scale metabolic model, the biomass produced is defined to have a molecular weight (MW) of 1 g mmol⁻¹. This is critical for correctly predicting growth yields, contrasting multiple models and more importantly modeling microbial communities. However, the standard is rarely verified in the current practice and the chemical formulae of biomass components such as proteins, nucleic acids and lipids are often represented by undefined side groups (e.g. X, R). Results: We introduced a systematic procedure for checking the biomass weight and ensuring complete mass balance of a model...
November 15, 2017: Bioinformatics
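The idea of checking a biomass reaction against the 1 g per mmol convention can be illustrated with a short Python sketch. This is not the authors' procedure. It assumes simple element-count formulae without parentheses or charges, a small hand-picked table of approximate atomic weights, and hypothetical function names (`molecular_weight`, `biomass_weight`). Undefined side groups such as X or R are rejected because no mass can be assigned to them.

```python
import re

# Approximate standard atomic weights in g/mol (assumed subset for illustration).
ATOMIC_WEIGHTS = {
    "C": 12.011, "H": 1.008, "N": 14.007,
    "O": 15.999, "P": 30.974, "S": 32.06,
}


def molecular_weight(formula):
    """Molecular weight in g/mol of a flat formula such as 'C6H12O6'."""
    if not re.fullmatch(r"(?:[A-Z][a-z]?\d*)+", formula):
        raise ValueError("unsupported formula syntax: %r" % formula)
    total = 0.0
    for element, count in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        if element not in ATOMIC_WEIGHTS:
            # Covers undefined side groups such as 'X' or 'R'.
            raise ValueError("no atomic weight for %r in %r" % (element, formula))
        total += ATOMIC_WEIGHTS[element] * (int(count) if count else 1)
    return total


def biomass_weight(stoichiometry, formulae):
    """Net mass consumed by the biomass reaction, in g per mmol of biomass.

    stoichiometry maps metabolite id -> coefficient in mmol
    (negative for consumed metabolites, positive for produced ones).
    """
    net_mg = -sum(
        coefficient * molecular_weight(formulae[metabolite])
        for metabolite, coefficient in stoichiometry.items()
    )
    return net_mg / 1000.0  # mmol * g/mol gives mg; convert to g


# Toy reaction: consumes 2 mmol of glucose and releases 1 mmol of water.
toy_formulae = {"glc": "C6H12O6", "h2o": "H2O"}
toy_stoichiometry = {"glc": -2.0, "h2o": 1.0}
print(round(biomass_weight(toy_stoichiometry, toy_formulae), 6))
```

A result far from 1.0 would indicate that the reaction, as written, does not follow the convention. The toy reaction here is deliberately unrealistic and is only meant to show the bookkeeping.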
Andrew Elliott, Elizabeth Leicht, Alan Whitmore, Gesine Reinert, Felix Reed-Tsochas
Motivation: Our work is motivated by an interest in constructing a protein-protein interaction network that captures key features associated with Parkinson's disease. While there is an abundance of subnetwork construction methods available, it is often far from obvious which subnetwork is the most suitable starting point for further investigation. Results: We provide a method to assess whether a subnetwork constructed from a seed list (a list of nodes known to be important in the area of interest) differs significantly from a randomly generated subnetwork...
July 7, 2017: Bioinformatics
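The general idea of comparing a seed-based subnetwork with random ones can be sketched as an empirical p-value computation. The sketch below is a deliberately simplified stand-in, not the method of the paper. It assumes the subnetwork is just the subgraph induced by the seed nodes, the test statistic is the number of edges inside it, and random subnetworks are uniformly sampled node sets of the same size. All function names are hypothetical.

```python
import random
from itertools import combinations


def induced_edge_count(adjacency, nodes):
    """Count edges of an undirected graph with both endpoints in `nodes`.

    adjacency maps each node to the set of its neighbours.
    """
    nodes = sorted(set(nodes))
    return sum(1 for u, v in combinations(nodes, 2) if v in adjacency[u])


def empirical_p_value(adjacency, seeds, n_random=999, seed=0):
    """Fraction of random node sets at least as densely connected as the seeds."""
    rng = random.Random(seed)
    all_nodes = sorted(adjacency)
    size = len(set(seeds))
    observed = induced_edge_count(adjacency, seeds)
    at_least = sum(
        induced_edge_count(adjacency, rng.sample(all_nodes, size)) >= observed
        for _ in range(n_random)
    )
    # The +1 terms count the observed set itself, so the p-value is never zero.
    return (at_least + 1) / (n_random + 1)


# Toy network: a 5-node clique (nodes 0-4) plus a 25-node ring (nodes 5-29).
toy = {n: set() for n in range(30)}
for a in range(5):
    for b in range(a + 1, 5):
        toy[a].add(b)
        toy[b].add(a)
for n in range(5, 30):
    m = 5 + (n - 4) % 25
    toy[n].add(m)
    toy[m].add(n)

print(empirical_p_value(toy, range(5)))
```

Uniform node sampling ignores degree, so in a real interaction network a degree-preserving null model would usually be a fairer comparison.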
Matthew D Whiteside, Victor P J Gannon, Chad R Laing
Summary: Whole genome sequencing (WGS) is being adopted in public health for improved surveillance and outbreak analysis. In public health, subtyping has been used to infer phenotypes and distinguish bacterial strain groups. In silico tools that predict subtypes from sequence data are needed to transition historical data to WGS-based protocols. Phylotyper is a novel solution for in silico subtype prediction from gene sequences. Designed for incorporation into WGS pipelines, it is a general prediction tool that can be applied to different subtype schemes...
November 15, 2017: Bioinformatics
Ahmed Allam, Michael Krauthammer
Motivation: Text and genomic data are composed of sequential tokens, such as words and nucleotides, that give rise to higher order syntactic constructs. In this work, we aim to provide a comprehensive Python library implementing conditional random fields (CRFs), a class of probabilistic graphical models, for robust prediction of these constructs from sequential data. Results: Python Sequence Labeling (PySeqLab) is an open source package for performing supervised learning in structured prediction tasks...
November 1, 2017: Bioinformatics
Mahito Sugiyama, M Elisabetta Ghisu, Felipe Llinares-López, Karsten Borgwardt
Summary: Measuring the similarity of graphs is a fundamental step in the analysis of graph-structured data, which is omnipresent in computational biology. Graph kernels have been proposed as a powerful and efficient approach to this problem of graph comparison. Here we provide graphkernels, the first R and Python graph kernel libraries including baseline kernels such as label histogram based kernels, classic graph kernels such as random walk based kernels, and the state-of-the-art Weisfeiler-Lehman graph kernel...
September 22, 2017: Bioinformatics
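As an illustration of the simplest kernel family mentioned above, the following Python sketch computes a linear vertex label histogram kernel: each graph is reduced to a count of its vertex labels, and the kernel value is the dot product of two such count vectors. This does not use the graphkernels package or reproduce its API. The function names and the representation of a graph as a plain list of vertex labels are assumptions made for the example.

```python
from collections import Counter


def vertex_label_histogram_kernel(labels_a, labels_b):
    """Dot product of the vertex label histograms of two graphs."""
    hist_a, hist_b = Counter(labels_a), Counter(labels_b)
    # Counter returns 0 for labels that are absent from the second graph.
    return sum(count * hist_b[label] for label, count in hist_a.items())


def kernel_matrix(graphs):
    """Symmetric Gram matrix over a list of graphs given as label lists."""
    return [
        [vertex_label_histogram_kernel(a, b) for b in graphs]
        for a in graphs
    ]


graph_1 = ["C", "C", "O", "N"]  # vertex labels only; edges are ignored here
graph_2 = ["C", "O", "O"]
print(vertex_label_histogram_kernel(graph_1, graph_2))  # 4
```

Because this kernel ignores edges entirely, it serves mainly as a baseline against which structure-aware kernels such as random walk or Weisfeiler-Lehman kernels can be compared.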
Erin M Shockley, Jasper A Vrugt, Carlos F Lopez
Summary: Biological models contain many parameters whose values are difficult to measure directly via experimentation and therefore require calibration against experimental data. Markov chain Monte Carlo (MCMC) methods are suitable to estimate multivariate posterior model parameter distributions, but these methods may exhibit slow or premature convergence in high-dimensional search spaces. Here, we present PyDREAM, a Python implementation of the (Multiple-Try) Differential Evolution Adaptive Metropolis (DREAM(ZS)) algorithm developed by Vrugt and ter Braak (2008) and Laloy and Vrugt (2012)...
October 4, 2017: Bioinformatics
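For readers new to MCMC calibration, here is a minimal random-walk Metropolis sampler in Python. It is far simpler than DREAM(ZS) and does not use PyDREAM: there is a single chain, a fixed Gaussian proposal, and no multiple-try or differential evolution step. The function name `metropolis`, the one-parameter toy posterior, and all tuning values are assumptions for illustration.

```python
import math
import random


def metropolis(log_posterior, start, n_samples, step=1.0, seed=0):
    """Draw samples of a single parameter with random-walk Metropolis.

    log_posterior returns the unnormalised log posterior density and must
    be finite at `start`.
    """
    rng = random.Random(seed)
    current = start
    current_lp = log_posterior(current)
    samples = []
    for _ in range(n_samples):
        proposal = current + rng.gauss(0.0, step)
        proposal_lp = log_posterior(proposal)
        # Accept with probability min(1, p(proposal) / p(current)).
        accept_probability = math.exp(min(0.0, proposal_lp - current_lp))
        if rng.random() < accept_probability:
            current, current_lp = proposal, proposal_lp
        samples.append(current)
    return samples


def toy_log_posterior(x):
    """Toy target: a unit-variance Gaussian centred on 3.0."""
    return -0.5 * (x - 3.0) ** 2


chain = metropolis(toy_log_posterior, start=0.0, n_samples=20000)
kept = chain[2000:]  # discard burn-in
print(sum(kept) / len(kept))  # should be close to 3.0
```

A fixed proposal scale like this is exactly what tends to mix poorly in high-dimensional parameter spaces, which is the limitation that adaptive multi-chain samplers aim to address.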
Matthew D Johnston
Recent work of Johnston et al. has produced sufficient conditions on the structure of a chemical reaction network which guarantee that the corresponding discrete state space system exhibits an extinction event. The conditions consist of a series of systems of equalities and inequalities on the edges of a modified reaction network called a domination-expanded reaction network. In this paper, we present a computational implementation of these conditions written in Python and apply the program to examples drawn from the biochemical literature...
October 10, 2017: Mathematical Biosciences
Carl D Christensen, Jan-Hendrik S Hofmeyr, Johann M Rohwer
Summary: PySCeSToolbox is an extension to the Python Simulator for Cellular Systems (PySCeS) that includes tools for performing generalised supply-demand analysis, symbolic metabolic control analysis, and a framework for investigating the kinetic and thermodynamic aspects of enzyme-catalysed reactions. Each tool addresses a different aspect of metabolic behaviour, control, and regulation; the tools complement each other and can be used in conjunction to better understand higher-level system behaviour...
September 14, 2017: Bioinformatics
Oded Rimon, Dana Reichmann
Motivation: Kinetic measurements have played an important role in elucidating biochemical and biophysical phenomena for over a century. While many tools for analysing kinetic measurements exist, most require low noise levels in the data, leaving outlier measurements to be cleaned manually. This is particularly true for protein misfolding and aggregation processes, which are extremely noisy and hence difficult to model. Understanding these processes is paramount, as they are associated with diverse physiological processes and disorders, most notably neurodegenerative diseases...
September 14, 2017: Bioinformatics