Read by QxMD

Bioinformatics python

Clemens Messerschmidt, Manuel Holtgrewe, Dieter Beule
Summary: We propose the simple method HLA-MA for consistency checking in pipelines operating on human HTS data. The method is based on the HLA typing result of the state-of-the-art method OptiType. Provided that there is sufficient coverage of the HLA loci, comparing HLA types allows for simple, fast, and robust matching of samples from whole genome, exome, and RNA-seq data. Our approach uses information from small but genetically highly variable regions and thus complements approaches that rely on genome or exon-wide variant profiles...
March 9, 2017: Bioinformatics
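The idea of matching samples by comparing their HLA types can be illustrated with a minimal sketch. This is not HLA-MA's actual implementation: the function name `hla_match_fraction`, the scoring rule (multiset overlap of allele calls), and the allele lists are all assumptions made for illustration.

```python
from collections import Counter


def hla_match_fraction(calls_a, calls_b):
    """Return the fraction of HLA allele calls shared by two samples.

    Hypothetical scoring rule: multiset overlap divided by the larger
    number of calls. Alleles are plain strings such as "A*02:01".
    """
    a, b = Counter(calls_a), Counter(calls_b)
    shared = sum((a & b).values())  # multiset intersection
    total = max(sum(a.values()), sum(b.values()))
    return shared / total if total else 0.0


# Illustrative (made-up) typing results, two alleles per class I locus.
tumor = ["A*02:01", "A*24:02", "B*07:02", "B*15:01", "C*03:04", "C*07:02"]
normal = ["A*02:01", "A*24:02", "B*07:02", "B*15:01", "C*03:04", "C*07:02"]
other = ["A*01:01", "A*03:01", "B*07:02", "B*08:01", "C*07:01", "C*07:02"]

same_donor = hla_match_fraction(tumor, normal)
different_donor = hla_match_fraction(tumor, other)
```

In this sketch a fraction near 1 suggests the two samples come from the same individual, while a low fraction flags a possible sample swap; any real threshold would need to account for typing errors and coverage.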
Nikolas Pontikos, Jing Yu, Ismail Moghul, Lucy Withington, Fiona Blanco-Kelly, Tom Vulliamy, Tsz Lun Wong, Cian Murphy, Valentina Cipriani, Alessia Fiorentino, Gavin Arno, Daniel Greene, Julius Ob Jacobsen, Tristan Clark, David S Gregory, Andrea Nemeth, Stephanie Halford, Chris F Inglehearn, Susan Downes, Graeme C Black, Andrew R Webster, Alison J Hardcastle, Vincent Plagnol
Summary: Phenopolis is an open-source web server providing an intuitive interface to genetic and phenotypic databases. It integrates analysis tools such as variant filtering and gene prioritisation based on phenotype. The Phenopolis platform will accelerate clinical diagnosis, gene discovery and encourage wider adoption of the Human Phenotype Ontology in the study of rare genetic diseases. Availability and Implementation: A demo of the website is available at https://phenopolis...
March 15, 2017: Bioinformatics
Erik L Clarke, Sesh A Sundararaman, Stephanie N Seifert, Frederic D Bushman, Beatrice H Hahn, Dustin Brisson
Motivation: Population genomic analyses are often hindered by difficulties in obtaining sufficient numbers of genomes for analysis by DNA sequencing. Selective whole-genome amplification (SWGA) provides an efficient approach to amplify microbial genomes from complex backgrounds for sequence acquisition. However, the process of designing sets of primers for this method has many degrees of freedom and would benefit from an automated process to evaluate the vast number of potential primer sets...
February 27, 2017: Bioinformatics
Chen Yang, Justin Chu, René L Warren, Inanç Birol
Background: The MinION sequencing instrument from Oxford Nanopore Technologies (ONT) produces long read lengths from single-molecule sequencing - valuable features for detailed genome characterization. To realize the potential of this platform, a number of groups are developing bioinformatics tools tuned for the unique characteristics of its data. We note that these development efforts would benefit from a simulator software, output of which could be used to benchmark analysis tools. Findings: Here, we introduce NanoSim, a fast and scalable read simulator that captures the technology-specific features of ONT data, and allows for adjustments upon improvement of nanopore sequencing technology...
February 24, 2017: GigaScience
Sergio Munoz, Felix D Guerrero, Anastasia Kellogg, Andrew M Heekin, Ming-Ying Leung
The cattle tick of Australia, Rhipicephalus australis, is a vector for microbial parasites that cause serious bovine diseases. The Haller's organ, located in the tick's forelegs, is crucial for host detection and mating. To facilitate the development of new technologies for better control of this agricultural pest, we aimed to sequence and annotate the transcriptome of the R. australis forelegs and associated tissues, including the Haller's organ. As G protein-coupled receptors (GPCRs) are an important family of eukaryotic proteins studied as pharmaceutical targets in humans, we prioritized the identification and classification of the GPCRs expressed in the foreleg tissues...
2017: PloS One
Yuling Ma, Chenchao Yan, Huimin Li, Wentao Wu, Yaxue Liu, Yuqian Wang, Qin Chen, Haoli Ma
Arabinogalactan proteins (AGPs) are a family of extracellular glycoproteins implicated in plant growth and development. With a rapid growth in the number of genomes sequenced in many plant species, the family members of AGPs can now be predicted to facilitate functional investigation. Building upon previous advances in identifying Arabidopsis AGPs, an integrated strategy of systematic AGP screening for "classical" and "chimeric" family members is proposed in this study. A Python script named Finding-AGP is compiled to find AGP-like sequences and filter AGP candidates under the given thresholds...
2017: Frontiers in Plant Science
Christopher M Schroeder, Franz J Hilke, Markus W Löffler, Michael Bitzer, Florian Lenz, Marc Sturm
Quality control (QC) is an important part of all NGS data analysis stages. Many available tools calculate QC metrics from different analysis steps of single-sample experiments (raw reads, mapped reads and variant lists). Multi-sample experiments, such as sequencing of tumor-normal pairs, require additional QC metrics to ensure validity of results. These multi-sample QC metrics still lack standardization. We therefore suggest a new workflow for QC of DNA sequencing of tumor-normal pairs. With this workflow, well-known single-sample QC metrics as well as metrics specific to tumor-normal pairs can be calculated...
January 27, 2017: Bioinformatics
Surya Gupta, Veronic De Puysseleyr, José Van der Heyden, Davy Maddelein, Irma Lemmens, Sam Lievens, Sven Degroeve, Jan Tavernier, Lennart Martens
Protein-protein interaction (PPI) studies have dramatically expanded our knowledge about cellular behaviour and development in different conditions. A multitude of high-throughput PPI techniques have been developed to achieve proteome-scale coverage for PPI studies, including the microarray based Mammalian Protein-Protein Interaction Trap (MAPPIT) system. Because such high-throughput techniques typically report thousands of interactions, managing and analysing the large amounts of acquired data is a challenge...
January 18, 2017: Bioinformatics
Bradley C Naylor, Michael T Porter, Elise Wilson, Adam Herring, Spencer Lofthouse, Austin Hannemann, Stephen R Piccolo, Alan L Rockwood, John C Price
MOTIVATION: Using mass spectrometry to measure the concentration and turnover of the individual proteins in a proteome enables the calculation of individual synthesis and degradation rates for each protein. Software to analyze concentration is readily available, but software to analyze turnover is lacking. Data analysis workflows typically don't access the full breadth of information about instrument precision and accuracy that is present in each peptide isotopic envelope measurement...
January 16, 2017: Bioinformatics
Bin Liu, Hao Wu, Deyuan Zhang, Xiaolong Wang, Kuo-Chen Chou
To expedite the pace in conducting genome/proteome analysis, we have developed a Python package called Pse-Analysis. The powerful package can automatically complete the following five procedures: (1) sample feature extraction, (2) optimal parameter selection, (3) model training, (4) cross validation, and (5) evaluating prediction quality. All the work a user needs to do is to input a benchmark dataset along with the query biological sequences concerned. Based on the benchmark dataset, Pse-Analysis will automatically construct an ideal predictor, followed by yielding the predicted results for the submitted query samples...
January 5, 2017: Oncotarget
Frédéric Cazals, Tom Dreyfus
MOTIVATION: Software in structural bioinformatics has mainly been application driven. To favor practitioners seeking off-the-shelf applications, but also developers seeking advanced building blocks to develop novel applications, we undertook the design of the Structural Bioinformatics Library (SBL), a generic C++/Python cross-platform software library targeting complex problems in structural bioinformatics. Its tenet is based on a modular design offering a rich and versatile framework allowing the development of novel applications requiring well-specified complex operations, without compromising robustness and performance...
January 5, 2017: Bioinformatics
Serge Moulin, Nicolas Seux, Stéphane Chrétien, Christophe Guyeux, Emmanuelle Lerat
MOTIVATION: LTR retrotransposons are mobile elements that are able, like retroviruses, to copy and move inside eukaryotic genomes. In the present work, we propose a branching model for studying the propagation of LTR retrotransposons in these genomes. This model allows us to take into account both the positions and the degradation level of LTR retrotransposons copies. In our model, the duplication rate is also allowed to vary with the degradation level. RESULTS: Various functions have been implemented in order to simulate their spread and visualization tools are proposed...
October 6, 2016: Bioinformatics
Bryan Quach, Terrence S Furey
MOTIVATION: Identifying the locations of transcription factor binding sites is critical for understanding how gene transcription is regulated across different cell types and conditions. Chromatin accessibility experiments such as DNaseI sequencing (DNase-seq) and Assay for Transposase Accessible Chromatin sequencing (ATAC-seq) produce genome-wide data that include distinct "footprint" patterns at binding sites. Nearly all existing computational methods to detect footprints from these data assume that footprint signals are highly homogeneous across footprint sites...
December 19, 2016: Bioinformatics
Miika J Ahdesmäki, Simon R Gray, Justin H Johnson, Zhongwu Lai
Grafting of cell lines and primary tumours is a crucial step in the drug development process between cell line studies and clinical trials. Disambiguate is a program for computationally separating the sequencing reads of two species derived from grafted samples. Disambiguate operates on alignments to the two species and separates the components at very high sensitivity and specificity as illustrated in artificially mixed human-mouse samples. This allows for maximum recovery of data from target tumours for more accurate variant calling and gene expression quantification...
2016: F1000Research
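Separating reads from two species based on their alignments can be pictured with a short sketch. It is an assumption-laden illustration, not the Disambiguate algorithm itself: the function `assign_species`, the use of a single best alignment score per species, and the tie handling are all hypothetical.

```python
def assign_species(score_a, score_b):
    """Assign a read to species "a" or "b" from its best alignment scores.

    Hypothetical rule: the higher score wins, ties are "ambiguous",
    and None means the read did not align to that species at all.
    """
    if score_a is None and score_b is None:
        return "unaligned"
    if score_b is None or (score_a is not None and score_a > score_b):
        return "a"
    if score_a is None or score_b > score_a:
        return "b"
    return "ambiguous"


# Illustrative (made-up) scores: (human alignment, mouse alignment).
reads = {"r1": (60, 40), "r2": (None, 55), "r3": (50, 50), "r4": (None, None)}
labels = {name: assign_species(a, b) for name, (a, b) in reads.items()}
```

Keeping an explicit "ambiguous" class lets downstream steps decide whether to discard such reads or retain them, which is one way to trade sensitivity against specificity.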
Jin Zhang, Elaine R Mardis, Christopher A Maher
MOTIVATION: While high-throughput sequencing (HTS) has been used successfully to discover tumor-specific mutant peptides (neoantigens) from somatic missense mutations, the field currently lacks a method for identifying which gene fusions may generate neoantigens. RESULTS: We demonstrate the application of our gene fusion neoantigen discovery pipeline, called INTEGRATE-Neo, by identifying gene fusions in prostate cancers that may produce neoantigens. AVAILABILITY AND IMPLEMENTATION: INTEGRATE-Neo is implemented in C++ and Python...
October 24, 2016: Bioinformatics
Ahmed Arslan, Vera van Noort
Recent advances in sequencing technology have resulted in large datasets of sequence variants. For the human genome, several tools are available to predict the impact of these variants on gene and protein functions. However, for model organisms such as yeast, such tools are lacking, specifically to predict the effect of protein-sequence-altering variants at the protein level. We present a Python framework that enables users to map, in a fully automated fashion, large sets of variants to protein functional regions and post-translationally modified residues...
October 22, 2016: Bioinformatics
Jaroslaw Surkont, Yoan Diekmann, José B Pereira-Leal
The Rab family of small GTPases regulates and provides specificity to the endomembrane trafficking system; each Rab subfamily is associated with specific pathways. Thus, characterization of Rab repertoires provides functional information about organisms and evolution of the eukaryotic cell. Yet, the complex structure of the Rab family limits the application of existing methods for protein classification. Here, we present a major redesign of the Rabifier, a bioinformatic pipeline for detection and classification of Rab GTPases...
October 22, 2016: Bioinformatics
Blake L Joyce, Asher Haug-Baltzell, Sean Davey, Matthew Bomhoff, James C Schnable, Eric Lyons
Following polyploidy events, genomes undergo massive reduction in gene content through a process known as fractionation. Importantly, the fractionation process is not always random, and a bias as to which homeologous chromosome retains or loses more genes can be observed in some species. The process of characterizing whole genome fractionation requires identifying syntenic regions across genomes followed by post-processing of those syntenic datasets to identify and plot gene retention patterns. We have developed a tool, FractBias, to calculate and visualize gene retention and fractionation patterns across whole genomes...
October 29, 2016: Bioinformatics
Jorge Álvarez-Jarreta, Eduardo Ruiz-Pesini
BACKGROUND: Molecular evolution studies involve many different hard computational problems solved, in most cases, with heuristic algorithms that provide a nearly optimal solution. Hence, diverse software tools exist for the different stages involved in a molecular evolution workflow. RESULTS: We present MEvoLib, the first molecular evolution library for Python, providing a framework to work with different tools and methods involved in the common tasks of molecular evolution workflows...
October 28, 2016: BMC Bioinformatics
Shi-Yi Chen, Feilong Deng, Ying Huang, Cao Li, Linhai Liu, Xianbo Jia, Song-Jia Lai
Although various computer tools have been developed to calculate a series of statistics in molecular population genetics for both small- and large-scale DNA data, there is not yet an efficient and easy-to-use toolkit that focuses exclusively on the steps of mathematical calculation. Here, we present PopSc, a bioinformatic toolkit for calculating 45 basic statistics in molecular population genetics, which can be categorized into three classes: (i) genetic diversity of DNA sequences, (ii) statistical tests for neutral evolution, and (iii) measures of genetic differentiation among populations...
2016: PloS One