Alvaro Chiner-Oms, Fernando González-Candelas
We present EvalMSA, a software tool for evaluating and detecting outliers in multiple sequence alignments (MSAs). This tool allows the identification of divergent sequences in MSAs by scoring the contribution of each row in the alignment to its quality using a sum-of-pair-based method and additional analyses. Our main goal is to provide users with objective data so that they can make informed decisions about the relevance and/or pertinence of including/retaining a particular sequence in an MSA. EvalMSA is written in standard Perl and also uses some routines from the statistical language R...
2016: Evolutionary Bioinformatics Online
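To illustrate the idea of scoring each row's contribution with a sum-of-pairs measure, here is a minimal Python sketch. It is not EvalMSA's implementation (which is written in Perl and R); the match/mismatch/gap values and the function names `pair_score` and `row_contributions` are assumptions chosen only for illustration.

```python
from itertools import combinations


def pair_score(a, b, match=1, mismatch=-1, gap=-2):
    """Score one pair of aligned characters (toy scheme, assumed values)."""
    if a == "-" and b == "-":
        return 0  # gap aligned with gap carries no information
    if a == "-" or b == "-":
        return gap
    return match if a == b else mismatch


def row_contributions(msa):
    """For each row, sum its pairwise scores against every other row."""
    totals = [0] * len(msa)
    for i, j in combinations(range(len(msa)), 2):
        s = sum(pair_score(a, b) for a, b in zip(msa[i], msa[j]))
        totals[i] += s
        totals[j] += s
    return totals


# Hypothetical toy alignment: the last row is deliberately divergent.
msa = ["ACGT-A", "ACGTTA", "ACGT-A", "TTCAGG"]
scores = row_contributions(msa)
outlier_index = scores.index(min(scores))
```

In this toy alignment the divergent last row receives by far the lowest total, which is the kind of signal an outlier-detection step could flag for closer inspection.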
Renzhi Cao, Debswapna Bhattacharya, Jie Hou, Jianlin Cheng
BACKGROUND: Protein quality assessment (QA), which is useful for ranking and selecting protein models, has long been viewed as one of the major challenges for protein tertiary structure prediction. In particular, estimating the quality of a single protein model, which is important for selecting a few good models out of a large model pool consisting of mostly low-quality models, is still a largely unsolved problem. RESULTS: We introduce a novel single-model quality assessment method, DeepQA, based on a deep belief network that utilizes a number of selected features describing the quality of a model from different perspectives, such as energy, physicochemical characteristics, and structural information...
December 5, 2016: BMC Bioinformatics
Przemyslaw Stempor, Julie Ahringer
Experiments involving high-throughput sequencing are widely used for analyses of chromatin function and gene expression. Common examples are the use of chromatin immunoprecipitation for the analysis of chromatin modifications or factor binding, enzymatic digestions for chromatin structure assays, and RNA sequencing to assess gene expression changes after biological perturbations. To investigate the pattern and abundance of coverage signals across regions of interest, data are often visualized as profile plots of average signal or stacked rows of signal in the form of heatmaps...
2016: Wellcome Open Res
Ioannis K Moutsatsos, Imtiaz Hossain, Claudia Agarinis, Fred Harbinski, Yann Abraham, Luc Dobler, Xian Zhang, Christopher J Wilson, Jeremy L Jenkins, Nicholas Holway, John Tallarico, Christian N Parker
High-throughput screening generates large volumes of heterogeneous data that require a diverse set of computational tools for management, processing, and analysis. Building integrated, scalable, and robust computational workflows for such applications is challenging but highly valuable. Scientific data integration and pipelining facilitate standardized data processing, collaboration, and reuse of best practices. We describe how Jenkins-CI, an "off-the-shelf," open-source, continuous integration system, is used to build pipelines for processing images and associated data from high-content screening (HCS)...
November 29, 2016: Journal of Biomolecular Screening
Giovanni Scala, Ornella Affinito, Domenico Palumbo, Ermanno Florio, Antonella Monticelli, Gennaro Miele, Lorenzo Chiariotti, Sergio Cocozza
BACKGROUND: CpG sites in an individual molecule may exist in a binary state (methylated or unmethylated) and each individual DNA molecule, containing a certain number of CpGs, is a combination of these states defining an epihaplotype. Classic quantification based approaches to study DNA methylation are intrinsically unable to fully represent the complexity of the underlying methylation substrate. Epihaplotype based approaches, on the other hand, allow methylation profiles of cell populations to be studied at the single molecule level...
November 25, 2016: BMC Bioinformatics
S Stucki, P Orozco-terWengel, B R Forester, S Duruz, L Colli, C Masembe, R Negrini, E Landguth, M R Jones, M W Bruford, P Taberlet, S Joost
With the increasing availability of both molecular and topo-climatic data, the main challenges facing landscape genomics (i.e. the combination of landscape ecology with population genomics) include processing large numbers of models and distinguishing between selection and demographic processes (e.g. population structure). Several methods address the latter, either by estimating a null model of population history or by simultaneously inferring environmental and demographic effects. Here we present Samβada, an approach designed to study signatures of local adaptation, with special emphasis on high performance computing of large-scale genetic and environmental datasets...
November 1, 2016: Molecular Ecology Resources
Diego J Zea, Diego Anfossi, Morten Nielsen, Cristina Marino-Buslje
MOTIVATION: MIToS is an environment for mutual information analysis and a framework for managing protein multiple sequence alignments (MSAs) and protein structures (PDB) in the Julia language. It integrates sequence and structural information through SIFTS, making the analysis of Pfam MSAs straightforward. MIToS streamlines the implementation of any measure calculated from residue contingency tables and its optimization and testing in terms of protein contact prediction. As an example, we implemented and tested a BLOSUM62-based pseudo-count strategy in mutual information analysis...
October 22, 2016: Bioinformatics
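As a rough illustration of a measure computed from a residue contingency table, the following Python sketch calculates plain mutual information between two alignment columns. It does not use MIToS or its Julia API, and it omits the BLOSUM62-based pseudo-count strategy mentioned in the abstract; the function name `mutual_information` and the example columns are hypothetical.

```python
from collections import Counter
from math import log2


def mutual_information(col_a, col_b):
    """Mutual information (in bits) between two equally long alignment columns."""
    if len(col_a) != len(col_b):
        raise ValueError("columns must have the same number of sequences")
    n = len(col_a)
    joint = Counter(zip(col_a, col_b))  # contingency table of residue pairs
    marg_a = Counter(col_a)             # marginal counts for column A
    marg_b = Counter(col_b)             # marginal counts for column B
    mi = 0.0
    for (a, b), count in joint.items():
        p_ab = count / n
        p_a = marg_a[a] / n
        p_b = marg_b[b] / n
        mi += p_ab * log2(p_ab / (p_a * p_b))
    return mi


# Perfectly covarying columns carry 1 bit; independent ones carry 0 bits.
covarying = mutual_information("AAVV", "LLII")
independent = mutual_information("AAVV", "LILI")
```

Raw frequencies like these are unreliable for small alignments, which is one reason pseudo-counts and other corrections are used in practice.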
Eshel Faraggi, Maksim Kouza, Yaoqi Zhou, Andrzej Kloczkowski
A fast accessible surface area (ASA) predictor is presented. In this new approach no residue mutation profiles generated by multiple sequence alignments are used as inputs. Instead, we use only single sequence information and global features such as single-residue and two-residue compositions of the chain. The resulting predictor is both much more efficient than sequence-alignment-based predictors and of comparable accuracy to them. Introduction of the global inputs significantly helps achieve this comparable accuracy...
2017: Methods in Molecular Biology
Eshel Faraggi, Andrzej Kloczkowski
Accurate prediction of protein secondary structure and other one-dimensional structure features is essential for accurate sequence alignment, three-dimensional structure modeling, and function prediction. SPINE-X is a software package to predict secondary structure as well as accessible surface area and dihedral angles ϕ and ψ. For secondary structure SPINE-X achieves an accuracy of between 81 and 84 % depending on the dataset and choice of tests. The Pearson correlation coefficient for accessible surface area prediction is 0...
2017: Methods in Molecular Biology
Benjamin A Thomas, Vesna Cuplov, Alexandre Bousse, Adriana Mendes, Kris Thielemans, Brian F Hutton, Kjell Erlandsson
Positron emission tomography (PET) images are degraded by a phenomenon known as the partial volume effect (PVE). Approaches have been developed to reduce PVEs, typically through the utilisation of structural information provided by other imaging modalities such as MRI or CT. These methods, known as partial volume correction (PVC) techniques, reduce PVEs by compensating for the effects of the scanner resolution, thereby improving the quantitative accuracy. The PETPVC toolbox described in this paper comprises a suite of methods, both classic and more recent approaches, for the purposes of applying PVC to PET data...
November 21, 2016: Physics in Medicine and Biology
Yang Liu, Saad M Khan, Juexin Wang, Mats Rynge, Yuanxun Zhang, Shuai Zeng, Shiyuan Chen, Joao V Maldonado Dos Santos, Babu Valliyodan, Prasad P Calyam, Nirav Merchant, Henry T Nguyen, Dong Xu, Trupti Joshi
BACKGROUND: With the advances in next-generation sequencing (NGS) technology and significant reductions in sequencing costs, it is now possible to sequence large collections of germplasm in crops for detecting genome-scale genetic variations and to apply the knowledge towards improvements in traits. To efficiently facilitate large-scale NGS resequencing data analysis of genomic variations, we have developed "PGen", an integrated and optimized workflow using the Extreme Science and Engineering Discovery Environment (XSEDE) high-performance computing (HPC) virtual system, iPlant cloud data storage resources and Pegasus workflow management system (Pegasus-WMS)...
October 6, 2016: BMC Bioinformatics
Giacomo Janson, Chengxin Zhang, Maria Giulia Prado, Alessandro Paiardini
MOTIVATION: The recently released PyMod GUI integrates many of the individual steps required for protein sequence-structure analysis and homology modeling within the interactive visualization capabilities of PyMOL. Here we describe the improvements introduced into the version 2.0 of PyMod. RESULTS: The original code of PyMod has been completely rewritten and improved in version 2.0 to extend PyMOL with packages such as Clustal Omega, PSIPRED and CAMPO. Integration with the popular web services ESPript and WebLogo is also provided...
October 13, 2016: Bioinformatics
Martin Adam, Heidi Fleischer, Kerstin Thurow
In the past year, automation has become more and more important in the field of elemental and structural chemical analysis to reduce the high degree of manual operation and processing time as well as human errors. Thus, a high number of data points are generated, which requires fast and automated data evaluation. To handle the preprocessed export data from different analytical devices with software from various vendors, a standardized solution that requires no programming knowledge is preferable. In modern laboratories, multiple users will use this software on multiple personal computers with different operating systems (e...
October 13, 2016: Journal of Laboratory Automation
Wei Shen, Shuai Le, Yan Li, Fuquan Hu
FASTA and FASTQ are basic and ubiquitous formats for storing nucleotide and protein sequences. Common manipulations of FASTA/Q files include converting, searching, filtering, deduplication, splitting, shuffling, and sampling. Existing tools only implement some of these manipulations, and not particularly efficiently, and some are only available for certain operating systems. Furthermore, the complicated installation process of required packages and running environments can render these programs less user friendly...
2016: PLoS One
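For readers unfamiliar with the manipulations listed above, here is a minimal Python sketch of two of them: parsing FASTA text and deduplicating records by sequence. It is not the tool described in the abstract and makes no claim about that tool's behavior; `parse_fasta` and `dedup_by_sequence` are hypothetical names, and treating sequences as case-insensitive is an assumption.

```python
def parse_fasta(text):
    """Parse FASTA text into a list of (header, sequence) tuples."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)  # sequences may wrap over several lines
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def dedup_by_sequence(records):
    """Keep the first record for each distinct sequence (case-insensitive)."""
    seen = set()
    unique = []
    for header, seq in records:
        key = seq.upper()
        if key not in seen:
            seen.add(key)
            unique.append((header, seq))
    return unique


# Hypothetical input: s2 duplicates s1 apart from letter case.
fasta_text = ">s1\nACGT\nAC\n>s2\nacgtac\n>s3\nTTTT\n"
records = parse_fasta(fasta_text)
unique_records = dedup_by_sequence(records)
```

This version holds every record in memory, so it is suited only to small files; a tool built for large datasets would typically stream records instead.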
Kersten Döring, Björn A Grüning, Kiran K Telukunta, Philippe Thomas, Stefan Günther
Information extraction from biomedical literature is continuously growing in scope and importance. Many tools exist that perform named entity recognition, e.g. of proteins, chemical compounds, and diseases. Furthermore, several approaches deal with the extraction of relations between identified entities. The BioCreative community supports these developments with yearly open challenges, which led to a standardised XML text annotation format called BioC. PubMed provides access to the largest open biomedical literature repository, but there is no unified way of connecting its data to natural language processing tools...
2016: PLoS One
Zhenjiang Zech Xu, David H Mathews
RNA secondary structure is often predicted using folding thermodynamics. RNAstructure is a software package that includes structure prediction by free energy minimization, prediction of base pairing probabilities, prediction of structures composed of highly probable base pairs, and prediction of structures with pseudoknots. A user-friendly graphical user interface is provided, and this interface works on Windows, Apple OS X, and Linux. This chapter provides protocols for using RNAstructure for structure prediction...
2016: Methods in Molecular Biology
Jorge González-Domínguez, Yongchao Liu, Juan Touriño, Bertil Schmidt
MSAProbs is a state-of-the-art protein multiple sequence alignment tool based on hidden Markov models. It can achieve high alignment accuracy at the expense of relatively long runtimes for large-scale input datasets. In this work we present MSAProbs-MPI, a distributed-memory parallel version of the multithreaded MSAProbs tool that is able to reduce runtimes by exploiting the compute capabilities of common multicore CPU clusters. Our performance evaluation on a cluster with 32 nodes (each containing two Intel Haswell processors) shows reductions in execution time of over one order of magnitude for typical input datasets...
September 16, 2016: Bioinformatics
Santhilal Subhash, Chandrasekhar Kanduri
BACKGROUND: High-throughput technologies such as ChIP-sequencing, RNA-sequencing, DNA sequencing and quantitative metabolomics generate a huge volume of data. Researchers often rely on functional enrichment tools to interpret the biological significance of the affected genes from these high-throughput studies. However, currently available functional enrichment tools need to be updated frequently to adapt to new entries from the functional database repositories. Hence there is a need for a simplified tool that can perform functional enrichment analysis by using updated information directly from the source databases such as KEGG, Reactome or Gene Ontology etc...
2016: BMC Bioinformatics
Samuel S Shepard, Sarah Meno, Justin Bahl, Malania M Wilson, John Barnes, Elizabeth Neuhaus
BACKGROUND: Deep sequencing makes it possible to observe low-frequency viral variants and sub-populations with greater accuracy and sensitivity than ever before. Existing platforms can be used to multiplex a large number of samples; however, analysis of the resulting data is complex and involves separating barcoded samples and various read manipulation processes ending in final assembly. Many assembly tools were designed with larger genomes and higher fidelity polymerases in mind and do not perform well with reads derived from highly variable viral genomes...
September 5, 2016: BMC Genomics
Vinzent Boerner, Bruce Tier
BACKGROUND: The advent of genomic marker data has triggered the development of various Bayesian algorithms for estimation of marker effects, but software packages implementing these algorithms are not readily available, or are limited to a single algorithm, univariate analysis or a limited number of factors. Moreover, script-based environments like R may not be able to handle large-scale genomic data or exploit model properties which save computing time or memory (RAM). RESULTS: BESSiE is software designed for best linear unbiased prediction (BLUP) and Bayesian Markov chain Monte Carlo analysis of linear mixed models allowing for continuous and/or categorical multivariate, repeated and missing observations, various random and fixed factors and large-scale genomic marker data...
2016: Genetics, Selection, Evolution: GSE
