Briefings in Bioinformatics

Rok Blagus, Jelle J Goeman
When building classifiers, it is natural to require that the classifier correctly estimates the event probability (Constraint 1), that it has equal sensitivity and specificity (Constraint 2) or that it has equal positive and negative predictive values (Constraint 3). We prove that in the balanced case, where there is an equal proportion of events and non-events, any classifier that satisfies one of these constraints will always satisfy all. Such unbiasedness of events and non-events is much more difficult to achieve in the case of rare events, i...
November 22, 2016: Briefings in Bioinformatics
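As a minimal numerical illustration of the balanced-case claim (not the authors' proof), the Python sketch below checks the three constraints on a confusion matrix. It assumes Constraint 1 means that the fraction of predicted events equals the event prevalence (equivalently, FP = FN). The class `Confusion`, the helper `check_constraints` and the counts are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class Confusion:
    """Hypothetical confusion-matrix counts for a binary classifier."""
    tp: int
    fn: int
    tn: int
    fp: int

    @property
    def total(self) -> int:
        return self.tp + self.fn + self.tn + self.fp


def check_constraints(c: Confusion) -> dict:
    prevalence = (c.tp + c.fn) / c.total       # true event proportion
    predicted_rate = (c.tp + c.fp) / c.total   # predicted event proportion
    sensitivity = c.tp / (c.tp + c.fn)
    specificity = c.tn / (c.tn + c.fp)
    ppv = c.tp / (c.tp + c.fp)
    npv = c.tn / (c.tn + c.fn)
    return {
        # Constraint 1 (assumed operationalization): predicted rate matches prevalence
        "constraint1": math.isclose(predicted_rate, prevalence),
        "constraint2": math.isclose(sensitivity, specificity),
        "constraint3": math.isclose(ppv, npv),
    }


balanced = Confusion(tp=80, fn=20, tn=80, fp=20)  # 100 events, 100 non-events
rare = Confusion(tp=8, fn=2, tn=188, fp=2)        # 10 events, 190 non-events
print(check_constraints(balanced))
print(check_constraints(rare))
```

With the balanced counts all three checks pass. With the rare-event counts (10 events in 200) Constraint 1 still holds, but sensitivity (0.8) and specificity (188/190) differ, as do PPV (0.8) and NPV (188/190).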
Seyed Ali Madani Tonekaboni, Laleh Soltan Ghoraie, Venkata Satya Kumar Manem, Benjamin Haibe-Kains
Drug combinations have been proposed as a promising therapeutic strategy to overcome drug resistance and improve efficacy of monotherapy regimens in cancer. This strategy aims at targeting multiple components of this complex disease. Despite the increasing number of drug combinations in use, many of them were empirically found in the clinic, and the molecular mechanisms underlying these drug combinations are often unclear. These challenges call for rational, systematic approaches for drug combination discovery...
November 22, 2016: Briefings in Bioinformatics
Junjie Chen, Mingyue Guo, Xiaolong Wang, Bin Liu
Protein remote homology detection is one of the most fundamental and central problems in the study of protein structures and functions; it aims to detect distant evolutionary relationships among proteins via computational methods. During the past decades, many computational approaches have been proposed to solve this important task, and these methods have made a substantial contribution to protein remote homology detection. A comprehensive review and comparison of these computational methods is therefore needed...
November 22, 2016: Briefings in Bioinformatics
Adam S Brown, Chirag J Patel
Repositioning of previously approved drugs is a promising methodology because it reduces the cost and duration of the drug development pipeline and reduces the likelihood of unforeseen adverse events. Computational repositioning is especially appealing because of the ability to rapidly screen candidates in silico and to reduce the number of possible repositioning candidates. What is unclear, however, is how useful such methods are in producing clinically efficacious repositioning hypotheses. Furthermore, there is no agreement in the field over the proper way to perform validation of in silico predictions, and in fact no systematic review of repositioning validation methodologies...
November 22, 2016: Briefings in Bioinformatics
Claudia Manzoni, Demis A Kia, Jana Vandrovcova, John Hardy, Nicholas W Wood, Patrick A Lewis, Raffaele Ferrari
Advances in the technologies and informatics used to generate and process large biological data sets (omics data) are promoting a critical shift in the study of biomedical sciences. While genomics, transcriptomics and proteomics, coupled with bioinformatics and biostatistics, are gaining momentum, they are still, for the most part, assessed individually, with distinct approaches generating monothematic rather than integrated knowledge. As other areas of the biomedical sciences, including metabolomics, epigenomics and pharmacogenomics, move towards the omics scale, we are witnessing the rise of interdisciplinary data integration strategies to support a better understanding of biological systems and, eventually, the development of successful precision medicine...
November 22, 2016: Briefings in Bioinformatics
Denis C Bauer, Armella Zadoorian, Laurence O W Wilson, Natalie P Thorne
MOTIVATION: Despite being essential for numerous clinical and research applications, high-resolution human leukocyte antigen (HLA) typing remains challenging, and laboratory tests are time-consuming and labour-intensive. With next-generation sequencing data becoming widely accessible, on-demand in silico HLA typing offers an economical and efficient alternative. RESULTS: In this study we evaluate the HLA typing accuracy and efficiency of five computational HLA typing methods by comparing their predictions against a curated set of >1000 published polymerase chain reaction-derived HLA genotypes on three different data sets (whole genome sequencing, whole exome sequencing and transcriptomic sequencing data)...
November 1, 2016: Briefings in Bioinformatics
Qiqige Wuyun, Wei Zheng, Zhenling Peng, Jianyi Yang
Sequence-based prediction of residue-residue contacts in proteins is becoming increasingly important for improving protein structure prediction in the big data era. In this study, we performed a large-scale comparative assessment of 15 locally installed contact predictors. To assess these methods, we collected a large data set of 680 nonredundant proteins covering different structural classes and target difficulties. We investigated a wide range of factors that may influence the precision of contact prediction, including target difficulty, structural class, alignment depth and the distribution of contact pairs in a protein structure...
November 1, 2016: Briefings in Bioinformatics
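Contact-prediction precision is commonly reported for the top L/k highest-scoring residue pairs (L = sequence length) above a minimum sequence separation. The sketch below assumes that convention; the abstract does not state which cut-offs the study used, and the function name `top_l_precision`, its defaults and the toy data are hypothetical.

```python
def top_l_precision(predictions, true_contacts, length, fraction=5, min_separation=24):
    """Precision of the top L/fraction predicted contacts (hypothetical helper).

    predictions: iterable of (i, j, score) residue pairs.
    true_contacts: set of (i, j) pairs with i < j from the native structure.
    Only pairs with |i - j| >= min_separation are considered.
    """
    candidates = [(i, j, s) for i, j, s in predictions if abs(i - j) >= min_separation]
    candidates.sort(key=lambda t: t[2], reverse=True)  # highest score first
    top = candidates[: max(1, length // fraction)]
    hits = sum(1 for i, j, _ in top if (min(i, j), max(i, j)) in true_contacts)
    return hits / len(top) if top else 0.0


preds = [(1, 10, 0.9), (2, 15, 0.8), (3, 4, 0.95), (5, 18, 0.7), (6, 20, 0.6), (7, 19, 0.5)]
native = {(1, 10), (5, 18), (7, 19)}
print(top_l_precision(preds, native, length=20, min_separation=6))
```

Here the (3, 4) pair is dropped for being too close in sequence, and two of the four top-scoring remaining pairs are native contacts, giving a precision of 0.5.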
Shardul Paricharak, Oscar Méndez-Lucio, Aakash Chavan Ravindranath, Andreas Bender, Adriaan P IJzerman, Gerard J P van Westen
High-throughput screening (HTS) campaigns are routinely performed in pharmaceutical companies to explore activity profiles of chemical libraries for the identification of promising candidates for further investigation. With the aim of improving hit rates in these campaigns, data-driven approaches have been used to design relevant compound screening collections, enable effective hit triage and perform activity modeling for compound prioritization. Remarkable progress has been made in the activity modeling area since the recent introduction of large-scale bioactivity-based compound similarity metrics...
October 27, 2016: Briefings in Bioinformatics
Felicia S L Ng, David Ruau, Lorenz Wernisch, Berthold Göttgens
Integrated analysis of multiple genome-wide transcription factor (TF)-binding profiles will be vital to advance our understanding of the global impact of TF binding. However, existing methods for measuring similarity among large numbers of chromatin immunoprecipitation sequencing (ChIP-seq) assays, such as correlation, mutual information or enrichment analysis, are limited in their ability to display functionally relevant TF relationships. In this study, we propose the use of graphical models to determine conditional independence between TFs and show that network visualization provides a promising alternative to distinguish 'direct' versus 'indirect' TF interactions...
October 25, 2016: Briefings in Bioinformatics
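One common way to read conditional independence off a graphical model is a Gaussian graphical model, in which a zero partial correlation means two variables are independent given all the others. The abstract does not say which graphical model the authors used; the sketch below assumes a Gaussian one and uses simulated, hypothetical binding signals for three TFs in a chain (A drives B, B drives C), so A and C are related only "indirectly" through B. The helper `partial_correlations` is hypothetical.

```python
import numpy as np


def partial_correlations(x: np.ndarray) -> np.ndarray:
    """Partial correlation matrix for samples (rows) x variables (columns).

    Uses the Gaussian graphical model identity
    rho_ij | rest = -P_ij / sqrt(P_ii * P_jj), where P is the inverse covariance.
    """
    precision = np.linalg.inv(np.cov(x, rowvar=False))
    d = np.sqrt(np.diag(precision))
    pcor = -precision / np.outer(d, d)
    np.fill_diagonal(pcor, 1.0)
    return pcor


rng = np.random.default_rng(0)
n = 20_000
tf_a = rng.normal(size=n)
tf_b = tf_a + rng.normal(size=n)  # B depends on A
tf_c = tf_b + rng.normal(size=n)  # C depends on B only
signals = np.column_stack([tf_a, tf_b, tf_c])

marginal = np.corrcoef(signals, rowvar=False)
partial = partial_correlations(signals)
print(round(marginal[0, 2], 2), round(partial[0, 2], 2))
```

The marginal correlation of A and C is expected to be about 0.58, while their partial correlation given B is close to zero, so a network built from partial correlations keeps the A-B and B-C edges and drops the indirect A-C edge.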
Yi An, Jiawei Wang, Chen Li, André Leier, Tatiana Marquez-Lago, Jonathan Wilksch, Yang Zhang, Geoffrey I Webb, Jiangning Song, Trevor Lithgow
Bacterial effector proteins secreted by various protein secretion systems play crucial roles in host-pathogen interactions. In this context, computational tools capable of accurately predicting effector proteins of the various types of bacterial secretion systems are highly desirable. Existing computational approaches use different machine learning (ML) techniques and heterogeneous features derived from protein sequences and/or structural information. These predictors differ not only in the ML methods used but also in the curated data sets, feature selection and prediction performance...
October 24, 2016: Briefings in Bioinformatics
Paola G Ferrario, Inke R König
Genome-wide association studies are moving to genome-wide interaction studies, as the genetic background of many diseases appears to be more complex than previously supposed. Thus, many statistical approaches have been proposed to detect gene-gene (GxG) interactions, among them numerous information theory-based methods, inspired by the concept of entropy. These are suggested as particularly powerful and, because of their nonlinearity, as better able to capture nonlinear relationships between genetic variants and/or variables...
October 21, 2016: Briefings in Bioinformatics
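A widely used entropy-based interaction measure is the information gain I(A,B;Y) - I(A;Y) - I(B;Y), also called interaction information (sign conventions vary). The sketch below computes it from empirical frequencies; it is not any specific published method, SNPs are simplified to binary carrier indicators, and the XOR-style data and helper names are hypothetical.

```python
import math
from collections import Counter
from typing import Hashable, Sequence


def entropy(values: Sequence[Hashable]) -> float:
    """Shannon entropy (bits) of the empirical distribution of `values`."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())


def mutual_information(x: Sequence[Hashable], y: Sequence[Hashable]) -> float:
    """I(X;Y) = H(X) + H(Y) - H(X,Y)."""
    return entropy(x) + entropy(y) - entropy(list(zip(x, y)))


def information_gain(snp_a, snp_b, phenotype) -> float:
    """I(A,B;Y) - I(A;Y) - I(B;Y); positive values indicate synergy."""
    joint = list(zip(snp_a, snp_b))
    return (mutual_information(joint, phenotype)
            - mutual_information(snp_a, phenotype)
            - mutual_information(snp_b, phenotype))


# Hypothetical purely epistatic model: phenotype = A XOR B, all genotype pairs equally common.
pairs = [(a, b) for a in (0, 1) for b in (0, 1)] * 25
snp_a = [a for a, _ in pairs]
snp_b = [b for _, b in pairs]
phenotype = [a ^ b for a, b in pairs]
print(mutual_information(snp_a, phenotype), information_gain(snp_a, snp_b, phenotype))
```

Neither SNP carries any information about the phenotype on its own (I(A;Y) = 0), yet together they determine it completely, giving an information gain of 1 bit: the kind of nonlinear relationship such methods are meant to capture.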
Maoqi Xu, Liang Chen
Individual sample heterogeneity is one of the biggest obstacles to biomarker identification for complex diseases such as cancers. Current statistical models for identifying differentially expressed genes between disease and control groups often overlook the substantial heterogeneity of human samples. Meanwhile, traditional nonparametric tests, although distribution-free and robust to heterogeneity, lose detailed data information and sacrifice statistical power. Here, we propose an empirical likelihood ratio test with a mean-variance relationship constraint (ELTSeq) for the differential expression analysis of RNA sequencing (RNA-seq)...
October 21, 2016: Briefings in Bioinformatics
(no author information available yet)
Many disciplines, from human genetics and oncology to plant breeding, microbiology and virology, commonly face the challenge of analyzing rapidly increasing numbers of genomes. In the case of Homo sapiens, the number of sequenced genomes will approach hundreds of thousands in the next few years. Simply scaling up established bioinformatics pipelines will not be sufficient for leveraging the full potential of such rich genomic data sets. Instead, novel, qualitatively different computational methods and paradigms are needed...
October 21, 2016: Briefings in Bioinformatics
Linna Zhao, Di Liu, Jing Xu, Zhaoyang Wang, Yang Chen, Changgui Lei, Ying Li, Guiyou Liu, Yongshuai Jiang
At present, understanding of DNA methylation at the population level is still limited. Here, we first extended the classical framework of population genetics, such as single nucleotide polymorphism allele frequency, linkage disequilibrium (LD), LD block and haplotype, to epigenetics. Then, as an example, we compared the DNA methylation disequilibrium (MD) maps between HapMap CEU (Caucasian residents of European ancestry from Utah) population and YRI (Yoruba people from Ibadan) population (lymphoblastoid cell lines)...
October 19, 2016: Briefings in Bioinformatics
Yongchang Zheng, Qianqian Huang, Zijian Ding, Tingting Liu, Chenghai Xue, Xinting Sang, Jin Gu
The alteration of the DNA methylation landscape is a key epigenetic event in cancer. With the accumulation of large-scale genome-wide DNA methylation data from clinical samples, we are now able to characterize the patterns of DNA methylation alterations to identify candidate epigenetic markers and drivers. In this survey, we take hepatocellular carcinoma (HCC) as an example to show the basic steps of analyzing DNA methylation patterns in cancer across multiple data sets. We collected three genome-wide DNA methylation data sets with ∼800 clinical samples and the corresponding gene expression data sets...
October 19, 2016: Briefings in Bioinformatics
Ron Henkel, Robert Hoehndorf, Tim Kacprowski, Christian Knüpfer, Wolfram Liebermeister, Dagmar Waltemath
Systems biology models are rapidly increasing in complexity, size and number. When building large models, researchers rely on software tools for the retrieval, comparison, combination and merging of models, as well as for version control. These tools need to be able to quantify the differences and similarities between computational models. However, the notion of 'similarity' may vary greatly depending on the specific application. A general notion of model similarity, applicable to various types of models, is still missing...
October 14, 2016: Briefings in Bioinformatics
James M Brown, Neil R Horner, Thomas N Lawson, Tanja Fiegel, Simon Greenaway, Hugh Morgan, Natalie Ring, Luis Santos, Duncan Sneddon, Lydia Teboul, Jennifer Vibert, Gagarine Yaikhom, Henrik Westerberg, Ann-Marie Mallon
High-throughput phenotyping is a cornerstone of numerous functional genomics projects. In recent years, imaging screens have become increasingly important in understanding gene-phenotype relationships in studies of cells, tissues and whole organisms. Three-dimensional (3D) imaging has risen to prominence in the field of developmental biology for its ability to capture whole embryo morphology and gene expression, as exemplified by the International Mouse Phenotyping Consortium (IMPC). Large volumes of image data, encompassing a range of modalities, proprietary software and metadata, are being acquired by multiple institutions around the world...
October 14, 2016: Briefings in Bioinformatics
Juan Xu, ZiShan Wang, Shengli Li, Juan Chen, Jinwen Zhang, Chunjie Jiang, Zheng Zhao, Jing Li, Yongsheng Li, Xia Li
Although systematic genomic studies have identified a broad spectrum of non-coding RNAs (ncRNAs) that are involved in breast cancer, our understanding of the epigenetic dysregulation of those ncRNAs remains limited. Here, we systematically analysed the epigenetic alterations of microRNAs (miRNAs) and long non-coding RNAs (lncRNAs) in two breast cancer subtypes (luminal and basal). Widespread epigenetic alterations of miRNAs and lncRNAs were observed in both cancer subtypes. In contrast to protein-coding genes, the majority of epigenetically dysregulated ncRNAs were shared between subtypes, but a subset of transcriptomic and corresponding epigenetic changes occurred in a subtype-specific manner...
October 14, 2016: Briefings in Bioinformatics
Guillem Rigaill, Sandrine Balzergue, Véronique Brunaud, Eddy Blondet, Andrea Rau, Odile Rogier, José Caius, Cathy Maugis-Rabusseau, Ludivine Soubigou-Taconnat, Sébastien Aubourg, Claire Lurin, Marie-Laure Martin-Magniette, Etienne Delannoy
Numerous statistical pipelines are now available for the differential analysis of gene expression measured with RNA-sequencing technology. Most of them are based on similar statistical frameworks after normalization, differing primarily in the choice of data distribution, mean and variance estimation strategy and data filtering. We propose an evaluation of the impact of these choices when few biological replicates are available through the use of synthetic data sets. This framework is based on real data sets and allows the exploration of various scenarios differing in the proportion of non-differentially expressed genes...
October 14, 2016: Briefings in Bioinformatics
Jang-Il Sohn, Jin-Wu Nam
With the advent of next-generation sequencing (NGS) technology, various de novo assembly algorithms based on the de Bruijn graph have been developed to construct chromosome-level sequences. However, numerous technical and computational challenges in de novo assembly remain, although many bright ideas and heuristics have been suggested to tackle them in both experimental and computational settings. In this review, we categorize de novo assemblers on the basis of the type of de Bruijn graph (Hamiltonian and Eulerian) and discuss the challenges of de novo assembly for short NGS reads with regard to computational complexity and assembly ambiguity...
October 14, 2016: Briefings in Bioinformatics
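The Eulerian formulation mentioned above treats each k-mer as an edge from its (k-1)-mer prefix to its (k-1)-mer suffix, so an assembly corresponds to a path that uses every edge exactly once. The sketch below builds such a graph from toy, error-free reads and finds an Eulerian path with Hierholzer's algorithm. It collapses duplicate k-mers and ignores sequencing errors, repeats and coverage, which real assemblers must handle; the function names and reads are hypothetical.

```python
from collections import Counter, defaultdict


def de_bruijn_graph(reads, k):
    """Map each (k-1)-mer to the (k-1)-mers that follow it; each distinct k-mer is one edge."""
    kmers = {read[i:i + k] for read in reads for i in range(len(read) - k + 1)}
    graph = defaultdict(list)
    for kmer in sorted(kmers):
        graph[kmer[:-1]].append(kmer[1:])
    return graph


def eulerian_path(graph):
    """Hierholzer's algorithm; assumes an Eulerian path exists."""
    out_deg = Counter({node: len(succ) for node, succ in graph.items()})
    in_deg = Counter(v for succ in graph.values() for v in succ)
    nodes = set(out_deg) | set(in_deg)
    # A path starts where out-degree exceeds in-degree by one (else any node, for a cycle).
    start = next((n for n in nodes if out_deg[n] - in_deg[n] == 1), next(iter(graph)))
    remaining = {node: list(succ) for node, succ in graph.items()}
    stack, path = [start], []
    while stack:
        node = stack[-1]
        if remaining.get(node):
            stack.append(remaining[node].pop())  # follow an unused edge
        else:
            path.append(stack.pop())             # dead end: node is finished
    return path[::-1]


def spell(path):
    """Glue consecutive (k-1)-mers, which overlap by k-2 characters."""
    return path[0] + "".join(node[-1] for node in path[1:])


reads = ["ATGGCG", "GGCGTG", "CGTGCA"]
print(spell(eulerian_path(de_bruijn_graph(reads, k=4))))
```

For these three overlapping reads every 3-mer is unique, so the graph is a simple path and the walk spells back ATGGCGTGCA. Repeated (k-1)-mers would create branches with several valid walks, which is one source of the assembly ambiguity the review discusses.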
