Read by QxMD icon Read


Li Xue, Bin Tang, Wei Chen, Jiesi Luo
Motivation: Various bacterial pathogens can deliver their secreted substrates, also called effectors, through type III secretion systems (T3SSs) into host cells and cause disease. Since T3SS secreted effectors (T3SEs) play important roles in pathogen-host interactions, identifying them is crucial to our understanding of the pathogenic mechanisms of T3SSs. However, the effectors display a high level of sequence diversity, which makes identification difficult. There is a need to develop a novel and effective method to screen and select putative novel effectors from bacterial genomes that can be validated by a smaller number of key experiments...
November 8, 2018: Bioinformatics
João C Marques, Michael B Orger
Motivation: How to partition a data set into a set of distinct clusters is a ubiquitous and challenging problem. The fact that data varies widely in features such as cluster shape, cluster number, density distribution, background noise, outliers and degree of overlap, makes it difficult to find a single algorithm that can be broadly applied. One recent method, clusterdp, based on search of density peaks, can be applied successfully to cluster many kinds of data, but it is not fully automatic, and fails on some simple data distributions...
November 8, 2018: Bioinformatics
Wei Shi, Jianhua Chen, Mao Luo, Min Chen
Motivation: With the development and increasingly widespread application of next-generation sequencing (NGS) technologies, genome sequencing has become faster and cheaper, creating a massive amount of genome sequence data that is still growing at an explosive rate. The time and cost of transmission, storage, processing and analysis of these genetic data have become bottlenecks that hinder the development of genetics and biomedicine. Although there are many common data compression algorithms, they are not effective for genome sequences due to their inability to consider and exploit the inherent characteristics of genome sequence data...
November 8, 2018: Bioinformatics
Gregory J Hunt, Saskia Freytag, Melanie Bahlo, Johann A Gagnon-Bartsch
Motivation: Cell type composition of tissues is important in many biological processes. To help understand cell type composition using gene expression data, methods of estimating (deconvolving) cell type proportions have been developed. Such estimates are often used to adjust for confounding effects of cell type in differential expression analysis (DEA). Results: We propose dtangle, a new cell type deconvolution method. dtangle works on a range of DNA microarray and bulk RNA-seq platforms...
November 8, 2018: Bioinformatics
Fatima Zohra Smaili, Xin Gao, Robert Hoehndorf
Motivation: Ontologies are widely used in biology for data annotation, integration, and analysis. In addition to formally structured axioms, ontologies contain meta-data in the form of annotation axioms which provide valuable pieces of information that characterize ontology classes. Annotation axioms commonly used in ontologies include class labels, descriptions, or synonyms. Despite being a rich source of semantic information, the ontology meta-data are generally unexploited by ontology-based analysis methods such...
November 8, 2018: Bioinformatics
Mark Howison, Mia Coetzer, Rami Kantor
Motivation: Next-generation deep sequencing of viral genomes, particularly on the Illumina platform, is increasingly applied in HIV research. Yet, there is no standard protocol or method used by the research community to account for measurement errors that arise during sample preparation and sequencing. Correctly calling high and low frequency variants while controlling for erroneous variants is an important precursor to downstream interpretation, such as studying the emergence of HIV drug-resistance mutations, which in turn has clinical applications and can improve patient care...
November 8, 2018: Bioinformatics
Sebastian Deorowicz, Agnieszka Debudaj-Grabysz, Adam Gudys, Szymon Grabowski
Motivation: Mapping reads to a reference genome is often the first step in a sequencing data analysis pipeline. The reduction of sequencing costs implies a need for algorithms able to process increasing amounts of generated data in reasonable time. Results: We present Whisper, an accurate and high-performance mapping tool, based on the idea of sorting reads and then mapping them against suffix arrays for the reference genome and its reverse complement. Employing task and data parallelism as well as storing temporary data on disk results in superior time efficiency at reasonable memory requirements...
November 8, 2018: Bioinformatics
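The core idea named in the Whisper abstract above (sorting reads, then looking them up in suffix arrays built for the reference and for its reverse complement) can be illustrated with a toy sketch. This is not Whisper's implementation: the function names (`build_suffix_array`, `find_exact`, `map_reads`) are hypothetical, the suffix array is built naively, and only exact matches are handled, whereas a real mapper must tolerate mismatches and indels.

```python
def reverse_complement(seq):
    """Return the reverse complement of a DNA string (A/C/G/T/N only)."""
    comp = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}
    return "".join(comp[base] for base in reversed(seq))


def build_suffix_array(text):
    """Naive suffix array: start positions of all suffixes in sorted order.

    Fine for a toy example; real tools use far more efficient construction.
    """
    return sorted(range(len(text)), key=lambda i: text[i:])


def find_exact(text, sa, pattern):
    """Return sorted start positions where pattern occurs exactly in text."""
    m = len(pattern)
    # Lower bound: first suffix whose length-m prefix is >= pattern.
    lo, hi = 0, len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] < pattern:
            lo = mid + 1
        else:
            hi = mid
    start = lo
    # Upper bound: first suffix whose length-m prefix is > pattern.
    hi = len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] <= pattern:
            lo = mid + 1
        else:
            hi = mid
    return sorted(sa[start:lo])


def map_reads(reference, reads):
    """Map each distinct read to (position, strand) hits on the reference.

    Positions are always reported in forward-strand coordinates.
    """
    rc = reverse_complement(reference)
    sa_fwd = build_suffix_array(reference)
    sa_rc = build_suffix_array(rc)
    n = len(reference)
    hits = {}
    # Sorting groups identical and similar reads together (assumed motivation:
    # neighbouring reads then touch neighbouring parts of the suffix array).
    for read in sorted(set(reads)):
        fwd = [(p, "+") for p in find_exact(reference, sa_fwd, read)]
        # A hit at p on the reverse complement starts at n - p - len(read)
        # on the forward strand.
        rev = sorted((n - p - len(read), "-") for p in find_exact(rc, sa_rc, read))
        hits[read] = fwd + rev
    return hits
```

For example, `map_reads("AACCGGTA", ["TACC", "AACC"])` reports `AACC` on the forward strand and `TACC` on the reverse strand, since `TACC` is the reverse complement of the reference's last four bases.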
Eran Barash, Neta Sal-Man, Sivan Sabato, Michal Ziv-Ukelson
Motivation: Bacterial infections are a major cause of illness worldwide. However, most bacterial strains pose no threat to human health and may even be beneficial. Thus, developing powerful diagnostic bioinformatic tools that differentiate pathogenic from commensal bacteria is critical for effective treatment of bacterial infections. Results: We propose a machine-learning approach for classifying human-hosted bacteria as pathogenic or non-pathogenic based on their genome-derived proteomes...
November 8, 2018: Bioinformatics
Yuansheng Liu, Yu Yu, Marcel E Dinger, Jinyan Li
Motivation: Advanced high-throughput sequencing technologies have produced massive amounts of read data, and algorithms have been specially designed to reduce the size of these data sets for efficient storage and transmission. Reordering reads with regard to their positions in de novo assembled contigs or in explicit reference sequences has proven to be one of the most effective read compression approaches. As there is usually no good prior knowledge about the reference sequence, current focus is on the novel construction of de novo assembled contigs...
November 8, 2018: Bioinformatics
Arthur Zwaenepoel, Yves Van de Peer
Motivation: Ancient whole genome duplications (WGDs) have been uncovered in almost all major lineages of life on Earth and the search for traces or remnants of such events has become standard practice in most genome analyses. This is especially true for plants, where ancient WGDs are abundant. Common approaches to find evidence for ancient WGDs include the construction of KS distributions and the analysis of intragenomic co-linearity. Despite the increased interest in WGDs and the acknowledgement of their evolutionary importance, user-friendly and comprehensive tools for their analysis are lacking...
November 6, 2018: Bioinformatics
Hao Yuan, Lei Cai, Zhengyang Wang, Xia Hu, Shaoting Zhang, Shuiwang Ji
Motivation: Cellular function is closely related to the localization of subcellular structures. It is, however, challenging to experimentally label all subcellular structures simultaneously in the same cell. This raises the need to build a computational model that learns the relationships among these subcellular structures and uses reference structures to infer the localizations of other structures. Results: We formulate such a task as a conditional image generation problem and propose to use conditional generative adversarial networks for tackling it...
November 6, 2018: Bioinformatics
Miguel Correa Marrero, Richard G H Immink, Dick de Ridder, Aalt D J van Dijk
Motivation: Predicting residue-residue contacts between interacting proteins is an important problem in bioinformatics. The growing wealth of sequence data can be used to infer these contacts through correlated mutation analysis on multiple sequence alignments of interacting homologs of the proteins of interest. This requires correct identification of pairs of interacting proteins for many species, in order to avoid introducing noise (i.e. non-interacting sequences) in the analysis that will decrease predictive performance...
November 6, 2018: Bioinformatics
Anton Pirogov, Peter Pfaffelhuber, Angelika Börsch-Haubold, Bernhard Haubold
Motivation: Unique sequence regions are associated with genetic function in vertebrate genomes. However, measuring uniqueness, or absence of long repeats, along a genome is conceptually and computationally difficult. Here we use a previously published variant of the Lempel-Ziv complexity, the match complexity, Cm, and augment it by deriving its null distribution for random sequences. We then apply Cm to the human and mouse genomes to investigate the relationship between sequence complexity and function...
November 5, 2018: Bioinformatics
Meiling Wang, Xiaoke Hao, Jiashuang Huang, Wei Shao, Daoqiang Zhang
Motivation: Neuroimaging genetics is an emerging field to identify the associations between genetic variants (e.g., single nucleotide polymorphisms (SNPs)) and quantitative traits (QTs) such as brain imaging phenotypes. However, most of the current studies only focus on the associations between brain structure imaging and genetic variants, while neglecting the connectivity information between brain regions. In addition, the brain itself is a complex network, and the higher-order interaction may contain useful information for the mechanistic understanding of diseases (i...
November 5, 2018: Bioinformatics
Sebastian Daberdaku, Carlo Ferrari
Motivation: Antibodies are a class of proteins capable of specifically recognizing and binding to a virtually infinite number of antigens. This binding malleability makes them the most valuable category of biopharmaceuticals for both diagnostic and therapeutic applications. The correct identification of the antigen-binding residues in the antibody is crucial for all antibody design and engineering techniques and could also help to understand the complex antigen binding mechanisms. However, the antibody-binding interface prediction field appears to be still rather underdeveloped...
November 5, 2018: Bioinformatics
Ruiqing Zheng, Min Li, Xiang Chen, Fang-Xiang Wu, Yi Pan, Jianxin Wang
Motivation: Reconstructing gene regulatory networks (GRNs) based on gene expression profiles is still an enormous challenge in systems biology. Random forest based methods have proved efficient for evaluating the importance of gene regulations. Nevertheless, the accuracy of traditional methods can be further improved. With time-series gene expression data, exploiting inherent time information and high-order time lags are promising strategies to improve the power and accuracy of GRN inference...
November 5, 2018: Bioinformatics
Niema Moshiri, Manon Ragonnet-Cronin, Joel O Wertheim, Siavash Mirarab
Motivation: The ability to simulate epidemics as a function of model parameters allows insights that are unobtainable from real datasets. Further, reconstructing transmission networks for fast-evolving viruses like HIV may have the potential to greatly enhance epidemic intervention, but transmission network reconstruction methods have been inadequately studied, largely because it is difficult to obtain "truth" sets on which to test them and properly measure their performance...
November 5, 2018: Bioinformatics
Abbas A Rizvi, Ezgi Karaesmen, Martin Morgan, Leah Preus, Junke Wang, Michael Sovic, Theresa Hahn, Lara E Sucheston-Campbell
Summary: To address the limited software options for performing survival analyses with millions of SNPs, we developed gwasurvivr, an R/Bioconductor package with a simple interface for conducting genome wide survival analyses using VCF (outputted from Michigan or Sanger imputation servers), IMPUTE2 or PLINK files. To decrease the number of iterations needed for convergence when optimizing the parameter estimates in the Cox model we modified the R package survival; covariates in the model are first fit without the SNP, and those parameter estimates are used as initial points...
November 5, 2018: Bioinformatics
Milton Pividori, Hae Kyung Im
Summary: Large biobanks, such as UK Biobank with half a million participants, are changing the scale and availability of genotypic and phenotypic data for researchers to ask fundamental questions about the biology of health and disease. The breadth of the UK Biobank data is enabling discoveries at an unprecedented pace. However, this size and complexity pose new challenges to investigators who need to keep the accruing data up to date, comply with potential consent changes, and efficiently and reproducibly extract subsets of the data to answer specific scientific questions...
November 5, 2018: Bioinformatics
Mahdi Shafiee Kamalabad, Alexander Martin Heberle, Kathrin Thedieck, Marco Grzegorczyk
Motivation: Non-homogeneous dynamic Bayesian networks (NH-DBNs) are a popular modelling tool for learning cellular networks from time series data. In systems biology, time series are often measured under different experimental conditions, and frequently only some network interaction parameters depend on the condition while the other parameters stay constant across conditions. For this situation, we propose a new partially NH-DBN, based on Bayesian hierarchical regression models with partitioned design matrices...
November 5, 2018: Bioinformatics