Bioinformatics

https://www.readbyqxmd.com/read/29462247/a-new-approach-for-interpreting-random-forest-models-and-its-application-to-the-biology-of-ageing
#1
Fabio Fabris, Aoife Doherty, Daniel Palmer, João Pedro de Magalhães, Alex A Freitas
Motivation: This work uses the Random Forest (RF) classification algorithm to predict if a gene is overexpressed, underexpressed or has no change in expression with age in the brain. RFs have high predictive power, and RF models can be interpreted using a feature (variable) importance measure. However, current feature importance measures evaluate a feature as a whole (all feature values). We show that, for a popular type of biological data (Gene Ontology-based), usually only one value of a feature is particularly important for classification and the interpretation of the RF model...
February 16, 2018: Bioinformatics
https://www.readbyqxmd.com/read/29462238/secretsanta-flexible-pipelines-for-functional-secretome-prediction
#2
Anna Gogleva, Hajk-Georg Drost, Sebastian Schornack
Motivation: The secretome denotes the collection of secreted proteins exported outside of the cell. The functional roles of secreted proteins include the maintenance and remodelling of the extracellular matrix as well as signalling between host and non-host cells. These features make secretomes rich reservoirs of biomarkers for disease classification and host-pathogen interaction studies. Common biomarkers are extracellular proteins secreted via classical pathways that can be predicted from sequence by annotating the presence or absence of N-terminal signal peptides...
February 16, 2018: Bioinformatics
https://www.readbyqxmd.com/read/29462250/the-lnclocator-a-subcellular-localization-predictor-for-long-non-coding-rnas-based-on-a-stacked-ensemble-classifier
#3
Cao Zhen, Xiaoyong Pan, Yang Yang, Yan Huang, Hong-Bin Shen
Motivation: Studies of long non-coding RNAs (lncRNAs) have become a hot topic in the field of RNA biology. Recent studies have shown that the subcellular localizations of lncRNAs carry important information for understanding their complex biological functions. Because experiments for identifying the subcellular localization of lncRNAs are costly and time-consuming, computational methods are urgently needed. However, to the best of our knowledge, there are no computational tools for predicting lncRNA subcellular locations to date...
February 15, 2018: Bioinformatics
https://www.readbyqxmd.com/read/29462243/combining-co-evolution-and-secondary-structure-prediction-to-improve-fragment-library-generation
#4
Saulo H P de Oliveira, Charlotte M Deane
Motivation: Recent advances in co-evolution techniques have made possible the accurate prediction of protein structures in the absence of a template. Here, we provide a general approach that further utilizes co-evolution constraints to generate better fragment libraries for fragment-based protein structure prediction. Results: We have compared five different fragment library generation programmes on three different data sets encompassing over 400 unique protein folds...
February 15, 2018: Bioinformatics
https://www.readbyqxmd.com/read/29462241/flowlearn-fast-and-precise-identification-and-quality-checking-of-cell-populations-in-flow-cytometry
#5
Markus Lux, Ryan Remy Brinkman, Cedric Chauve, Adam Laing, Anna Lorenc, Lucie Abeler-Dörner, Barbara Hammer
Motivation: Identification of cell populations in flow cytometry is a critical part of the analysis and lays the groundwork for many applications and research discoveries. The current paradigm of manual analysis is time-consuming and subjective. A common goal of users is to replace manual analysis with automated methods that replicate their results. Supervised tools provide the best performance in such a use case; however, they require fine parameterization to obtain the best results. Hence, there is a strong need for methods that are fast to set up, accurate and interpretable...
February 15, 2018: Bioinformatics
https://www.readbyqxmd.com/read/29462237/ls-align-an-atom-level-flexible-ligand-structural-alignment-algorithm-for-high-throughput-virtual-screening
#6
Jun Hu, Zi Liu, Dong-Jun Yu, Yang Zhang
Motivation: Sequence-order independent structural comparison, also called structural alignment, of small ligand molecules is often needed for computer-aided virtual drug screening. Although many ligand structure alignment programs have been proposed, most of them build the alignments based on rigid-body shape comparison, which can neither provide atom-specific alignment information nor allow structural variation; both abilities are critical to efficient high-throughput virtual screening. Results: We propose a novel ligand comparison algorithm, LS-align, to generate fast and accurate atom-level structural alignments of ligand molecules, through an iterative heuristic search of the target function that combines inter-atom distance with mass and chemical bond comparisons...
February 15, 2018: Bioinformatics
https://www.readbyqxmd.com/read/29452392/pbrit-gene-prioritization-by-correlating-functional-and-phenotypic-annotations-through-integrative-data-fusion
#7
Ajay Anand Kumar, Lut Van Laer, Maaike Alaerts, Amin Ardeshirdavani, Yves Moreau, Kris Laukens, Bart Loeys, Geert Vandeweyer, Inanc Birol
Motivation: Computational gene prioritization can aid in disease gene identification. Here, we propose pBRIT (prioritization using Bayesian Ridge regression and Information Theoretic model), a novel adaptive and scalable prioritization tool that integrates PubMed abstracts, Gene Ontology, sequence similarities, Mammalian and Human Phenotype Ontology, pathways, interactions, Disease Ontology, the Gene Association database and the Human Genome Epidemiology database into the prediction model. We explore and address the effects of sparsity and inter-feature dependencies within annotation sources, and the impact of bias towards specific annotations...
February 14, 2018: Bioinformatics
https://www.readbyqxmd.com/read/29452363/structuremapper-a-high-throughput-algorithm-for-analyzing-protein-sequence-locations-in-structural-data
#8
Anssi Nurminen, Vesa P Hytönen, Alfonso Valencia
Motivation: StructureMapper is a high-throughput algorithm for automated mapping of protein primary amino acid sequence locations to existing three-dimensional protein structures. The algorithm is intended to facilitate easy and efficient utilization of structural information in protein characterization and proteomics. StructureMapper provides an analysis of the identified structural locations that includes surface accessibility, flexibility, protein-protein interfacing, intrinsic disorder prediction, secondary structure assignment, biological assembly information, and sequence identity percentages, among other metrics...
February 14, 2018: Bioinformatics
https://www.readbyqxmd.com/read/29452334/seed-2-a-user-friendly-platform-for-amplicon-high-throughput-sequencing-data-analyses
#9
Tomáš Vetrovský, Petr Baldrian, Daniel Morais, Bonnie Berger
Motivation: Modern molecular methods have increased our ability to describe microbial communities. Along with the advances brought by new sequencing technologies, we now require intensive computational resources to make sense of the large numbers of sequences continuously produced. The software developed by the scientific community to address this demand, although very useful, requires experience with the command-line environment and extensive training, and has a steep learning curve, which limits its use...
February 14, 2018: Bioinformatics
https://www.readbyqxmd.com/read/29447401/ramwas-fast-methylome-wide-association-study-pipeline-for-enrichment-platforms
#10
Andrey A Shabalin, Mohammad W Hattab, Shaunna L Clark, Robin F Chan, Gaurav Kumar, Karolina A Aberg, Edwin J C G van den Oord, Inanc Birol
Motivation: Enrichment-based technologies can provide measurements of DNA methylation at tens of millions of CpGs for thousands of samples. Existing tools for methylome-wide association studies cannot analyze data sets of this size and lack important features like principal component analysis, combined analysis with SNP data, and outcome predictions that are based on all informative methylation sites. Results: We present a Bioconductor R package called RaMWAS with a full set of tools for large-scale methylome-wide association studies...
February 12, 2018: Bioinformatics
https://www.readbyqxmd.com/read/29447388/enhancing-protein-fold-determination-by-exploring-the-complementary-information-of-chemical-cross-linking-and-coevolutionary-signals
#11
Ricardo N Dos Santos, Allan J R Ferrari, Hugo C R de Jesus, Fábio C Gozzo, Faruck Morcos, Leandro Martínez, Alfonso Valencia
Motivation: Elucidation of protein native states from amino acid sequences is a primary computational challenge. Modern computational and experimental methodologies, such as molecular coevolution and chemical cross-linking mass spectrometry, have enabled the structural characterization of previously intractable systems. Despite several independent successful examples, data from these distinct methodologies have not been systematically studied in conjunction. One challenge of structural inference using coevolution is that it is limited to sequence fragments within a conserved and unique domain for which sufficient sequence datasets are available...
February 12, 2018: Bioinformatics
https://www.readbyqxmd.com/read/29447341/chemdistiller-an-engine-for-metabolite-annotation-in-mass-spectrometry
#12
Ivan Laponogov, Noureddin Sadawi, Dieter Galea, Reza Mirnezami, Kirill A Veselkov, Jonathan Wren
Motivation: High-resolution mass spectrometry permits simultaneous detection of thousands of different metabolites in biological samples; however, their automated annotation still presents a challenge due to the limited number of tailored computational solutions freely available to the scientific community. Results: Here we introduce ChemDistiller, a customizable engine that combines automated large-scale annotation of metabolites using tandem MS data with a compiled database containing tens of millions of compounds with pre-calculated 'fingerprints' and fragmentation patterns...
February 12, 2018: Bioinformatics
https://www.readbyqxmd.com/read/29444205/ssbio-a-python-framework-for-structural-systems-biology
#13
Nathan Mih, Elizabeth Brunk, Ke Chen, Edward Catoiu, Anand Sastry, Erol Kavvas, Jonathan M Monk, Zhen Zhang, Bernhard O Palsson, Alfonso Valencia
Summary: Working with protein structures at the genome scale has been challenging in a variety of ways. Here, we present ssbio, a Python package that provides a framework to easily work with structural information in the context of genome-scale network reconstructions, which can contain thousands of individual proteins. The ssbio package provides an automated pipeline to construct high-quality genome-scale models with protein structures (GEM-PROs), wrappers for popular third-party programs to compute associated protein properties, and methods to visualize and annotate structures directly in Jupyter notebooks, thus lowering the barrier to linking 3D structural data with established systems workflows...
February 12, 2018: Bioinformatics
https://www.readbyqxmd.com/read/29444201/hierarchical-analysis-of-rna-seq-reads-improves-the-accuracy-of-allele-specific-expression
#14
Narayanan Raghupathy, Kwangbom Choi, Matthew J Vincent, Glen L Beane, Keith Sheppard, Steven C Munger, Ron Korstanje, Fernando Pardo-Manual de Villena, Gary A Churchill, Alfonso Valencia
Motivation: Allele-specific expression (ASE) refers to the differential abundance of the allelic copies of a transcript. RNA sequencing (RNA-Seq) can provide quantitative estimates of ASE for genes with transcribed polymorphisms. When short-read sequences are aligned to a diploid transcriptome, read-mapping ambiguities confound our ability to directly count reads. Multi-mapping reads, which align equally well to multiple genomic locations, isoforms, or alleles, can comprise the majority (>85%) of reads...
February 12, 2018: Bioinformatics
https://www.readbyqxmd.com/read/29438560/hiite-hiv-1-incidence-and-infection-time-estimator
#15
Sung Yong Park, Tanzy M T Love, Shivankur Kapoor, Ha Youn Lee, John Hancock
Motivation: Around 2.1 million new HIV-1 infections were reported in 2015, a reminder that the HIV-1 epidemic remains a significant global health challenge. Precise incidence assessment strengthens epidemic monitoring efforts and guides strategy optimization for prevention programs. Estimating the onset time of HIV-1 infection can facilitate optimal clinical management, help identify key populations largely responsible for epidemic spread, and thereby help infer HIV-1 transmission chains. Our goal is to develop a genomic assay estimating the incidence and infection time in a single cross-sectional survey setting...
February 9, 2018: Bioinformatics
https://www.readbyqxmd.com/read/29438558/kmgene-a-unified-r-package-for-gene-based-association-analysis-for-complex-traits
#16
Qi Yan, Zhou Fang, Wei Chen, Oliver Stegle
Summary: In this report, we introduce an R package KMgene for performing gene-based association tests for familial, multivariate or longitudinal traits using kernel machine (KM) regression under a generalized linear mixed model (GLMM) framework. Extensive simulations were performed to evaluate the validity of the approaches implemented in KMgene. Availability: http://cran.r-project.org/web/packages/KMgene. Contact: qi.yan@chp.edu or wei.chen@chp...
February 9, 2018: Bioinformatics
https://www.readbyqxmd.com/read/29438498/chromswitch-a-flexible-method-to-detect-chromatin-state-switches
#17
Selin Jessa, Claudia L Kleinman, John Hancock
Summary: Chromatin state plays a major role in controlling gene expression, and comparative analysis of ChIP-seq data is key to understanding epigenetic regulation. We present chromswitch, an R/Bioconductor package to integrate epigenomic data in a defined window of interest to detect an overall switch in chromatin state. Chromswitch accurately classifies a benchmarking dataset, and when applied genome-wide, the tool successfully detects chromatin changes that result in brain-specific expression...
February 9, 2018: Bioinformatics
https://www.readbyqxmd.com/read/29432533/crossplan-systematic-planning-of-genetic-crosses-to-validate-mathematical-models
#18
Aditya Pratapa, Neil Adames, Pavel Kraikivski, Nicholas Franzese, John J Tyson, Jean Peccoud, T M Murali, Bonnie Berger
Motivation: Mathematical models of cellular processes can systematically predict the phenotypes of novel combinations of multi-gene mutations. Searching for informative predictions and prioritizing them for experimental validation is challenging since the number of possible combinations grows exponentially in the number of mutations. Moreover, keeping track of the crosses needed to make new mutants and planning sequences of experiments is unmanageable when the experimenter is deluged by hundreds of potentially informative predictions to test...
February 8, 2018: Bioinformatics
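To make the exponential growth in the CrossPlan motivation concrete: assuming each of n genes contributes a single mutant allele (a simplifying assumption, not a detail from the abstract), the candidate multi-gene mutants are the gene subsets of size two or more, of which there are 2^n − n − 1. The hypothetical helpers below only illustrate that count and check it by brute force; they are not CrossPlan's own enumeration.

```python
from itertools import combinations


def count_multi_mutants(n: int) -> int:
    """Closed form: all 2**n subsets minus the empty set and the n single mutants."""
    return 2**n - n - 1


def count_by_enumeration(n: int) -> int:
    """Brute-force count of gene subsets of size >= 2 (feasible only for small n)."""
    genes = range(n)
    return sum(1 for k in range(2, n + 1) for _ in combinations(genes, k))


for n in (5, 10, 20):
    print(n, count_multi_mutants(n))  # 26, 1013, 1048555
```

Adding one gene roughly doubles the number of candidate combinations, which is why prioritizing predictions for experimental validation matters.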
https://www.readbyqxmd.com/read/29432522/wdl-rf-predicting-bioactivities-of-ligand-molecules-acting-with-g-protein-coupled-receptors-by-combining-weighted-deep-learning-and-random-forest
#19
Jiansheng Wu, Qiuming Zhang, Weijian Wu, Tao Pang, Haifeng Hu, Wallace K B Chan, Xiaoyan Ke, Yang Zhang, Jonathan Wren
Motivation: Precise assessment of ligand bioactivities (including IC50, EC50, Ki, Kd, etc.) is essential for virtual screening and lead compound identification. However, not all ligands have experimentally determined activities. In particular, many G protein-coupled receptors (GPCRs), which are the largest integral membrane protein family and represent targets of nearly 40% of drugs on the market, lack published experimental data about ligand interactions. Computational methods with the ability to accurately predict the bioactivity of ligands can help efficiently address this problem...
February 8, 2018: Bioinformatics
https://www.readbyqxmd.com/read/29432517/spectral-clustering-based-on-learning-similarity-matrix
#20
Seyoung Park, Hongyu Zhao, Inanc Birol
Motivation: Single-cell RNA-sequencing (scRNA-seq) technology can generate genome-wide expression data at the single-cell level. One important objective in scRNA-seq analysis is to cluster cells based on their gene expression patterns, so that each cluster consists of cells belonging to the same cell type. Results: We introduce a novel spectral clustering framework that imposes sparse structures on a target matrix. Specifically, we utilize multiple doubly stochastic similarity matrices to learn a similarity matrix, motivated by the observation that each similarity matrix can be a different informative representation of the data...
February 8, 2018: Bioinformatics
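A doubly stochastic similarity matrix is a non-negative matrix whose rows and columns each sum to 1. As a hedged illustration of that property only (not the authors' similarity-learning framework), a positive similarity matrix over cells can be rescaled into doubly stochastic form with the standard Sinkhorn-Knopp iteration. The Gaussian kernel, the `sigma` bandwidth, the toy data and the helper names below are all assumptions made for the sketch.

```python
import numpy as np


def gaussian_similarity(X, sigma=1.0):
    """Hypothetical helper: Gaussian-kernel similarity between rows (cells) of X."""
    sq = np.sum(X**2, axis=1)
    # Squared Euclidean distances; clip tiny negatives caused by rounding.
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)
    return np.exp(-d2 / (2.0 * sigma**2))


def sinkhorn_knopp(K, n_iter=1000, tol=1e-10):
    """Scale a strictly positive matrix K to diag(r) @ K @ diag(c) with unit row/column sums."""
    K = np.asarray(K, dtype=float)
    r = np.ones(K.shape[0])
    c = np.ones(K.shape[1])
    for _ in range(n_iter):
        r = 1.0 / (K @ c)    # make rows sum to 1 for the current c
        c = 1.0 / (K.T @ r)  # make columns sum to 1 for the updated r
        P = r[:, None] * K * c[None, :]
        # Columns are exact after the c update; stop once rows also sum to 1.
        if np.allclose(P.sum(axis=1), 1.0, rtol=0.0, atol=tol):
            break
    return P


rng = np.random.default_rng(0)
X = rng.normal(size=(6, 3))  # 6 toy "cells" x 3 features
P = sinkhorn_knopp(gaussian_similarity(X, sigma=2.0))
print(P.sum(axis=0).round(6), P.sum(axis=1).round(6))
```

Because the kernel is symmetric and strictly positive, the doubly stochastic scaling is unique and therefore also symmetric, so it can still be read as a similarity matrix between cells.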