Read by QxMD

bioinformatics using machine learning

Kévin Vervier, Jacob J Michaelson
Motivation: Model-based estimates of general deleteriousness, like CADD, DANN or PolyPhen, have become indispensable tools in the interpretation of genetic variants. However, these approaches say little about the tissues in which the effects of deleterious variants will be most meaningful. Tissue-specific annotations have been recently inferred for dozens of tissues/cell types from large collections of cross-tissue epigenomic data, and have demonstrated sensitivity in predicting affected tissues in complex traits...
April 18, 2018: Bioinformatics
Bin Liu, Kai Li, De-Shuang Huang, Kuo-Chen Chou
Motivation: Identification of enhancers and their strength is important because they play a critical role in controlling gene expression. Although some bioinformatics tools have been developed, they are limited to discriminating enhancers from non-enhancers. Recently, a two-layer predictor called "iEnhancer-2L" was developed that can also be used to predict an enhancer's strength. However, its prediction quality needs further improvement to enhance its practical application value...
June 7, 2018: Bioinformatics
Leyi Wei, Chen Zhou, Huangrong Chen, Jiangning Song, Ran Su
Motivation: Anti-cancer peptides (ACPs) have recently emerged as promising therapeutic agents for cancer treatment. Due to the avalanche of protein sequence data in the post-genomic era, there is an urgent need to develop automated computational methods to enable fast and accurate identification of novel ACPs within the vast number of candidate proteins and peptides. Results: To address this, we propose a novel predictor named ACPred-FL for accurate prediction of ACPs based on sequence information...
June 1, 2018: Bioinformatics
Ming Wen, Peisheng Cong, Zhimin Zhang, Hongmei Lu, Tonghua Li
Motivation: MicroRNAs (miRNAs) are small noncoding RNAs that function in RNA silencing and post-transcriptional regulation of gene expression by targeting messenger RNAs (mRNAs). Because the underlying mechanisms associated with miRNA binding to mRNA are not fully understood, a major challenge of miRNA studies involves the identification of miRNA-target sites on mRNA. In silico prediction of miRNA-target sites can expedite costly and time-consuming experimental work by providing the most promising miRNA-target-site candidates...
June 1, 2018: Bioinformatics
Albert Stuart Reece, Wei Wang, Gary Kenneth Hulse
The recent demonstration that addiction-relevant neuronal ensembles defined by known master transcription factors and their connectome is networked throughout mesocorticolimbic reward circuits and resonates harmonically at known frequencies implies that single-cell pan-omics techniques can improve our understanding of Substance Use Disorders (SUDs). Application of machine learning algorithms to such data could find diagnostic utility as biomarkers both to define the presence of the disorder and to quantitate its severity and find myriad applications in a developmental pipeline towards therapeutics and cure...
July 2018: Medical Hypotheses
Jingjing Zhai, Jie Song, Qian Cheng, Yunjia Tang, Chuang Ma
Motivation: The epitranscriptome, also known as chemical modifications of RNA (CMRs), is a newly discovered layer of gene regulation, the biological importance of which emerged through analysis of only a small fraction of CMRs detected by high-throughput sequencing technologies. Understanding of the epitranscriptome is hampered by the absence of computational tools for the systematic analysis of epitranscriptome sequencing data. In addition, no tools have yet been designed for accurate prediction of CMRs in plants, or to extend epitranscriptome analysis from a fraction of the transcriptome to its entirety...
May 29, 2018: Bioinformatics
Stacey L Wooden, Wayne C Koff
Although the success of vaccination to date has been unprecedented, our inadequate understanding of the details of the human immune response to immunization has resulted in several recent vaccine failures and significant delays in the development of high-need vaccines for global infectious diseases and cancer. Because of the need to better understand the immense complexity of the human immune system, the Human Vaccines Project was launched in 2015 with the mission to decode the human immune response to accelerate development of vaccines and immunotherapies for major diseases...
May 30, 2018: Human Vaccines & Immunotherapeutics
Georgina Stegmayer, Leandro E Di Persia, Mariano Rubiolo, Matias Gerard, Milton Pividori, Cristian Yones, Leandro A Bugnon, Tadeo Rodriguez, Jonathan Raad, Diego H Milone
Motivation: The importance of microRNAs (miRNAs) is widely recognized in the community nowadays because these short segments of RNA can play several roles in almost all biological processes. The computational prediction of novel miRNAs involves training a classifier for identifying sequences having the highest chance of being precursors of miRNAs (pre-miRNAs). The big issue with this task is that well-known pre-miRNAs are usually few in comparison with the hundreds of thousands of candidate sequences in a genome, which results in high class imbalance...
May 23, 2018: Briefings in Bioinformatics
Stefano Nembrini, Inke R König, Marvin N Wright
Motivation: Random forests are fast, flexible and represent a robust approach to analyze high dimensional data. A key advantage over alternative machine learning algorithms is their variable importance measures, which can be used to identify relevant features or perform variable selection. Measures based on the impurity reduction of splits, such as the Gini importance, are popular because they are simple and fast to compute. However, they are biased in favor of variables with many possible split points and high minor allele frequency...
May 10, 2018: Bioinformatics
Benjamin T James, Brian B Luczak, Hani Z Girgis
Sequence clustering is a fundamental step in analyzing DNA sequences. Widely used software tools for sequence clustering utilize greedy approaches that are not guaranteed to produce the best results. These tools are sensitive to one parameter that determines the similarity among sequences in a cluster. Often, a biologist may not know the exact sequence similarity. Therefore, clusters produced by these tools are unlikely to match the real clusters comprising the data if the provided parameter is inaccurate...
May 1, 2018: Nucleic Acids Research
Fabrizio Pucci, Katrien Bernaerts, Jean Marc Kwasigroch, Marianne Rooman
Motivation: Bioinformatics tools that predict protein stability changes upon point mutations have made a lot of progress in the last decades and have become accurate and fast enough to make computational mutagenesis experiments feasible, even on a proteome scale. Despite these achievements, they still suffer from important issues that must be solved to further improve their performance and to use them to deepen our insights into protein folding and stability mechanisms. One of these problems is their bias towards the learning datasets which, being dominated by destabilizing mutations, causes predictions to be better for destabilizing than for stabilizing mutations...
April 26, 2018: Bioinformatics
Tong Liu, Zheng Wang
Background: The segment overlap score (SOV) has been used to evaluate predicted protein secondary structures, each a sequence composed of helix (H), strand (E), and coil (C), by comparing them with the native or reference secondary structures, which are likewise sequences of H, E, and C. SOV's advantage is that it can consider the size of continuous overlapping segments and assign extra allowance to longer continuous overlapping segments instead of only judging from the percentage of overlapping individual positions as the Q3 score does...
2018: Source Code for Biology and Medicine
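The Q3 score mentioned in the abstract above, the percentage of individual positions at which the predicted and reference states agree, can be sketched in a few lines. This is only an illustration of that per-position baseline, not the SOV measure or the authors' code; the function name `q3_score` and the example strings are assumptions made here.

```python
def q3_score(predicted: str, reference: str) -> float:
    """Percentage of positions whose predicted state (H, E or C)
    matches the reference state.

    Hypothetical helper for illustration; it ignores segment
    structure entirely, which is the limitation SOV addresses.
    """
    if len(predicted) != len(reference):
        raise ValueError("sequences must have the same length")
    if not reference:
        raise ValueError("sequences must not be empty")
    # Count positions where the two state strings agree.
    matches = sum(p == r for p, r in zip(predicted, reference))
    return 100.0 * matches / len(reference)


# Made-up 10-residue example: the prediction differs at 2 positions.
reference = "HHHHCCEEEE"
predicted = "HHHCCCEEEC"
print(q3_score(predicted, reference))
```

Because every position counts equally, a prediction that breaks one long helix into fragments can receive the same Q3 as one that only shortens its ends, which is the motivation for a segment-aware score.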
Yannis J Trakadis, Sameer Sardaar, Anthony Chen, Vanessa Fulginiti, Ankur Krishnan
Our hypothesis is that machine learning (ML) analysis of whole exome sequencing (WES) data can be used to identify individuals at high risk for schizophrenia (SCZ). This study applies ML to WES data from 2,545 individuals with SCZ and 2,545 unaffected individuals, accessed via the database of genotypes and phenotypes (dbGaP). Single nucleotide variants and small insertions and deletions were annotated by ANNOVAR using the reference genome hg19/GRCh37. Rare (predicted functional) variants with a minor allele frequency ≤1% and genotype quality ≥90 including missense, frameshift, stop gain, stop loss, intronic, and exonic splicing variants were selected...
April 28, 2018: American Journal of Medical Genetics. Part B, Neuropsychiatric Genetics
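The variant selection described in the abstract above (minor allele frequency ≤1%, genotype quality ≥90, restricted to a set of functional classes) can be sketched as a simple filter. This is a minimal illustration and not the study's pipeline: the record layout, the field names `maf`, `gq` and `effect`, and the class labels are all hypothetical, and the labels reflect one reading of the abstract's list of variant types.

```python
from typing import Dict, Iterable, List

# Hypothetical labels for the functional classes named in the abstract.
SELECTED_EFFECTS = {
    "missense", "frameshift", "stop_gain", "stop_loss",
    "intronic_splicing", "exonic_splicing",
}

MAX_MAF = 0.01   # minor allele frequency <= 1%
MIN_GQ = 90      # genotype quality >= 90


def select_rare_variants(variants: Iterable[Dict]) -> List[Dict]:
    """Keep variants that pass the frequency, quality and class filters.

    Each variant is assumed to be a dict with 'maf', 'gq' and 'effect'
    keys; real annotation output would need to be mapped to this shape.
    """
    return [
        v for v in variants
        if v["maf"] <= MAX_MAF
        and v["gq"] >= MIN_GQ
        and v["effect"] in SELECTED_EFFECTS
    ]


# Made-up records, for illustration only.
example = [
    {"id": "v1", "maf": 0.005, "gq": 99, "effect": "missense"},
    {"id": "v2", "maf": 0.200, "gq": 99, "effect": "missense"},    # too common
    {"id": "v3", "maf": 0.001, "gq": 50, "effect": "frameshift"},  # low quality
    {"id": "v4", "maf": 0.010, "gq": 90, "effect": "synonymous"},  # class not selected
]
print([v["id"] for v in select_rare_variants(example)])
```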
Edgar Liberis, Petar Velickovic, Pietro Sormanni, Michele Vendruscolo, Pietro Liò
Motivation: Antibodies play essential roles in the immune system of vertebrates and are powerful tools in research and diagnostics. While hypervariable regions of antibodies, which are responsible for binding, can be readily identified from their amino acid sequence, it remains challenging to accurately pinpoint which amino acids will be in contact with the antigen (the paratope). Results: In this work, we present a sequence-based probabilistic machine learning algorithm for paratope prediction, named Parapred...
April 16, 2018: Bioinformatics
Semmy Wellem Taju, Trinh-Trung-Duong Nguyen, Nguyen-Quoc-Khanh Le, Rosdyana Mangir Irawan Kusuma, Yu-Yen Ou
Motivation: Efflux proteins play a key role in pumping xenobiotics out of cells. The prediction of efflux family proteins involved in the transport of compounds is crucial for understanding family structures, functions and energy dependencies. Many methods have been proposed to classify efflux pump transporters without considering the pump-specific characteristics of efflux protein families. Efflux proteins protect cells by extruding foreign chemicals. Moreover, almost all efflux protein families have the same structure based on the analysis of significant motifs...
April 14, 2018: Bioinformatics
Margherita Francescatto, Marco Chierici, Setareh Rezvan Dezfooli, Alessandro Zandonà, Giuseppe Jurman, Cesare Furlanello
BACKGROUND: High-throughput methodologies such as microarrays and next-generation sequencing are routinely used in cancer research, generating complex data at different omics layers. The effective integration of omics data could provide a broader insight into the mechanisms of cancer biology, helping researchers and clinicians to develop personalized therapies. RESULTS: In the context of CAMDA 2017 Neuroblastoma Data Integration challenge, we explore the use of Integrative Network Fusion (INF), a bioinformatics framework combining a similarity network fusion with machine learning for the integration of multiple omics data...
April 3, 2018: Biology Direct
Daniel Veltri, Uday Kamath, Amarda Shehu
Motivation: Bacterial resistance to antibiotics is a growing concern. Antimicrobial peptides (AMPs), natural components of innate immunity, are popular targets for developing new drugs. Machine learning methods are now commonly adopted by wet-laboratory researchers to screen for promising candidates. Results: In this work we utilize deep learning to recognize antimicrobial activity. We propose a neural network model with convolutional and recurrent layers that leverage primary sequence composition...
March 24, 2018: Bioinformatics
Kevin K Yang, Zachary Wu, Claire N Bedbrook, Frances H Arnold
Motivation: Machine-learning models trained on protein sequences and their measured functions can infer biological properties of unseen sequences without requiring an understanding of the underlying physical or biological mechanisms. Such models enable the prediction and discovery of sequences with optimal properties. Machine-learning models generally require that their inputs be vectors, and the conversion from a protein sequence to a vector representation affects the model's ability to learn...
March 23, 2018: Bioinformatics
Matteo Lo Monte, Candida Manelfi, Marica Gemei, Daniela Corda, Andrea Rosario Beccari
Motivation: ADP-ribosylation is a post-translational modification implicated in several crucial cellular processes, ranging from regulation of DNA repair and chromatin structure to cell metabolism and stress responses. To date, a complete understanding of ADP-ribosylation targets and their modification sites in different tissues and disease states is still lacking. Identification of ADP-ribosylation sites is required to discern the molecular mechanisms regulated by this modification. This motivated us to develop a computational tool for the prediction of ADP-ribosylated sites...
March 15, 2018: Bioinformatics
Dieter Galea, Ivan Laponogov, Kirill Veselkov
Motivation: Recognition of biomedical entities from scientific text is a critical component of natural language processing and automated information extraction platforms. Modern named entity recognition approaches rely heavily on supervised machine learning techniques, which are critically dependent on annotated training corpora. These approaches have been shown to perform well when trained and tested on the same source. However, in such a scenario, the performance and evaluation of these models may be optimistic, as such models may not necessarily generalize to independent corpora, resulting in potentially suboptimal entity recognition for large-scale tagging of widely diverse articles in databases such as PubMed...
March 10, 2018: Bioinformatics
