Read by QxMD icon Read

bioinformatics using machine learning

Yinyin Cai, Zhijun Liao, Ying Ju, Juan Liu, Yong Mao, Xiangrong Liu
The research on resistance genes (R-gene) plays a vital role in bioinformatics as it has the capability of coping with adverse changes in the external environment, which can form the corresponding resistance protein by transcription and translation. It is meaningful to identify and predict R-gene of Larimichthys crocea (L.Crocea). It is friendly for breeding and the marine environment as well. Large amounts of L.Crocea's immune mechanisms have been explored by biological methods. However, much about them is still unclear...
December 6, 2016: Scientific Reports
Carlos Fernandez-Lozano, Marcos Gestal, Cristian R Munteanu, Julian Dorado, Alejandro Pazos
The design of experiments and the validation of the results achieved with them are vital in any research study. This paper focuses on the use of different Machine Learning approaches for regression tasks in the field of Computational Intelligence and especially on a correct comparison between the different results provided for different methods, as those techniques are complex systems that require further study to be fully understood. A methodology commonly accepted in Computational intelligence is implemented in an R package called RRegrs...
2016: PeerJ
Nikolaos-Kosmas Chlis, Ekaterini S Bei, Michael Zervakis
The application of machine learning methods for the identification of candidate genes responsible for phenotypes of interest, such as cancer, is a major challenge in the field of bioinformatics. These lists of genes are often called genomic signatures and their linkage to phenotype associations may form a significant step in discovering the causation between genotypes and phenotypes. Traditional methods that produce genomic signatures from DNA Microarray data tend to extract significantly different lists under relatively small variations of the training data...
November 29, 2016: IEEE/ACM Transactions on Computational Biology and Bioinformatics
Sacha Beaumeunier, Jérôme Audoux, Anthony Boureux, Florence Ruffle, Thérèse Commes, Nicolas Philippe, Ronnie Alves
BACKGROUND: High-throughput sequencing technology and bioinformatics have identified chimeric RNAs (chRNAs), raising the possibility of chRNAs expressing particularly in diseases can be used as potential biomarkers in both diagnosis and prognosis. RESULTS: The task of discriminating true chRNAs from the false ones poses an interesting Machine Learning (ML) challenge. First of all, the sequencing data may contain false reads due to technical artifacts and during the analysis process, bioinformatics tools may generate false positives due to methodological biases...
2016: BioData Mining
Tolutola Oyetunde, Muhan Zhang, Yixin Chen, Yinjie Tang, Cynthia Lo
MOTIVATION: Metabolic network reconstructions are often incomplete. Constraint-based and pattern-based methodologies have been used for automated gap filling of these networks, each with its own strengths and weaknesses. Moreover, since validation of hypotheses made by gap filling tools require experimentation, it is challenging to benchmark performance and make improvements other than that related to speed and scalability. RESULTS: We present BoostGAPFILL, an open source tool that leverages both constraint-based and machine learning methodologies for hypotheses generation in gap filling and metabolic model refinement...
October 26, 2016: Bioinformatics
Yi An, Jiawei Wang, Chen Li, André Leier, Tatiana Marquez-Lago, Jonathan Wilksch, Yang Zhang, Geoffrey I Webb, Jiangning Song, Trevor Lithgow
Bacterial effector proteins secreted by various protein secretion systems play crucial roles in host-pathogen interactions. In this context, computational tools capable of accurately predicting effector proteins of the various types of bacterial secretion systems are highly desirable. Existing computational approaches use different machine learning (ML) techniques and heterogeneous features derived from protein sequences and/or structural information. These predictors differ not only in terms of the used ML methods but also with respect to the used curated data sets, the features selection and their prediction performance...
October 24, 2016: Briefings in Bioinformatics
Luisa Azevedo, Matthew Mort, Antonio C Costa, Raquel M Silva, Dulce Quelhas, Antonio Amorim, David N Cooper
Understanding the functional sequelae of amino-acid replacements is of fundamental importance in medical genetics. Perhaps, the most intuitive way to assess the potential pathogenicity of a given human missense variant is by measuring the degree of evolutionary conservation of the substituted amino-acid residue, a feature that generally serves as a good proxy metric for the functional/structural importance of that residue. However, the presence of putatively compensated variants as the wild-type alleles in orthologous proteins of other mammalian species not only challenges this classical view of amino-acid essentiality but also precludes the accurate evaluation of the functional impact of this type of missense variant using currently available bioinformatic prediction tools...
October 5, 2016: European Journal of Human Genetics: EJHG
Daniel Sánchez-Taltavull, Parameswaran Ramachandran, Nelson Lau, Theodore J Perkins
Evaluating the similarity of different measured variables is a fundamental task of statistics, and a key part of many bioinformatics algorithms. Here we propose a Bayesian scheme for estimating the correlation between different entities' measurements based on high-throughput sequencing data. These entities could be different genes or miRNAs whose expression is measured by RNA-seq, different transcription factors or histone marks whose expression is measured by ChIP-seq, or even combinations of different types of entities...
2016: PloS One
Jesper Tegnér, Hector Zenil, Narsis A Kiani, Gordon Ball, David Gomez-Cabrero
Systems in nature capable of collective behaviour are nonlinear, operating across several scales. Yet our ability to account for their collective dynamics differs in physics, chemistry and biology. Here, we briefly review the similarities and differences between mathematical modelling of adaptive living systems versus physico-chemical systems. We find that physics-based chemistry modelling and computational neuroscience have a shared interest in developing techniques for model reductions aiming at the identification of a reduced subsystem or slow manifold, capturing the effective dynamics...
November 13, 2016: Philosophical Transactions. Series A, Mathematical, Physical, and Engineering Sciences
Li Gu, Lichun Xue, Qi Song, Fengji Wang, Huaqin He, Zhongyi Zhang
During commercial transactions, the quality of flue-cured tobacco leaves must be characterized efficiently, and the evaluation system should be easily transferable across different traders. However, there are over 3000 chemical compounds in flue-cured tobacco leaves; thus, it is impossible to evaluate the quality of flue-cured tobacco leaves using all the chemical compounds. In this paper, we used Support Vector Machine (SVM) algorithm together with 22 chemical compounds selected by ReliefF-Particle Swarm Optimization (R-PSO) to classify the fragrant style of flue-cured tobacco leaves, where the Accuracy (ACC) and Matthews Correlation Coefficient (MCC) were 90...
September 9, 2016: Journal of Bioinformatics and Computational Biology
Lin Liu, Lin Tang, Wen Dong, Shaowen Yao, Wei Zhou
BACKGROUND: With the rapid accumulation of biological datasets, machine learning methods designed to automate data analysis are urgently needed. In recent years, so-called topic models that originated from the field of natural language processing have been receiving much attention in bioinformatics because of their interpretability. Our aim was to review the application and development of topic models for bioinformatics. DESCRIPTION: This paper starts with the description of a topic model, with a focus on the understanding of topic modeling...
2016: SpringerPlus
Getiria Onsongo, Linda B Baughn, Matthew Bower, Christine Henzler, Matthew Schomaker, Kevin A T Silverstein, Bharat Thyagarajan
Simultaneous detection of small copy number variations (CNVs) (<0.5 kb) and single-nucleotide variants in clinically significant genes is of great interest for clinical laboratories. The analytical variability in next-generation sequencing (NGS) and artifacts in coverage data because of issues with mappability along with lack of robust bioinformatics tools for CNV detection have limited the utility of targeted NGS data to identify CNVs. We describe the development and implementation of a bioinformatics algorithm, copy number variation-random forest (CNV-RF), that incorporates a machine learning component to identify CNVs from targeted NGS data...
September 3, 2016: Journal of Molecular Diagnostics: JMD
Matthew Ruffalo, Ziv Bar-Joseph
MOTIVATION: Reconstructing regulatory networks from expression and interaction data is a major goal of systems biology. While much work has focused on trying to experimentally and computationally determine the set of transcription-factors (TFs) and microRNAs (miRNAs) that regulate genes in these networks, relatively little work has focused on inferring the regulation of miRNAs by TFs. Such regulation can play an important role in several biological processes including development and disease...
September 1, 2016: Bioinformatics
Sheng Wang, Jianzhu Ma, Jinbo Xu
MOTIVATION: Protein intrinsically disordered regions (IDRs) play an important role in many biological processes. Two key properties of IDRs are (i) the occurrence is proteome-wide and (ii) the ratio of disordered residues is about 6%, which makes it challenging to accurately predict IDRs. Most IDR prediction methods use sequence profile to improve accuracy, which prevents its application to proteome-wide prediction since it is time-consuming to generate sequence profiles. On the other hand, the methods without using sequence profile fare much worse than using sequence profile...
September 1, 2016: Bioinformatics
Ritambhara Singh, Jack Lanchantin, Gabriel Robins, Yanjun Qi
MOTIVATION: Histone modifications are among the most important factors that control gene regulation. Computational methods that predict gene expression from histone modification signals are highly desirable for understanding their combinatorial effects in gene regulation. This knowledge can help in developing 'epigenetic drugs' for diseases like cancer. Previous studies for quantifying the relationship between histone modifications and gene expression levels either failed to capture combinatorial effects or relied on multiple methods that separate predictions and combinatorial analysis...
September 1, 2016: Bioinformatics
Charles J Labuzzetta, Margaret L Antonio, Patricia M Watson, Robert C Wilson, Lauren A Laboissonniere, Jeffrey M Trimarchi, Baris Genc, P Hande Ozdinler, Dennis K Watson, Paul E Anderson
MOTIVATION: A central task of bioinformatics is to develop sensitive and specific means of providing medical prognoses from biomarker patterns. Common methods to predict phenotypes in RNA-Seq datasets utilize machine learning algorithms trained via gene expression. Isoforms, however, generated from alternative splicing, may provide a novel and complementary set of transcripts for phenotype prediction. In contrast to gene expression, the number of isoforms increases significantly due to numerous alternative splicing patterns, resulting in a prioritization problem for many machine learning algorithms...
September 1, 2016: Bioinformatics
Chantriolnt-Andreas Kapourani, Guido Sanguinetti
MOTIVATION: DNA methylation is an intensely studied epigenetic mark, yet its functional role is incompletely understood. Attempts to quantitatively associate average DNA methylation to gene expression yield poor correlations outside of the well-understood methylation-switch at CpG islands. RESULTS: Here, we use probabilistic machine learning to extract higher order features associated with the methylation profile across a defined region. These features quantitate precisely notions of shape of a methylation profile, capturing spatial correlations in DNA methylation across genomic regions...
September 1, 2016: Bioinformatics
Domenico Cozzetto, Federico Minneci, Hannah Currant, David T Jones
Predicting protein function has been a major goal of bioinformatics for several decades, and it has gained fresh momentum thanks to recent community-wide blind tests aimed at benchmarking available tools on a genomic scale. Sequence-based predictors, especially those performing homology-based transfers, remain the most popular but increasing understanding of their limitations has stimulated the development of complementary approaches, which mostly exploit machine learning. Here we present FFPred 3, which is intended for assigning Gene Ontology terms to human protein chains, when homology with characterized proteins can provide little aid...
2016: Scientific Reports
Lucas Antón Pastur-Romay, Francisco Cedrón, Alejandro Pazos, Ana Belén Porto-Pazos
Over the past decade, Deep Artificial Neural Networks (DNNs) have become the state-of-the-art algorithms in Machine Learning (ML), speech recognition, computer vision, natural language processing and many other tasks. This was made possible by the advancement in Big Data, Deep Learning (DL) and drastically increased chip processing abilities, especially general-purpose graphical processing units (GPGPUs). All this has created a growing interest in making the most of the potential offered by DNNs in almost every field...
2016: International Journal of Molecular Sciences
Gholam-Hossein Jowkar, Eghbal G Mansoori
Identification of disease genes, using computational methods, is an important issue in biomedical and bioinformatics research. According to observations that diseases with the same or similar phenotype have the same biological characteristics, researchers have tried to identify genes by using machine learning tools. In recent attempts, some semi-supervised learning methods, called positive-unlabeled learning, is used for disease gene identification. In this paper, we present a Perceptron ensemble of graph-based positive-unlabeled learning (PEGPUL) on three types of biological attributes: gene ontologies, protein domains and protein-protein interaction networks...
July 12, 2016: Computational Biology and Chemistry
Fetch more papers »
Fetching more papers... Fetching...
Read by QxMD. Sign in or create an account to discover new knowledge that matter to you.
Remove bar
Read by QxMD icon Read

Search Tips

Use Boolean operators: AND/OR

diabetic AND foot
diabetes OR diabetic

Exclude a word using the 'minus' sign

Virchow -triad

Use Parentheses

water AND (cup OR glass)

Add an asterisk (*) at end of a word to include word stems

Neuro* will search for Neurology, Neuroscientist, Neurological, and so on

Use quotes to search for an exact phrase

"primary prevention of cancer"
(heart or cardiac or cardio*) AND arrest -"American Heart Association"