Read by QxMD

bioinformatics using machine learning

Qingzhen Hou, Paul De Geest, Wim F Vranken, Jaap Heringa, K Anton Feenstra
MOTIVATION: Genome sequencing is producing an ever-increasing amount of associated protein sequences. Few of these sequences have experimentally validated annotations, however, and computational predictions are becoming increasingly successful in producing such annotations. One key challenge remains the prediction of the amino acids in a given protein sequence that are involved in protein-protein interactions. Such predictions are typically based on machine learning methods that take advantage of the properties and sequence positions of amino acids that are known to be involved in interaction...
January 10, 2017: Bioinformatics
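Interaction-site predictors of this kind usually describe each residue by its own identity and that of its sequence neighbours. A minimal Python sketch of one common encoding, assuming a one-hot code over a symmetric sliding window (the function names and window size are illustrative, not taken from the paper):

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def one_hot(residue: str) -> list[int]:
    """20-dimensional one-hot code; non-standard residues map to all zeros."""
    return [1 if residue == aa else 0 for aa in AMINO_ACIDS]


def window_features(sequence: str, half_width: int = 3) -> list[list[int]]:
    """Return one feature vector per residue: the concatenated one-hot codes
    of the residues within +/- half_width positions. Positions that fall off
    either terminus are zero-padded."""
    padding = [0] * len(AMINO_ACIDS)
    features = []
    for i in range(len(sequence)):
        vector: list[int] = []
        for j in range(i - half_width, i + half_width + 1):
            vector.extend(one_hot(sequence[j]) if 0 <= j < len(sequence) else padding)
        features.append(vector)
    return features


# 8 residues, window of 5 positions -> 8 vectors of 5 * 20 = 100 values each.
feats = window_features("MKTAYIAK", half_width=2)
print(len(feats), len(feats[0]))  # 8 100
```

Each vector would then be paired with a label (interface residue or not) and passed to a classifier; per-residue physicochemical properties or conservation scores could be appended to the same vector.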
Shun Guo, Donghui Guo, Lifei Chen, Qingshan Jiang
Dimension reduction is a crucial technique in machine learning and data mining, which is widely used in areas of medicine, bioinformatics and genetics. In this paper, we propose a two-stage local dimension reduction approach for classification on microarray data. In the first stage, a new L1-regularized feature selection method is defined to remove irrelevant and redundant features and to select the important features (biomarkers). In the next stage, PLS-based feature extraction is implemented on the selected features to extract synthesis features that best reflect discriminating characteristics for classification...
December 31, 2016: Computational Biology and Chemistry
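The paper defines its own L1-regularized selection method, which the abstract does not spell out. As a generic stand-in for the first stage only, the sketch below fits a standard lasso (L1-penalised least squares) by cyclic coordinate descent and keeps the features with non-zero weights. It assumes standardised features, a centred response, and no intercept; the PLS extraction stage is not shown, and `lam` is an illustrative tuning parameter.

```python
def soft_threshold(rho: float, lam: float) -> float:
    """Shrink rho towards zero by lam; values inside [-lam, lam] become 0."""
    if rho > lam:
        return rho - lam
    if rho < -lam:
        return rho + lam
    return 0.0


def lasso_select(X, y, lam, n_iter=100):
    """Minimise (1/2n)||y - Xw||^2 + lam * ||w||_1 by coordinate descent.

    Returns the weight vector and the indices of the selected (non-zero) features.
    """
    n, p = len(X), len(X[0])
    w = [0.0] * p
    # Per-feature scale (1/n) * sum_i x_ij^2, constant across iterations.
    z = [sum(row[j] ** 2 for row in X) / n for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            # Correlation of feature j with the residual that excludes feature j.
            rho = sum(
                X[i][j] * (y[i] - sum(X[i][k] * w[k] for k in range(p) if k != j))
                for i in range(n)
            ) / n
            w[j] = soft_threshold(rho, lam) / z[j] if z[j] > 0 else 0.0
    selected = [j for j in range(p) if w[j] != 0.0]
    return w, selected


# Toy data (hypothetical): the response depends only on feature 0.
X = [[1.0, 0.1, -0.2], [-1.0, 0.2, 0.1], [2.0, -0.1, 0.0], [-2.0, -0.2, 0.1]]
y = [3.0, -3.0, 6.0, -6.0]
weights, selected = lasso_select(X, y, lam=0.5)
print(selected)  # [0]
```

The L1 penalty both zeroes out weak features and shrinks the retained ones (the weight on feature 0 is 2.8 here rather than 3); only the selected indices would be carried into a second, feature-extraction stage.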
Karolis Uziela, David Menéndez Hurtado, Nanjiang Shu, Björn Wallner, Arne Elofsson
Protein quality assessment is a long-standing problem in bioinformatics. For more than a decade we have developed state-of-the-art predictors by carefully selecting and optimising inputs to a machine learning method. The correlation has increased from 0.60 in ProQ to 0.81 in ProQ2 and 0.85 in ProQ3 mainly by adding a large set of carefully tuned descriptions of a protein. Here, we show that a substantial improvement can be obtained using exactly the same inputs as in ProQ2 or ProQ3 but replacing the support vector machine by a deep neural network...
January 3, 2017: Bioinformatics
J Oh, S Kerns, H Ostrer, B Rosenstein, J Deasy
PURPOSE: We investigated whether integration of machine learning and bioinformatics techniques on genome-wide association study (GWAS) data can improve the performance of predictive models in predicting the risk of developing radiation-induced late rectal bleeding and erectile dysfunction in prostate cancer patients. METHODS: We analyzed a GWAS dataset generated from 385 prostate cancer patients treated with radiotherapy. Using genotype information from these patients, we designed a machine learning-based predictive model of late radiation-induced toxicities: rectal bleeding and erectile dysfunction...
June 2016: Medical Physics
Alfred Ultsch, Jörn Lötsch
BACKGROUND: High-dimensional biomedical data are frequently clustered to identify subgroup structures pointing to distinct disease subtypes. It is crucial that the cluster algorithm used works correctly. However, by imposing a predefined shape on the clusters, classical algorithms occasionally suggest a cluster structure in homogeneously distributed data or assign data points to incorrect clusters. We analyzed whether this can be avoided when using Emergent Self-organizing feature maps (ESOM)...
December 28, 2016: Journal of Biomedical Informatics
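ESOMs build on Kohonen's self-organizing map. The sketch below is a plain, small online SOM in Python, not the ESOM implementation used by the authors: emergent maps typically use many more neurons than there are clusters and are read through a U-matrix, both omitted here. The grid size, learning-rate schedule, and Gaussian neighbourhood are illustrative assumptions.

```python
import math
import random


def best_matching_unit(weights, x):
    """Grid node whose weight vector is closest (squared Euclidean) to x."""
    return min(weights, key=lambda node: sum((w - v) ** 2 for w, v in zip(weights[node], x)))


def train_som(data, rows, cols, n_epochs=50, lr0=0.5, seed=0):
    """Online SOM training with linearly decaying learning rate and radius."""
    rng = random.Random(seed)
    radius0 = max(rows, cols) / 2
    # Initialise every node's weight vector with a randomly chosen data point.
    weights = {(r, c): list(rng.choice(data)) for r in range(rows) for c in range(cols)}
    total_steps = n_epochs * len(data)
    step = 0
    for _ in range(n_epochs):
        for x in rng.sample(data, len(data)):  # shuffled pass over the data
            frac = step / total_steps
            lr = lr0 * (1 - frac)
            radius = max(radius0 * (1 - frac), 0.5)
            bmu = best_matching_unit(weights, x)
            for node, w in weights.items():
                # Gaussian neighbourhood on the grid, centred on the BMU.
                grid_d2 = (node[0] - bmu[0]) ** 2 + (node[1] - bmu[1]) ** 2
                h = math.exp(-grid_d2 / (2 * radius ** 2))
                for k in range(len(w)):
                    w[k] += lr * h * (x[k] - w[k])
            step += 1
    return weights


# Two well-separated toy groups (hypothetical data).
data = [[0.0, 0.0], [0.5, 0.0], [0.0, 0.5], [10.0, 10.0], [9.5, 10.0], [10.0, 9.5]]
som = train_som(data, rows=3, cols=3)
print(best_matching_unit(som, [0.0, 0.0]), best_matching_unit(som, [10.0, 10.0]))
```

Points from the two groups should map to different grid nodes. Unlike k-means, the map is not told how many clusters to find; on an ESOM, cluster structure is read off the distances between neighbouring node weights rather than imposed by the algorithm.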
Jiaqi Xia, Zhenling Peng, Dawei Qi, Hongbo Mu, Jianyi Yang
MOTIVATION: Protein fold classification is a critical step in protein structure prediction. There are two possible ways to classify protein folds: template-based fold assignment and ab initio prediction using machine learning algorithms. Combining both approaches to improve prediction accuracy had never been explored before. RESULTS: We developed two algorithms, HH-fold and SVM-fold, for protein fold classification. HH-fold is a template-based fold assignment algorithm using the HHsearch program...
December 30, 2016: Bioinformatics
Renzhi Cao, Badri Adhikari, Debswapna Bhattacharya, Miao Sun, Jie Hou, Jianlin Cheng
MOTIVATION: Protein model quality assessment (QA) plays a very important role in protein structure prediction. QA methods can be divided into two groups: single-model and consensus methods. Consensus QA methods may fail when there is a large portion of low-quality models in the model pool. RESULTS: In this paper, we develop a novel single-model quality assessment method, QAcon, utilizing structural features, physicochemical properties, and residue contact predictions...
December 28, 2016: Bioinformatics
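The weakness of consensus QA mentioned above follows from how such methods typically score a model: by its average structural similarity to the other models in the pool. A minimal Python sketch, assuming a precomputed pairwise similarity matrix in [0, 1] (for example GDT-TS-style scores); the toy numbers are hypothetical:

```python
def consensus_scores(similarity):
    """Mean similarity of each model to every other model in the pool.

    similarity[i][j] is a symmetric pairwise score in [0, 1]; the diagonal is ignored.
    """
    n = len(similarity)
    return [
        sum(similarity[i][j] for j in range(n) if j != i) / (n - 1)
        for i in range(n)
    ]


# Hypothetical pool: model 0 is assumed to be the most accurate, but models 1-3
# are mutually similar low-quality models that dominate the pool.
pool = [
    [1.0, 0.2, 0.2, 0.2],
    [0.2, 1.0, 0.8, 0.8],
    [0.2, 0.8, 1.0, 0.8],
    [0.2, 0.8, 0.8, 1.0],
]
scores = consensus_scores(pool)
best = max(range(len(scores)), key=scores.__getitem__)
print(best)  # 1
```

Because models 1-3 agree with each other, they outrank model 0. A single-model method such as QAcon instead scores each model from its own features, so its ranking does not depend on the composition of the pool.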
Samantha Cheng, Angeline S Andrew, Peter C Andrews, Jason H Moore
BACKGROUND: Bladder cancer is a common disease with a complex etiology that is likely due to many different genetic and environmental factors. The goal of this study was to embrace this complexity using a bioinformatics analysis pipeline designed to use machine learning to measure synergistic interactions between single nucleotide polymorphisms (SNPs) in two genome-wide association studies (GWAS) and then to assess their enrichment within functional groups defined by Gene Ontology. The significance of the results was evaluated using permutation testing and those results that replicated between the two GWAS data sets were reported...
2016: BioData Mining
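Permutation testing builds the null distribution of a statistic by repeatedly shuffling group labels. The abstract does not say which statistic was permuted, so the Python sketch below shows only the generic idea, assuming a two-group difference in means as the statistic (the function name and defaults are illustrative):

```python
import random
from statistics import mean


def permutation_p_value(group_a, group_b, n_permutations=10_000, seed=0):
    """Two-sided Monte Carlo permutation test for a difference in means."""
    rng = random.Random(seed)
    observed = abs(mean(group_a) - mean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)  # random relabelling under the null hypothesis
        if abs(mean(pooled[:n_a]) - mean(pooled[n_a:])) >= observed:
            extreme += 1
    # Add-one correction: the observed labelling counts as one permutation,
    # so the estimate is never exactly zero.
    return (extreme + 1) / (n_permutations + 1)


print(permutation_p_value([10, 11, 12, 13, 14], [0, 1, 2, 3, 4]))
```

For these two groups only 2 of the 252 possible splits are as extreme as the observed one, so the exact p-value is 2/252 (about 0.008) and the Monte Carlo estimate should land close to it.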
Leyi Wei, Quan Zou
Knowledge of protein folding has a profound impact on understanding the heterogeneity and molecular function of proteins, and further facilitates drug design. Predicting the 3D structure (fold) of a protein is a key problem in molecular biology. Determining the fold of a protein mainly relies on molecular experimental methods. With the development of next-generation sequencing techniques, the discovery of new protein sequences has been rapidly increasing. With such a great number of proteins, using experimental techniques to determine protein folds is extremely difficult because these techniques are time-consuming and expensive...
December 16, 2016: International Journal of Molecular Sciences
Jessica S Yu, Dante A Pertusi, Adebola V Adeniran, Keith E J Tyo
MOTIVATION: High-throughput screening by fluorescence-activated cell sorting (FACS) is a common task in protein engineering and directed evolution. It can also be a rate-limiting step if high false-positive or false-negative rates necessitate multiple rounds of enrichment. Current FACS software requires the user to define sorting gates by intuition and is practically limited to two dimensions. In cases when multiple rounds of enrichment are required, the software cannot forecast the enrichment effort required...
December 20, 2016: Bioinformatics
Hang Zhou, Yang Yang, Hong-Bin Shen
MOTIVATION: Protein subcellular localization prediction has been an important research topic in computational biology over the last decade. Various automatic methods have been proposed to predict locations for large scale protein datasets, where statistical machine learning algorithms are widely used for model construction. A key step in these predictors is encoding the amino acid sequences into feature vectors. Many studies have shown that features extracted from biological domains, such as gene ontology and functional domains, can be very useful for improving the prediction accuracy...
December 19, 2016: Bioinformatics
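As a point of reference for the sequence-to-vector step described above, the simplest encoding is amino acid composition. The sketch below is that generic baseline in Python, not the gene ontology or functional-domain features the abstract refers to; non-standard residue letters are simply ignored (an assumption):

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def aac_vector(sequence: str) -> list[float]:
    """Fraction of each of the 20 standard amino acids in the sequence."""
    counts = Counter(sequence.upper())
    total = sum(counts[aa] for aa in AMINO_ACIDS)  # non-standard letters excluded
    if total == 0:
        return [0.0] * len(AMINO_ACIDS)
    return [counts[aa] / total for aa in AMINO_ACIDS]


vec = aac_vector("AACD")
print(vec[:3])  # [0.5, 0.25, 0.25]
```

Every protein maps to a fixed-length vector regardless of its length, which is what lets a standard classifier be trained on it; annotation-derived features (for example, binary indicators for GO terms) could be concatenated to this vector.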
Tomas Puelma, Viviana Araus, Javier Canales, Elena A Vidal, Juan M Cabello, Alvaro Soto, Rodrigo A Gutiérrez
GENIUS is a user-friendly web server that uses a novel machine learning algorithm to infer functional gene networks focused on specific genes and experimental conditions that are relevant to biological functions of interest. These functions may have different levels of complexity, from specific biological processes to complex traits that involve several interacting processes. GENIUS also enriches the network with new genes related to the biological function of interest, with accuracies comparable to highly discriminative Support Vector Machine methods...
December 19, 2016: Bioinformatics
Yinyin Cai, Zhijun Liao, Ying Ju, Juan Liu, Yong Mao, Xiangrong Liu
Research on resistance genes (R-genes) plays a vital role in bioinformatics: these genes help organisms cope with adverse changes in the external environment and are transcribed and translated into the corresponding resistance proteins. Identifying and predicting the R-genes of Larimichthys crocea (L. crocea) is meaningful, and would also benefit both breeding and the marine environment. Many of L. crocea's immune mechanisms have been explored by biological methods; however, much about them is still unclear...
December 6, 2016: Scientific Reports
Carlos Fernandez-Lozano, Marcos Gestal, Cristian R Munteanu, Julian Dorado, Alejandro Pazos
The design of experiments and the validation of the results achieved with them are vital in any research study. This paper focuses on the use of different Machine Learning approaches for regression tasks in the field of Computational Intelligence, and especially on a correct comparison between the results provided by different methods, as those techniques are complex systems that require further study to be fully understood. A methodology commonly accepted in Computational Intelligence is implemented in an R package called RRegrs...
2016: PeerJ
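A correct comparison of regression methods starts with evaluating every method on exactly the same resampling splits, so that per-fold errors can be compared pairwise. RRegrs itself is an R package and its interface is not reproduced here; the Python sketch below only illustrates repeated k-fold cross-validation with shared folds, using two deliberately simple models (a mean predictor and a least-squares line) as hypothetical stand-ins:

```python
import math
import random
from statistics import mean


def kfold_indices(n, k, seed):
    """Shuffle 0..n-1 with a fixed seed and deal the indices into k folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def fit_mean(xs, ys):
    """Baseline model: always predict the training mean."""
    m = mean(ys)
    return lambda x: m


def fit_line(xs, ys):
    """Ordinary least-squares line through the training points."""
    mx, my = mean(xs), mean(ys)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    return lambda x: my + slope * (x - mx)


def cv_rmse(fit, xs, ys, k=5, repeats=3):
    """Test-fold RMSEs from repeated k-fold CV. Repeat r always uses seed r,
    so every method passed in is scored on identical splits."""
    scores = []
    for r in range(repeats):
        for test in kfold_indices(len(xs), k, seed=r):
            held_out = set(test)
            train = [i for i in range(len(xs)) if i not in held_out]
            model = fit([xs[i] for i in train], [ys[i] for i in train])
            scores.append(math.sqrt(mean([(model(xs[i]) - ys[i]) ** 2 for i in test])))
    return scores


xs = list(range(20))
ys = [2 * x + 1 for x in xs]
line, baseline = cv_rmse(fit_line, xs, ys), cv_rmse(fit_mean, xs, ys)
print(mean(line) < mean(baseline))  # True
```

Because both score lists are aligned fold by fold, their element-wise differences can be fed into a paired test rather than comparing two unrelated averages.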
Nikolaos-Kosmas Chlis, Ekaterini S Bei, Michael Zervakis
The application of machine learning methods for the identification of candidate genes responsible for phenotypes of interest, such as cancer, is a major challenge in the field of bioinformatics. These lists of genes are often called genomic signatures, and their association with phenotypes may be a significant step toward discovering causal relationships between genotypes and phenotypes. Traditional methods that produce genomic signatures from DNA Microarray data tend to extract significantly different lists under relatively small variations of the training data...
November 29, 2016: IEEE/ACM Transactions on Computational Biology and Bioinformatics
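The instability described above is commonly quantified by comparing the gene lists obtained on perturbed training sets. A minimal Python sketch, assuming average pairwise Jaccard similarity as the stability measure (one common choice, not necessarily the one used in the paper); the gene identifiers are placeholders:

```python
from itertools import combinations


def jaccard(a, b):
    """|A intersect B| / |A union B|; two empty lists count as identical."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def signature_stability(signatures):
    """Average Jaccard similarity over all pairs of signatures (needs at least two)."""
    pairs = list(combinations(signatures, 2))
    return sum(jaccard(a, b) for a, b in pairs) / len(pairs)


# Hypothetical signatures obtained from three resamples of the training data.
signatures = [["G1", "G2", "G3"], ["G1", "G2", "G4"], ["G1", "G2", "G3"]]
print(round(signature_stability(signatures), 3))  # 0.667
```

A value of 1.0 means every resample yields the same gene list; values near 0 reproduce the problem the abstract describes.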
Sacha Beaumeunier, Jérôme Audoux, Anthony Boureux, Florence Ruffle, Thérèse Commes, Nicolas Philippe, Ronnie Alves
BACKGROUND: High-throughput sequencing technology and bioinformatics have identified chimeric RNAs (chRNAs), raising the possibility that chRNAs expressed specifically in diseases could be used as biomarkers for both diagnosis and prognosis. RESULTS: The task of discriminating true chRNAs from false ones poses an interesting Machine Learning (ML) challenge. First, the sequencing data may contain false reads due to technical artifacts, and during the analysis process, bioinformatics tools may generate false positives due to methodological biases...
2016: BioData Mining
Tolutola Oyetunde, Muhan Zhang, Yixin Chen, Yinjie Tang, Cynthia Lo
MOTIVATION: Metabolic network reconstructions are often incomplete. Constraint-based and pattern-based methodologies have been used for automated gap filling of these networks, each with its own strengths and weaknesses. Moreover, since validating hypotheses made by gap-filling tools requires experimentation, it is challenging to benchmark performance and make improvements other than those related to speed and scalability. RESULTS: We present BoostGAPFILL, an open source tool that leverages both constraint-based and machine learning methodologies for hypotheses generation in gap filling and metabolic model refinement...
October 26, 2016: Bioinformatics
Yi An, Jiawei Wang, Chen Li, André Leier, Tatiana Marquez-Lago, Jonathan Wilksch, Yang Zhang, Geoffrey I Webb, Jiangning Song, Trevor Lithgow
Bacterial effector proteins secreted by various protein secretion systems play crucial roles in host-pathogen interactions. In this context, computational tools capable of accurately predicting effector proteins of the various types of bacterial secretion systems are highly desirable. Existing computational approaches use different machine learning (ML) techniques and heterogeneous features derived from protein sequences and/or structural information. These predictors differ not only in the ML methods used but also in the curated data sets, the feature selection, and their prediction performance...
October 24, 2016: Briefings in Bioinformatics
Luisa Azevedo, Matthew Mort, Antonio C Costa, Raquel M Silva, Dulce Quelhas, Antonio Amorim, David N Cooper
Understanding the functional sequelae of amino-acid replacements is of fundamental importance in medical genetics. Perhaps the most intuitive way to assess the potential pathogenicity of a given human missense variant is by measuring the degree of evolutionary conservation of the substituted amino-acid residue, a feature that generally serves as a good proxy metric for the functional/structural importance of that residue. However, the presence of putatively compensated variants as the wild-type alleles in orthologous proteins of other mammalian species not only challenges this classical view of amino-acid essentiality but also precludes the accurate evaluation of the functional impact of this type of missense variant using currently available bioinformatic prediction tools...
January 2016: European Journal of Human Genetics: EJHG
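One common way to turn "degree of evolutionary conservation" into a number is the Shannon entropy of the residues at one position of a multiple sequence alignment: 0 bits for a fully conserved column, higher for variable ones. A minimal Python sketch, assuming gaps are simply skipped; real prediction tools generally use more elaborate, weighted scores:

```python
import math
from collections import Counter


def column_entropy(column: str) -> float:
    """Shannon entropy (bits) of one alignment column, ignoring gap characters."""
    residues = [r for r in column.upper() if r != "-"]
    if not residues:
        raise ValueError("column contains only gaps")
    n = len(residues)
    return sum(-(c / n) * math.log2(c / n) for c in Counter(residues).values())


# Hypothetical columns: human residue first, then orthologues.
print(column_entropy("LLLLLL"), column_entropy("LLLIII"))  # 0.0 1.0
```

If the residue introduced by a human missense variant is the wild type in other mammals (a compensated variant), the column already contains that residue, so a conservation score would rate the substitution as tolerated even if it is pathogenic in humans; this is the limitation the abstract points to.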
Daniel Sánchez-Taltavull, Parameswaran Ramachandran, Nelson Lau, Theodore J Perkins
Evaluating the similarity of different measured variables is a fundamental task of statistics, and a key part of many bioinformatics algorithms. Here we propose a Bayesian scheme for estimating the correlation between different entities' measurements based on high-throughput sequencing data. These entities could be different genes or miRNAs whose expression is measured by RNA-seq, different transcription factors or histone marks whose expression is measured by ChIP-seq, or even combinations of different types of entities...
2016: PloS One
