
bioinformatics using machine learning

https://www.readbyqxmd.com/read/28449114/neuro-symbolic-representation-learning-on-biological-knowledge-graphs
#1
Mona Alshahrani, Mohammed Asif Khan, Omar Maddouri, Akira R Kinjo, Núria Queralt-Rosinach, Robert Hoehndorf
Motivation: Biological data and knowledge bases increasingly rely on Semantic Web technologies and the use of knowledge graphs for data integration, retrieval and federated queries. In the past years, feature learning methods that are applicable to graph-structured data are becoming available, but have not yet widely been applied and evaluated on structured biological knowledge. Results: We develop a novel method for feature learning on biological knowledge graphs...
April 25, 2017: Bioinformatics
https://www.readbyqxmd.com/read/28444127/hla-class-i-binding-prediction-via-convolutional-neural-networks
#2
Yeeleng S Vang, Xiaohui Xie
Motivation: Many biological processes are governed by protein-ligand interactions. One such example is the recognition of self and nonself cells by the immune system. This immune response process is regulated by the major histocompatibility complex (MHC) protein which is encoded by the human leukocyte antigen (HLA) complex. Understanding the binding potential between MHC and peptides can lead to the design of more potent, peptide-based vaccines and immunotherapies for infectious autoimmune diseases...
April 21, 2017: Bioinformatics
https://www.readbyqxmd.com/read/28430949/capturing-non-local-interactions-by-long-short-term-memory-bidirectional-recurrent-neural-networks-for-improving-prediction-of-protein-secondary-structure-backbone-angles-contact-numbers-and-solvent-accessibility
#3
Rhys Heffernan, Yuedong Yang, Kuldip Paliwal, Yaoqi Zhou
Motivation: The accuracy of predicting protein local and global structural properties such as secondary structure and solvent accessible surface area has been stagnant for many years because of the challenge of accounting for non-local interactions between amino acid residues that are close in three-dimensional structural space but far from each other in their sequence positions. All existing machine-learning techniques relied on a sliding window of 10-20 amino acid residues to capture some "short to intermediate" non-local interactions...
April 18, 2017: Bioinformatics
https://www.readbyqxmd.com/read/28398465/machine-learning-in-computational-biology-to-accelerate-high-throughput-protein-expression
#4
Anand Sastry, Jonathan Monk, Hanna Tegel, Mathias Uhlén, Bernhard O Palsson, Johan Rockberg, Elizabeth Brunk
Motivation: The Human Protein Atlas (HPA) enables the simultaneous characterization of thousands of proteins across various tissues to pinpoint their spatial location in the human body. This has been achieved through transcriptomics and high-throughput immunohistochemistry-based approaches, where over 40,000 unique human protein fragments have been expressed in E. coli. These datasets enable quantitative tracking of entire cellular proteomes and present new avenues for understanding molecular-level properties influencing expression and solubility...
April 7, 2017: Bioinformatics
https://www.readbyqxmd.com/read/28391206/multiple-swarm-ensembles-improving-the-predictive-power-and-robustness-of-predictive-models-and-its-use-in-computational-biology
#5
Pedro Alves, Shuang Liu, Daifeng Wang, Mark Gerstein
Machine learning is an integral part of computational biology, and has already shown its use in various applications, such as prognostic tests. In the last few years in the non-biological machine learning community, ensembling techniques have shown their power in data mining competitions such as the Netflix challenge; however, such methods have not found wide use in computational biology. In this work we endeavor to show how ensembling techniques can be applied to practical problems, including problems in the field of bioinformatics, and how they often outperform other machine learning techniques in both predictive power and robustness...
April 5, 2017: IEEE/ACM Transactions on Computational Biology and Bioinformatics
https://www.readbyqxmd.com/read/28361684/nearender-an-r-package-for-functional-interpretation-of-omics-data-via-network-enrichment-analysis
#6
Ashwini Jeggari, Andrey Alexeyenko
BACKGROUND: The statistical evaluation of pathway enrichment, i.e. of gene profiles' confluence to the pathway level, allows exploring molecular landscapes using functionally annotated gene sets. However, pathway scores can also be used as predictive features in machine learning. That requires, firstly, increasing statistical power and biological relevance via a network enrichment analysis (NEA) and, secondly, a fast and convenient procedure for rendering the original data into a space of pathway scores...
March 23, 2017: BMC Bioinformatics
https://www.readbyqxmd.com/read/28351701/extracting-features-from-protein-sequences-to-improve-deep-extreme-learning-machine-for-protein-fold-recognition
#7
Wisam Ibrahim, Mohammad Saniee Abadeh
Protein fold recognition is an important problem in bioinformatics to predict three-dimensional structure of a protein. One of the most challenging tasks in protein fold recognition problem is the extraction of efficient features from the amino-acid sequences to obtain better classifiers. In this paper, we have proposed six descriptors to extract features from protein sequences. These descriptors are applied in the first stage of a three-stage framework PCA-DELM-LDA to extract feature vectors from the amino-acid sequences...
March 27, 2017: Journal of Theoretical Biology
https://www.readbyqxmd.com/read/28341746/leveraging-sequence-based-faecal-microbial-community-survey-data-to-identify-a-composite-biomarker-for-colorectal-cancer
#8
Manasi S Shah, Todd Z DeSantis, Thomas Weinmaier, Paul J McMurdie, Julia L Cope, Adam Altrichter, Jose-Miguel Yamal, Emily B Hollister
OBJECTIVE: Colorectal cancer (CRC) is the second leading cause of cancer-associated mortality in the USA. The faecal microbiome may provide non-invasive biomarkers of CRC and indicate transition in the adenoma-carcinoma sequence. Re-analysing raw sequence and metadata from several studies uniformly, we sought to identify a composite and generalisable microbial marker for CRC. DESIGN: Raw 16S rRNA gene sequence data sets from nine studies were processed with two pipelines, (1) QIIME closed reference (QIIME-CR) or (2) a strain-specific method herein termed SS-UP (Strain Select, UPARSE bioinformatics pipeline)...
March 24, 2017: Gut
https://www.readbyqxmd.com/read/28335739/identification-of-long-non-coding-transcripts-with-feature-selection-a-comparative-study
#9
Giovanna M M Ventola, Teresa M R Noviello, Salvatore D'Aniello, Antonietta Spagnuolo, Michele Ceccarelli, Luigi Cerulo
BACKGROUND: The unveiling of long non-coding RNAs as important gene regulators in many biological contexts has increased the demand for efficient and robust computational methods to identify novel long non-coding RNAs from transcripts assembled with high throughput RNA-seq data. Several classes of sequence-based features have been proposed to distinguish between coding and non-coding transcripts. Among them, open reading frame, conservation scores, nucleotide arrangements, and RNA secondary structure have been used with success in literature to recognize intergenic long non-coding RNAs, a particular subclass of non-coding RNAs...
March 23, 2017: BMC Bioinformatics
https://www.readbyqxmd.com/read/28315224/an-overview-of-bioinformatics-tools-and-resources-in-allergy
#10
Zhiyan Fu, Jing Lin
The rapidly increasing number of characterized allergens has created huge demands for advanced information storage, retrieval, and analysis. Bioinformatics and machine learning approaches provide useful tools for the study of allergens and epitopes prediction, which greatly complement traditional laboratory techniques. The specific applications mainly include identification of B- and T-cell epitopes, and assessment of allergenicity and cross-reactivity. In order to facilitate the work of clinical and basic researchers who are not familiar with bioinformatics, we review in this chapter the most important databases, bioinformatic tools, and methods with relevance to the study of allergens...
2017: Methods in Molecular Biology
https://www.readbyqxmd.com/read/28296577/genome-wide-identification-and-characterization-of-small-rnas-in-rhodobacter-capsulatus-and-identification-of-small-rnas-affected-by-loss-of-the-response-regulator-ctra
#11
Marc P Grüll, Lourdes Peña-Castillo, Martin E Mulligan, Andrew S Lang
Small non-coding RNAs (sRNAs) are involved in the control of numerous cellular processes through various regulatory mechanisms, and in the past decade many studies have identified sRNAs in a multitude of bacterial species using RNA sequencing (RNA-seq). Here, we present the first genome-wide analysis of sRNA sequencing data in Rhodobacter capsulatus, a purple nonsulfur photosynthetic alphaproteobacterium. Using a recently developed bioinformatics approach, sRNA-Detect, we detected 422 putative sRNAs from R...
March 15, 2017: RNA Biology
https://www.readbyqxmd.com/read/28198674/sequence-specific-bias-correction-for-rna-seq-data-using-recurrent-neural-networks
#12
Yao-Zhong Zhang, Rui Yamaguchi, Seiya Imoto, Satoru Miyano
BACKGROUND: The recent success of deep learning techniques in machine learning and artificial intelligence has stimulated a great deal of interest among bioinformaticians, who now wish to bring the power of deep learning to bear on a host of bioinformatical problems. Deep learning is ideally suited for biological problems that require automatic or hierarchical feature representation for biological data when prior knowledge is limited. In this work, we address the sequence-specific bias correction problem for RNA-seq data using Recurrent Neural Networks (RNNs) to model nucleotide sequences without pre-determining sequence structures...
January 25, 2017: BMC Genomics
https://www.readbyqxmd.com/read/28179914/cell-cycle-and-cell-size-dependent-gene-expression-reveals-distinct-subpopulations-at-single-cell-level
#13
Soheila Dolatabadi, Julián Candia, Nina Akrap, Christoffer Vannas, Tajana Tesan Tomic, Wolfgang Losert, Göran Landberg, Pierre Åman, Anders Ståhlberg
Cell proliferation includes a series of events that is tightly regulated by several checkpoints and layers of control mechanisms. Most studies have been performed on large cell populations, but detailed understanding of cell dynamics and heterogeneity requires single-cell analysis. Here, we used quantitative real-time PCR, profiling the expression of 93 genes in single-cells from three different cell lines. Individual unsynchronized cells from three different cell lines were collected in different cell cycle phases (G0/G1 - S - G2/M) with variable cell sizes...
2017: Frontiers in Genetics
https://www.readbyqxmd.com/read/28157153/enhancing-the-biological-relevance-of-machine-learning-classifiers-for-reverse-vaccinology
#14
Ashley I Heinson, Yawwani Gunawardana, Bastiaan Moesker, Carmen C Denman Hume, Elena Vataga, Yper Hall, Elena Stylianou, Helen McShane, Ann Williams, Mahesan Niranjan, Christopher H Woelk
Reverse vaccinology (RV) is a bioinformatics approach that can predict antigens with protective potential from the protein coding genomes of bacterial pathogens for subunit vaccine design. RV has become firmly established following the development of the BEXSERO® vaccine against Neisseria meningitidis serogroup B. RV studies have begun to incorporate machine learning (ML) techniques to distinguish bacterial protective antigens (BPAs) from non-BPAs. This research contributes significantly to the RV field by using permutation analysis to demonstrate that a signal for protective antigens can be curated from published data...
February 1, 2017: International Journal of Molecular Sciences
https://www.readbyqxmd.com/read/28137713/hipred-an-integrative-approach-to-predicting-haploinsufficient-genes
#15
Hashem A Shihab, Mark F Rogers, Colin Campbell, Tom R Gaunt
MOTIVATION: A major cause of autosomal dominant disease is haploinsufficiency, whereby a single copy of a gene is not sufficient to maintain the normal function of the gene. A large proportion of existing methods for predicting haploinsufficiency incorporate biological networks, e.g. protein-protein interaction networks, that have recently been shown to introduce study bias. As a result, these methods tend to perform best on well studied genes, but underperform on less studied genes. The advent of large genome sequencing consortia, such as the 1,000 genomes project, NHLBI Exome Sequencing Project (ESP) and the Exome Aggregation Consortium (ExAC) creates an urgent need for unbiased haploinsufficiency prediction methods...
January 30, 2017: Bioinformatics
https://www.readbyqxmd.com/read/28073761/seeing-the-trees-through-the-forest-sequence-based-homo-and-heteromeric-protein-protein-interaction-sites-prediction-using-random-forest
#16
Qingzhen Hou, Paul De Geest, Wim F Vranken, Jaap Heringa, K Anton Feenstra
MOTIVATION: Genome sequencing is producing an ever-increasing amount of associated protein sequences. Few of these sequences have experimentally validated annotations, however, and computational predictions are becoming increasingly successful in producing such annotations. One key challenge remains the prediction of the amino acids in a given protein sequence that are involved in protein-protein interactions. Such predictions are typically based on machine learning methods that take advantage of the properties and sequence positions of amino acids that are known to be involved in interaction...
January 10, 2017: Bioinformatics
https://www.readbyqxmd.com/read/28064045/a-l1-regularized-feature-selection-method-for-local-dimension-reduction-on-microarray-data
#17
Shun Guo, Donghui Guo, Lifei Chen, Qingshan Jiang
Dimension reduction is a crucial technique in machine learning and data mining, which is widely used in areas of medicine, bioinformatics and genetics. In this paper, we propose a two-stage local dimension reduction approach for classification on microarray data. In the first stage, a new L1-regularized feature selection method is defined to remove irrelevant and redundant features and to select the important features (biomarkers). In the next stage, PLS-based feature extraction is implemented on the selected features to extract synthesis features that best reflect discriminating characteristics for classification...
December 31, 2016: Computational Biology and Chemistry
https://www.readbyqxmd.com/read/28052925/proq3d-improved-model-quality-assessments-using-deep-learning
#18
Karolis Uziela, David Menéndez Hurtado, Nanjiang Shu, Björn Wallner, Arne Elofsson
Protein quality assessment is a long-standing problem in bioinformatics. For more than a decade we have developed state-of-art predictors by carefully selecting and optimising inputs to a machine learning method. The correlation has increased from 0.60 in ProQ to 0.81 in ProQ2 and 0.85 in ProQ3 mainly by adding a large set of carefully tuned descriptions of a protein. Here, we show that a substantial improvement can be obtained using exactly the same inputs as in ProQ2 or ProQ3 but replacing the support vector machine by a deep neural network...
January 3, 2017: Bioinformatics
https://www.readbyqxmd.com/read/28046236/su-d-204-06-integration-of-machine-learning-and-bioinformatics-methods-to-analyze-genome-wide-association-study-data-for-rectal-bleeding-and-erectile-dysfunction-following-radiotherapy-in-prostate-cancer
#19
J Oh, S Kerns, H Ostrer, B Rosenstein, J Deasy
PURPOSE: We investigated whether integration of machine learning and bioinformatics techniques on genome-wide association study (GWAS) data can improve the performance of predictive models in predicting the risk of developing radiation-induced late rectal bleeding and erectile dysfunction in prostate cancer patients. METHODS: We analyzed a GWAS dataset generated from 385 prostate cancer patients treated with radiotherapy. Using genotype information from these patients, we designed a machine learning-based predictive model of late radiation-induced toxicities: rectal bleeding and erectile dysfunction...
June 2016: Medical Physics
https://www.readbyqxmd.com/read/28040499/machine-learned-cluster-identification-in-high-dimensional-data
#20
Alfred Ultsch, Jörn Lötsch
BACKGROUND: High-dimensional biomedical data are frequently clustered to identify subgroup structures pointing at distinct disease subtypes. It is crucial that the used cluster algorithm works correctly. However, by imposing a predefined shape on the clusters, classical algorithms occasionally suggest a cluster structure in homogenously distributed data or assign data points to incorrect clusters. We analyzed whether this can be avoided by using emergent self-organizing feature maps (ESOM)...
February 2017: Journal of Biomedical Informatics
