Read by QxMD

bioinformatics using machine learning

Edgar Liberis, Petar Velickovic, Pietro Sormanni, Michele Vendruscolo, Pietro Liò
Motivation: Antibodies play essential roles in the immune system of vertebrates and are powerful tools in research and diagnostics. While hypervariable regions of antibodies, which are responsible for binding, can be readily identified from their amino acid sequence, it remains challenging to accurately pinpoint which amino acids will be in contact with the antigen (the paratope). Results: In this work, we present a sequence-based probabilistic machine learning algorithm for paratope prediction, named Parapred...
April 16, 2018: Bioinformatics
Semmy Wellem Taju, Trinh-Trung-Duong Nguyen, Nguyen-Quoc-Khanh Le, Rosdyana Mangir Irawan Kusuma, Yu-Yen Ou
Motivation: Efflux proteins play a key role in pumping xenobiotics out of cells. The prediction of efflux family proteins involved in the transport process of compounds is crucial for understanding family structures, functions and energy dependencies. Many methods have been proposed to classify efflux pump transporters without considering the pump-specific characteristics of efflux protein families. In other words, efflux proteins protect cells by extrusion of foreign chemicals. Moreover, almost all efflux protein families have the same structure based on the analysis of significant motifs...
April 14, 2018: Bioinformatics
Margherita Francescatto, Marco Chierici, Setareh Rezvan Dezfooli, Alessandro Zandonà, Giuseppe Jurman, Cesare Furlanello
BACKGROUND: High-throughput methodologies such as microarrays and next-generation sequencing are routinely used in cancer research, generating complex data at different omics layers. The effective integration of omics data could provide a broader insight into the mechanisms of cancer biology, helping researchers and clinicians to develop personalized therapies. RESULTS: In the context of CAMDA 2017 Neuroblastoma Data Integration challenge, we explore the use of Integrative Network Fusion (INF), a bioinformatics framework combining a similarity network fusion with machine learning for the integration of multiple omics data...
April 3, 2018: Biology Direct
Daniel Veltri, Uday Kamath, Amarda Shehu
Motivation: Bacterial resistance to antibiotics is a growing concern. Antimicrobial peptides (AMPs), natural components of innate immunity, are popular targets for developing new drugs. Machine learning methods are now commonly adopted by wet-laboratory researchers to screen for promising candidates. Results: In this work we utilize deep learning to recognize antimicrobial activity. We propose a neural network model with convolutional and recurrent layers that leverage primary sequence composition...
March 24, 2018: Bioinformatics
Kevin K Yang, Zachary Wu, Claire N Bedbrook, Frances H Arnold
Motivation: Machine-learning models trained on protein sequences and their measured functions can infer biological properties of unseen sequences without requiring an understanding of the underlying physical or biological mechanisms. Such models enable the prediction and discovery of sequences with optimal properties. Machine-learning models generally require that their inputs be vectors, and the conversion from a protein sequence to a vector representation affects the model's ability to learn...
March 23, 2018: Bioinformatics
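The point in the abstract above, that a protein sequence must be converted to a vector before a model can use it, can be illustrated with a minimal sketch. One-hot encoding is shown here only as a simple, widely used baseline; it is an assumption for illustration and not necessarily the representation studied in the paper. The function name `one_hot_encode` and the residue alphabet constant are hypothetical.

```python
# Illustrative baseline only: one-hot encoding of a protein sequence.
# AMINO_ACIDS and one_hot_encode are hypothetical names, not from the paper.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def one_hot_encode(sequence):
    """Return a flat list of length 20 * len(sequence).

    Each residue becomes a block of 20 values with a single 1.0 at the
    position of that residue in AMINO_ACIDS.
    """
    index = {aa: i for i, aa in enumerate(AMINO_ACIDS)}
    vector = []
    for residue in sequence.upper():
        if residue not in index:
            raise ValueError(f"unsupported residue: {residue!r}")
        block = [0.0] * len(AMINO_ACIDS)
        block[index[residue]] = 1.0
        vector.extend(block)
    return vector


if __name__ == "__main__":
    encoded = one_hot_encode("ACD")
    print(len(encoded))  # 3 residues * 20 values each
```

A vector built this way grows with sequence length and treats all residues as equally dissimilar, which is one reason the choice of representation can affect what a model is able to learn.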
Matteo Lo Monte, Candida Manelfi, Marica Gemei, Daniela Corda, Andrea Rosario Beccari
Motivation: ADP-ribosylation is a post-translational modification implicated in several crucial cellular processes, ranging from regulation of DNA repair and chromatin structure to cell metabolism and stress responses. To date, a complete understanding of ADP-ribosylation targets and their modification sites in different tissues and disease states is still lacking. Identification of ADP-ribosylation sites is required to discern the molecular mechanisms regulated by this modification. This motivated us to develop a computational tool for the prediction of ADP-ribosylated sites...
March 15, 2018: Bioinformatics
Dieter Galea, Ivan Laponogov, Kirill Veselkov
Motivation: Recognition of biomedical entities from scientific text is a critical component of natural language processing and automated information extraction platforms. Modern named entity recognition approaches rely heavily on supervised machine learning techniques, which are critically dependent on annotated training corpora. These approaches have been shown to perform well when trained and tested on the same source. However, in such a scenario, the reported performance of these models may be optimistic, as they may not generalize to independent corpora, resulting in potentially suboptimal entity recognition for large-scale tagging of widely diverse articles in databases such as PubMed...
March 10, 2018: Bioinformatics
Zhen Chen, Pei Zhao, Fuyi Li, André Leier, Tatiana T Marquez-Lago, Yanan Wang, Geoffrey I Webb, A Ian Smith, Roger J Daly, Kuo-Chen Chou, Jiangning Song
Summary: Structural and physicochemical descriptors extracted from sequence data have been widely used to represent sequences and predict structural, functional, expression and interaction profiles of proteins and peptides as well as DNAs/RNAs. Here, we present iFeature, a versatile Python-based toolkit for generating various numerical feature representation schemes for both protein and peptide sequences. iFeature is capable of calculating and extracting a comprehensive spectrum of 18 major sequence encoding schemes that encompass 53 different types of feature descriptors...
March 8, 2018: Bioinformatics
Ronghui You, Zihan Zhang, Yi Xiong, Fengzhu Sun, Hiroshi Mamitsuka, Shanfeng Zhu
Motivation: Gene Ontology (GO) has been widely used to annotate functions of proteins and understand their biological roles. Currently, fewer than 1% of the more than 70 million proteins in UniProtKB have experimental GO annotations, implying a strong need for automated function prediction (AFP) of proteins. AFP is a hard multilabel classification problem because one protein can have a varying number of GO terms. Most of these proteins have only sequences as input information, indicating the importance of sequence-based AFP (SAFP: sequences are the only input)...
March 7, 2018: Bioinformatics
Huan-Yu Meng, Zhao-Hui Luo, Bo Hu, Wan-Lin Jin, Cheng-Kai Yan, Zhi-Bin Li, Yuan-Yuan Xue, Yu Liu, Yi-En Luo, Li-Qun Xu, Huan Yang
Recent studies have suggested that genomic diversity may play a key role in different clinical outcomes, and the importance of SNPs is becoming increasingly clear. In this article, we summarize the bioactivity of SNPs that may affect the sensitivity to or possibility of drug reactions that occur among the signaling pathways of regularly used immunosuppressants, such as glucocorticoids, azathioprine, tacrolimus, mycophenolate mofetil, cyclophosphamide and methotrexate. The development of bioinformatics, including machine learning models, has enabled prediction of the proper immunosuppressant dosage with minimal adverse drug reactions for patients after organ transplantation or for those with autoimmune diseases...
March 8, 2018: Pharmacogenomics
Kirill Veselkov, Jonathan Sleeman, Emmanuelle Claude, Johannes P C Vissers, Dieter Galea, Anna Mroz, Ivan Laponogov, Mark Towers, Robert Tonge, Reza Mirnezami, Zoltan Takats, Jeremy K Nicholson, James I Langridge
Mass Spectrometry Imaging (MSI) holds significant promise in augmenting digital histopathologic analysis by generating highly robust big data about the metabolic, lipidomic and proteomic molecular content of the samples. In the process, a vast quantity of unrefined data, which can amount to several hundred gigabytes per tissue section, is produced. Managing, analysing and interpreting these data is a significant challenge and represents a major barrier to the translational application of MSI. Existing data analysis solutions for MSI rely on a set of heterogeneous bioinformatics packages that are not scalable for the reproducible processing of large-scale (hundreds to thousands) biological sample sets...
March 6, 2018: Scientific Reports
Sangkyu Lee, Sarah Kerns, Harry Ostrer, Barry Rosenstein, Joseph O Deasy, Jung Hun Oh
PURPOSE: Late genitourinary (GU) toxicity after radiation therapy limits the quality of life of prostate cancer survivors; however, efforts to explain GU toxicity using patient and dose information have remained unsuccessful. We identified patients with a greater congenital GU toxicity risk by identifying and integrating patterns in genome-wide single nucleotide polymorphisms (SNPs). METHODS AND MATERIALS: We applied a preconditioned random forest regression method for predicting risk from the genome-wide data to combine the effects of multiple SNPs and overcome the statistical power limitations of single-SNP analysis...
January 31, 2018: International Journal of Radiation Oncology, Biology, Physics
Mohammad K Ebrahimpour, Hossein Nezamabadi-Pour, Mahdi Eftekhari
Recent advances in bioinformatics have led to high-dimensional microarray datasets. These datasets remain challenging for machine learning researchers because they suffer from small sample sizes and extremely large numbers of features. Feature selection is therefore a central problem in the learning process in this area. In this paper, a novel feature selection method based on a global search (using the main concepts of the divide-and-conquer technique), called CCFS, is proposed...
February 17, 2018: Computational Biology and Chemistry
Elham Pashaei, Nizamettin Aydin
Splice site recognition is among the most significant and challenging tasks in bioinformatics due to its key role in gene annotation. Effective prediction of splice sites requires nucleotide encoding methods that reveal the characteristics of DNA sequences and provide appropriate features as input to machine learning classifiers. Markovian models are the most influential encoding methods and are widely used for pattern recognition in biological data. However, a direct performance comparison of these methods in the splice site domain has not yet been carried out...
February 14, 2018: Computational Biology and Chemistry
Bin Liu, Yingming Li, Zenglin Xu
Multi-label learning is a common machine learning problem arising from numerous real-world applications in diverse fields, e.g., natural language processing, bioinformatics, information retrieval and so on. Among various multi-label learning methods, the matrix completion approach has been regarded as a promising approach to transductive multi-label learning. By constructing a joint matrix comprising the feature matrix and the label matrix, the missing labels of test samples are regarded as missing values of the joint matrix...
February 14, 2018: Neural Networks: the Official Journal of the International Neural Network Society
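The joint-matrix idea described in the abstract above can be sketched as follows. This is a minimal illustration of the data layout only, assuming one row per sample with feature columns followed by label columns; the actual completion step and the paper's specific formulation are not shown. The function name `build_joint_matrix` and the use of `None` for unknown labels are assumptions made for this example.

```python
# Illustrative layout only: rows are samples, columns are features then labels.
# build_joint_matrix is a hypothetical helper, not the paper's implementation.
def build_joint_matrix(features, labels):
    """Stack features and labels side by side.

    features: list of equal-length feature rows.
    labels:   list of label rows, with None for unlabeled (test) samples.
    Returns (joint, mask), where mask[i][j] is True for observed entries.
    """
    n_labels = next((len(y) for y in labels if y is not None), None)
    if n_labels is None:
        raise ValueError("at least one labeled sample is required")

    joint, mask = [], []
    for x, y in zip(features, labels):
        if y is None:
            # Test sample: its labels are the missing values to be completed.
            joint.append(list(x) + [None] * n_labels)
            mask.append([True] * len(x) + [False] * n_labels)
        else:
            joint.append(list(x) + list(y))
            mask.append([True] * (len(x) + len(y)))
    return joint, mask


if __name__ == "__main__":
    X = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]
    Y = [[1, 0, 1], [0, 1, 0], None]  # the last sample is unlabeled
    joint, mask = build_joint_matrix(X, Y)
    for row in joint:
        print(row)
```

A matrix completion method would then estimate the entries where the mask is False, typically under a low-rank assumption on the joint matrix.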
Brian J Beliveau, Jocelyn Y Kishi, Guy Nir, Hiroshi M Sasaki, Sinem K Saka, Son C Nguyen, Chao-Ting Wu, Peng Yin
Oligonucleotide (oligo)-based FISH has emerged as an important tool for the study of chromosome organization and gene expression and has been empowered by the commercial availability of highly complex pools of oligos. However, a dedicated bioinformatic design utility has yet to be created specifically for the purpose of identifying optimal oligo FISH probe sequences on the genome-wide scale. Here, we introduce OligoMiner, a rapid and robust computational pipeline for the genome-scale design of oligo FISH probes that affords the scientist exact control over the parameters of each probe...
February 20, 2018: Proceedings of the National Academy of Sciences of the United States of America
Gary J R Cook, Gurdip Azad, Kasia Owczarczyk, Musib Siddique, Vicky Goh
PURPOSE: Radiomics describes the extraction of multiple, otherwise invisible, features from medical images that, with bioinformatic approaches, can be used to provide additional information that can predict underlying tumor biology and behavior. METHODS AND MATERIALS: Radiomic signatures can be used alone or with other patient-specific data to improve tumor phenotyping, treatment response prediction, and prognosis, noninvasively. The data describing 18F-fluorodeoxyglucose positron emission tomography radiomics, often using texture or heterogeneity parameters, are increasing rapidly...
January 30, 2018: International Journal of Radiation Oncology, Biology, Physics
Irina M Armean, Kathryn S Lilley, Matthew W B Trotter, Nicholas C V Pilkington, Sean B Holden
Motivation: Protein-protein interactions (PPI) play a crucial role in our understanding of protein function and biological processes. Standardized records of experimental findings are increasingly stored in ontologies, with the Gene Ontology (GO) being one of the most successful projects. Several PPI evaluation algorithms have been based on the application of probabilistic frameworks or machine learning algorithms to GO properties. Here, we introduce a new training set design and machine learning based approach that combines dependent heterogeneous protein annotations from the entire ontology to evaluate putative co-complex protein interactions determined by empirical studies...
January 30, 2018: Bioinformatics
Rong Tang, Lizhi Ouyang, Clara Li, Yue He, Molly Griffin, Alphonse Taghian, Barbara Smith, Adam Yala, Regina Barzilay, Kevin Hughes
INTRODUCTION: Large structured databases of pathology findings are valuable in deriving new clinical insights. However, they are labor intensive to create and generally require manual annotation. There has been some work in the bioinformatics community to support automating this work via machine learning in English. Our contribution is to provide an automated approach to construct such structured databases in Chinese, and to set the stage for extraction from other languages. METHODS: We collected 2104 de-identified Chinese benign and malignant breast pathology reports from Hunan Cancer Hospital...
January 29, 2018: Breast Cancer Research and Treatment
Xin Wang, Peijie Lin, Joshua W K Ho
BACKGROUND: It has been observed that many transcription factors (TFs) can bind to different genomic loci depending on the cell type in which a TF is expressed, even though the individual TF usually binds to the same core motif in different cell types. How a TF can bind to the genome in such a highly cell-type specific manner is a critical research question. One hypothesis is that a TF requires co-binding of different TFs in different cell types. If this is the case, it may be possible to observe different combinations of TF motifs - a motif grammar - located at the TF binding sites in different cell types...
January 19, 2018: BMC Genomics