Read by QxMD

bioinformatics using machine learning

Brian J Beliveau, Jocelyn Y Kishi, Guy Nir, Hiroshi M Sasaki, Sinem K Saka, Son C Nguyen, Chao-Ting Wu, Peng Yin
Oligonucleotide (oligo)-based FISH has emerged as an important tool for the study of chromosome organization and gene expression and has been empowered by the commercial availability of highly complex pools of oligos. However, a dedicated bioinformatic design utility has yet to be created specifically for the purpose of identifying optimal oligo FISH probe sequences on the genome-wide scale. Here, we introduce OligoMiner, a rapid and robust computational pipeline for the genome-scale design of oligo FISH probes that affords the scientist exact control over the parameters of each probe...
February 20, 2018: Proceedings of the National Academy of Sciences of the United States of America
Gary J R Cook, Gurdip Azad, Kasia Owczarczyk, Musib Siddique, Vicky Goh
PURPOSE: Radiomics describes the extraction of multiple, otherwise invisible, features from medical images that, with bioinformatic approaches, can be used to provide additional information that can predict underlying tumor biology and behavior. METHODS AND MATERIALS: Radiomic signatures can be used alone or with other patient-specific data to improve tumor phenotyping, treatment response prediction, and prognosis, noninvasively. The data describing 18F-fluorodeoxyglucose positron emission tomography radiomics, often using texture or heterogeneity parameters, are increasing rapidly...
January 30, 2018: International Journal of Radiation Oncology, Biology, Physics
Irina M Armean, Kathryn S Lilley, Matthew W B Trotter, Nicholas C V Pilkington, Sean B Holden
Motivation: Protein-protein interactions (PPI) play a crucial role in our understanding of protein function and biological processes. Experimental findings are increasingly standardized and recorded in ontologies, with the Gene Ontology (GO) being one of the most successful projects. Several PPI evaluation algorithms have been based on the application of probabilistic frameworks or machine learning algorithms to GO properties. Here, we introduce a new training set design and machine learning based approach that combines dependent heterogeneous protein annotations from the entire ontology to evaluate putative co-complex protein interactions determined by empirical studies...
January 30, 2018: Bioinformatics
Rong Tang, Lizhi Ouyang, Clara Li, Yue He, Molly Griffin, Alphonse Taghian, Barbara Smith, Adam Yala, Regina Barzilay, Kevin Hughes
INTRODUCTION: Large structured databases of pathology findings are valuable in deriving new clinical insights. However, they are labor intensive to create and generally require manual annotation. There has been some work in the bioinformatics community to support automating this work via machine learning in English. Our contribution is to provide an automated approach to construct such structured databases in Chinese, and to set the stage for extraction from other languages. METHODS: We collected 2104 de-identified Chinese benign and malignant breast pathology reports from Hunan Cancer Hospital...
January 29, 2018: Breast Cancer Research and Treatment
Xin Wang, Peijie Lin, Joshua W K Ho
BACKGROUND: It has been observed that many transcription factors (TFs) can bind to different genomic loci depending on the cell type in which a TF is expressed, even though the individual TF usually binds to the same core motif in different cell types. How a TF can bind to the genome in such a highly cell-type-specific manner is a critical research question. One hypothesis is that a TF requires co-binding of different TFs in different cell types. If this is the case, it may be possible to observe different combinations of TF motifs - a motif grammar - located at the TF binding sites in different cell types...
January 19, 2018: BMC Genomics
Jennifer M Franks, Guoshuai Cai, Michael L Whitfield
Motivation: Molecular subtypes of cancers and autoimmune disease, defined by transcriptomic profiling, have provided insight into disease pathogenesis, molecular heterogeneity, and therapeutic responses. However, technical biases inherent to different gene expression profiling platforms present a unique problem when analyzing data generated from different studies. Currently, there is a lack of effective methods designed to eliminate platform-based bias. We present a method to normalize and classify RNA-seq data using machine learning classifiers trained on DNA microarray data and molecular subtypes in two datasets: breast invasive carcinoma (BRCA) and colorectal cancer (CRC)...
January 17, 2018: Bioinformatics
Natalie Damaso, Julian Mendel, Maria Mendoza, Eric J von Wettberg, Giri Narasimhan, DeEtta Mills
Soil DNA profiling has potential as a forensic tool to establish a link between soil collected at a crime scene and soil recovered from a suspect. However, a quantitative measure is needed to investigate the spatial/temporal variability across multiple scales prior to their application in forensic science. In this study, soil DNA profiles across Miami-Dade, FL, were generated using length heterogeneity PCR to target four taxa. The objectives of this study were to (i) assess the biogeographical patterns of soils to determine whether soil biota is spatially correlated with geographic location and (ii) evaluate five machine learning algorithms for their predictive ability to recognize biotic patterns which could accurately classify soils at different spatial scales regardless of seasonal collection...
January 22, 2018: Journal of Forensic Sciences
Jiawen Chen, Bo Wang, Yinghao Wu
Domains that belong to immunoglobulin (Ig) fold are extremely abundant in cell surface receptors, which play significant roles in cell-cell adhesion and signaling. Although the structures of domains in Ig fold share common topology of β-barrels, functions of receptors in adhesion and signaling are regulated by the very heterogeneous binding between these domains. Additionally, only a small number of domains are directly involved in the binding between two multi-domain receptors. It is challenging and time-consuming to experimentally detect the binding partners of a given receptor, and further determine which specific domains in this receptor are responsible for binding...
January 22, 2018: Journal of Chemical Information and Modeling
Neel S Madhukar, Olivier Elemento
Fulfilling the promises of precision medicine will depend on our ability to create patient-specific treatment regimens. Therefore, being able to translate genomic sequencing into predicting how a patient will respond to a given drug is critical. In this chapter, we review common bioinformatics approaches that aim to use sequencing data to predict sample-specific drug susceptibility. First, we explain the importance of customized drug regimens to the future of medical care. Second, we discuss the different public databases and community efforts that can be leveraged to develop new methods for identifying new predictive biomarkers...
2018: Methods in Molecular Biology
Juhua Zhang, Wenbo Peng, Lei Wang
Motivation: Nucleosome positioning plays significant roles in proper genome packing and its accessibility to execute transcription regulation. Despite a multitude of nucleosome positioning resources available online, including experimental datasets of genome-wide nucleosome occupancy profiles and computational tools for the analysis of these data, the complex language of eukaryotic nucleosome positioning remains incompletely understood. Results: Here, we address this challenge using an approach based on a state-of-the-art machine learning method...
January 10, 2018: Bioinformatics
Miroslava Cuperlovic-Culf
Machine learning uses experimental data to optimize clustering or classification of samples or features, or to develop, augment or verify models that can be used to predict behavior or properties of systems. It is expected that machine learning will help provide actionable knowledge from a variety of big data including metabolomics data, as well as results of metabolism models. A variety of machine learning methods has been applied in bioinformatics and metabolism analyses including self-organizing maps, support vector machines, the kernel machine, Bayesian networks or fuzzy logic...
January 11, 2018: Metabolites
Trygve Bakken, Lindsay Cowell, Brian D Aevermann, Mark Novotny, Rebecca Hodge, Jeremy A Miller, Alexandra Lee, Ivan Chang, Jamison McCorrison, Bali Pulendran, Yu Qian, Nicholas J Schork, Roger S Lasken, Ed S Lein, Richard H Scheuermann
BACKGROUND: A fundamental characteristic of multicellular organisms is the specialization of functional cell types through the process of differentiation. These specialized cell types not only characterize the normal functioning of different organs and tissues, they can also be used as cellular biomarkers of a variety of different disease states and therapeutic/vaccine responses. In order to serve as a reference for cell type representation, the Cell Ontology has been developed to provide a standard nomenclature of defined cell types for comparative analysis and biomarker discovery...
December 21, 2017: BMC Bioinformatics
Yao Huang, Jie Zhu, Wenshuai Li, Ziqiang Zhang, Panpan Xiong, Hong Wang, Jun Zhang
Early detection of gastric cancer (GC) is crucial to improve the therapeutic effect and prolong the survival of patients. MicroRNAs (miRNAs) are a group of small non-protein-coding RNAs that function as repressors of diverse genes. We aimed to identify a microRNA panel in the serum of patients to predict GC non-invasively with high accuracy and sensitivity. Using six types of classifiers, we selected three markers (miR-21-5p, miR-22-3p and miR-29c-3p) from a published miRNA profiling study (GSE23739) which was treated as a training set...
December 19, 2017: Oncology Reports
Yuliang Pan, Zixiang Wang, Weihua Zhan, Lei Deng
Motivation: Identifying RNA-binding residues, especially energetically favored hot spots, can provide valuable clues for understanding the mechanisms and functional importance of protein-RNA interactions. Yet limited availability of experimentally recognized energy hot spots in protein-RNA crystal structures leads to difficulties in developing empirical identification approaches. Computational prediction of RNA-binding hot spot residues is still in its infant stage. Results: Here, we describe a computational method, PrabHot (Prediction of protein-RNA binding hot spots), that can effectively detect hot spot residues on protein-RNA binding interfaces using an ensemble of conceptually different machine learning classifiers...
December 21, 2017: Bioinformatics
Yasset Perez-Riverol, Max Kuhn, Juan Antonio Vizcaíno, Marc-Phillip Hitz, Enrique Audain
We are moving into the age of 'Big Data' in biomedical research and bioinformatics. This trend could be encapsulated in this simple formula: D = S * F, where the volume of data generated (D) increases in both dimensions: the number of samples (S) and the number of sample features (F). Frequently, a typical omics classification includes redundant and irrelevant features (e.g. genes or proteins) that can result in long computation times, decreased model performance, and the selection of suboptimal features (genes and proteins) after the classification/regression step...
2017: PloS One
Xiuquan Du, Changlin Hu, Yu Yao, Shiwei Sun, Yanping Zhang
In bioinformatics, exon skipping (ES) event prediction is an essential part of alternative splicing (AS) event analysis. Although many methods have been developed to predict ES events, a solution has yet to be found. In this study, given the limitations of machine learning algorithms with RNA-Seq data or genome sequences, a new feature, called RS (RNA-seq and sequence) features, was constructed. These features include RNA-Seq features derived from the RNA-Seq data and sequence features derived from genome sequences...
December 12, 2017: International Journal of Molecular Sciences
Badri Adhikari, Jie Hou, Jianlin Cheng
Motivation: Significant improvements in the prediction of protein residue-residue contacts have been observed in recent years. These contacts, predicted using a variety of coevolution-based and machine learning methods, are the key contributors to the recent progress in ab initio protein structure prediction, as demonstrated in the recent CASP experiments. Continuing the development of new methods to reliably predict contact maps is essential to further improve ab initio structure prediction...
December 8, 2017: Bioinformatics
Xiaoyun Xing, Bo Zhang, Daofeng Li, Ting Wang
Understanding the role of DNA methylation often requires accurate assessment and comparison of these modifications in a genome-wide fashion. Sequencing-based DNA methylation profiling provides an unprecedented opportunity to map and compare complete DNA CpG methylomes. These include whole genome bisulfite sequencing (WGBS), Reduced-Representation Bisulfite-Sequencing (RRBS), and enrichment-based methods such as MeDIP-seq, MBD-seq, and MRE-seq. An investigator needs a method that is flexible with the quantity of input DNA, provides the appropriate balance among genomic CpG coverage, resolution, quantitative accuracy, and cost, and comes with robust bioinformatics software for analyzing the data...
2018: Methods in Molecular Biology
Randal S Olson, William La Cava, Zairah Mustahsan, Akshay Varik, Jason H Moore
As the bioinformatics field grows, it must keep pace not only with new data but with new algorithms. Here we contribute a thorough analysis of 13 state-of-the-art, commonly used machine learning algorithms on a set of 165 publicly available classification problems in order to provide data-driven algorithm recommendations to current researchers. We present a number of statistical and visual comparisons of algorithm performance and quantify the effect of model selection and algorithm tuning for each algorithm and dataset...
2018: Pacific Symposium on Biocomputing
Yuxin Lin, Fuliang Qian, Li Shen, Feifei Chen, Jiajia Chen, Bairong Shen
Biomarkers are a class of measurable and evaluable indicators with the potential to predict disease initiation and progression. In contrast to disease-associated factors, biomarkers hold the promise to capture the changeable signatures of biological states. With methodological advances, computer-aided biomarker discovery has now become a burgeoning paradigm in the field of biomedical science. In recent years, the 'big data' term has accumulated for the systematical investigation of complex biological phenomena and promoted the flourishing of computational methods for systems-level biomarker screening...
November 29, 2017: Briefings in Bioinformatics
