BioData Mining

Kathleen M Chen, Jie Tan, Gregory P Way, Georgia Doing, Deborah A Hogan, Casey S Greene
Background: Investigators often interpret genome-wide data by analyzing the expression levels of genes within pathways. While this within-pathway analysis is routine, the products of any one pathway can affect the activity of other pathways. Past efforts to identify relationships between biological processes have evaluated overlap in knowledge bases or evaluated changes that occur after specific treatments. Individual experiments can highlight condition-specific pathway-pathway relationships; however, constructing a complete network of such relationships across many conditions requires analyzing results from many studies...
2018: BioData Mining
Catherine Li, Juyon Lee, Jessica Ding, Shuying Sun
Background: The deadly costs of cancer and necessity for an accurate method of early cancer detection have demanded the identification of genetic and epigenetic factors associated with cancer. DNA methylation, an epigenetic event, plays an important role in cancer susceptibility. In this paper, we use DNA methylation and gene expression data integration and pathway analysis to further explore and understand the complex relationship between methylation and gene expression. Results: Through linear modeling and analysis of variance, we obtain genes that show a significant correlation between methylation and gene expression...
2018: BioData Mining
Laura Tipton, Karen T Cuenco, Laurence Huang, Ruth M Greenblatt, Eric Kleerup, Frank Sciurba, Steven R Duncan, Michael P Donahoe, Alison Morris, Elodie Ghedin
Background: Human microbiome studies in clinical settings generally focus on distinguishing the microbiota in health from that in disease at a specific point in time. However, microbiome samples may be associated with disease severity or continuous clinical health indicators that are often assessed at multiple time points. While the temporal data from clinical and microbiome samples may be informative, analysis of this type of data can be problematic for standard statistical methods. Results: To identify associations between microbiota and continuous clinical variables measured repeatedly in two studies of the respiratory tract, we adapted a statistical method, the lasso-penalized generalized linear mixed model (LassoGLMM)...
2018: BioData Mining
Kimberly T To, Rebecca C Fry, David M Reif
Background: The Toxicological Priority Index (ToxPi) is a method for prioritization and profiling of chemicals that integrates data from diverse sources. However, individual data sources ("assays"), such as in vitro bioassays or in vivo study endpoints, often feature sections of missing data, wherein subsets of chemicals have not been tested in all assays. In order to investigate the effects of missing data and recommend solutions, we designed simulation studies around high-throughput screening data generated by the ToxCast and Tox21 programs on chemicals highlighted by the Agency for Toxic Substances and Disease Registry's (ATSDR) Substance Priority List (SPL), which helps prioritize environmental research and remediation resources...
2018: BioData Mining
Ravi Mathur, Daniel Rotroff, Jun Ma, Ali Shojaie, Alison Motsinger-Reif
Background: Gene set analysis is a valuable tool to summarize high-dimensional gene expression data in terms of biologically relevant sets. This is an active area of research and numerous gene set analysis methods have been developed. Despite this popularity, systematic comparative studies have been limited in scope. Methods: In this study we present a semi-synthetic simulation study using real datasets in order to test and compare commonly used methods. Results: A software pipeline, Flexible Algorithm for Novel Gene set Simulation (FANGS), develops simulated data based on a prostate cancer dataset where the KRAS and TGF-β pathways were differentially expressed...
2018: BioData Mining
Enrico Ferrero, Pankaj Agarwal
Developing new drugs continues to be a highly inefficient and costly business. By repurposing an existing compound for a different indication, drug repositioning offers an attractive alternative to traditional drug discovery. Most of these approaches work by matching transcriptional disease signatures to anti-correlated gene expression profiles of drug perturbations. Genome-wide association studies (GWASs) are of great interest to researchers in the pharmaceutical industry because drug programmes with supporting genetic evidence are more likely to successfully progress through the drug discovery pipeline...
2018: BioData Mining
Elizabeth R Piette, Jason H Moore
Background: Machine learning methods and conventions are increasingly employed for the analysis of large, complex biomedical data sets, including genome-wide association studies (GWAS). Reproducibility of machine learning analyses of GWAS can be hampered by biological and statistical factors, particularly so for the investigation of non-additive genetic interactions. Application of traditional cross validation to a GWAS data set may result in poor consistency between the training and testing data set splits due to an imbalance of the interaction genotypes relative to the data as a whole...
2018: BioData Mining
Shefali S Verma, Anastasia Lucas, Xinyuan Zhang, Yogasudha Veturi, Scott Dudek, Binglan Li, Ruowang Li, Ryan Urbanowicz, Jason H Moore, Dokyoon Kim, Marylyn D Ritchie
Background: Machine learning methods have gained popularity and practicality in identifying linear and non-linear effects of variants associated with complex disease/traits. Detection of epistatic interactions still remains a challenge due to the large number of features and relatively small sample size as input, thus leading to the so-called "short fat data" problem. The efficiency of machine learning methods can be increased by limiting the number of input features. Thus, it is very important to perform variable selection before searching for epistasis...
2018: BioData Mining
Juan A Nepomuceno, Alicia Troncoso, Isabel A Nepomuceno-Chamorro, Jesús S Aguilar-Ruiz
Background: Biclustering algorithms search for groups of genes that share the same behavior under a subset of samples in gene expression data. Nowadays, the biological knowledge available in public repositories can be used to drive these algorithms to find biclusters composed of groups of genes functionally coherent. On the other hand, a distance among genes can be defined according to their information stored in Gene Ontology (GO). Gene pairwise GO semantic similarity measures report a value for each pair of genes which establishes their functional similarity...
2018: BioData Mining
Wenyi Qin, Hui Lu
Motivation: Detecting differentially expressed (DE) genes between disease and normal control group is one of the most common analyses in genome-wide transcriptomic data. Since most studies don't have a lot of samples, researchers have used meta-analysis to group different datasets for the same disease. Even then, in many cases the statistical power is still not enough. Taking into account the fact that many diseases share the same disease genes, it is desirable to design a statistical framework that can identify diseases' common and specific DE genes simultaneously to improve the identification power...
2018: BioData Mining
Moshe Sipper, Weixuan Fu, Karuna Ahuja, Jason H Moore
Evolutionary computation (EC) has been widely applied to biological and biomedical data. The practice of EC involves the tuning of many parameters, such as population size, generation count, selection size, and crossover and mutation rates. Through an extensive series of experiments over multiple evolutionary algorithm implementations and 25 problems we show that parameter space tends to be rife with viable parameters, at least for the problems studied herein. We discuss the implications of this finding in practice for the researcher employing EC...
2018: BioData Mining
Eunice Carrasquinha, André Veríssimo, Marta B Lopes, Susana Vinga
Background: Survival analysis is a statistical technique widely used in many fields of science, particularly in medicine, that studies the time until an event of interest occurs. Outlier detection in this context has gained great importance because identifying long- or short-term survivors may lead to the detection of new prognostic factors. However, the results obtained using different outlier detection methods and residuals are seldom the same and are strongly dependent on the specific Cox proportional hazards model selected...
2018: BioData Mining
Andrej Čopar, Marinka Žitnik, Blaž Zupan
Background: Matrix factorization is a well established pattern discovery tool that has seen numerous applications in biomedical data analytics, such as gene expression co-clustering, patient stratification, and gene-disease association mining. Matrix factorization learns a latent data model that takes a data matrix and transforms it into a latent feature space enabling generalization, noise removal and feature discovery. However, factorization algorithms are numerically intensive, and hence there is a pressing challenge to scale current algorithms to work with large datasets...
2017: BioData Mining
Qiwei Xie, Xi Chen, Hao Deng, Danqian Liu, Yingyu Sun, Xiaojuan Zhou, Yang Yang, Hua Han
Background: In the nervous system, the neurons communicate through synapses. The size, morphology, and connectivity of these synapses are significant in determining the functional properties of the neural network. Therefore, they have always been a major focus of neuroscience research. Two-photon laser scanning microscopy allows the visualization of synaptic structures in vivo, leading to many important findings. However, the identification and quantification of structural imaging data currently rely heavily on manual annotation, a method that is both time-consuming and prone to bias...
2017: BioData Mining
Zhenqiu Liu, Fengzhu Sun, Dermot P McGovern
Background: Feature selection and prediction are the most important tasks for big data mining. The common strategies for feature selection in big data mining are L1, SCAD and MC+. However, none of the existing algorithms optimizes L0, which penalizes the number of nonzero features directly. Results: In this paper, we develop a novel sparse generalized linear model (GLM) with L0 approximation for feature selection and prediction with big omics data. The proposed approach approximates the L0 optimization directly...
2017: BioData Mining
J Grey Monroe, Zachariah A Allen, Paul Tanger, Jack L Mullen, John T Lovell, Brook T Moyers, Darrell Whitley, John K McKay
Background: Recent advances in nucleic acid sequencing technologies have led to a dramatic increase in the number of markers available to generate genetic linkage maps. This increased marker density can be used to improve genome assemblies as well as add much needed resolution for loci controlling variation in ecologically and agriculturally important traits. However, traditional genetic map construction methods from these large marker datasets can be computationally prohibitive and highly error prone...
2017: BioData Mining
Luluah Alhusain, Alaaeldin M Hafez
Background: Clustering plays a crucial role in several application domains, such as bioinformatics. In bioinformatics, clustering has been extensively used as an approach for detecting interesting patterns in genetic data. One application is population structure analysis, which aims to group individuals into subpopulations based on shared genetic variations, such as single nucleotide polymorphisms. Advances in DNA sequencing technology have facilitated the obtainment of genetic datasets with exceptional sizes...
2017: BioData Mining
Randal S Olson, William La Cava, Patryk Orzechowski, Ryan J Urbanowicz, Jason H Moore
Background: The selection, development, or comparison of machine learning methods in data mining can be a difficult task based on the target problem and goals of a particular study. Numerous publicly available real-world and simulated benchmark datasets have emerged from different sources, but their organization and adoption as standards have been inconsistent. As such, selecting and curating specific benchmarks remains an unnecessary burden on machine learning practitioners and data scientists...
2017: BioData Mining
Davide Chicco
Machine learning has become a pivotal tool for many projects in computational biology, bioinformatics, and health informatics. Nevertheless, beginners and biomedical researchers often do not have enough experience to run a data mining project effectively, and therefore can follow incorrect practices that may lead to common mistakes or over-optimistic results. With this review, we present ten quick tips to take advantage of machine learning in any computational biology context, by avoiding some common errors that we observed hundreds of times in multiple bioinformatics projects...
2017: BioData Mining
Moshe Sipper, Jason H Moore
No abstract text is available yet for this article.
2017: BioData Mining