BioData Mining

Weifu Li, Jing Liu, Chi Xiao, Hao Deng, Qiwei Xie, Hua Han
Background: It is becoming increasingly clear that the quantification of mitochondria and synapses is of great significance for understanding the function of biological nervous systems. Electron microscopy (EM), with the necessary resolution in three directions, is the only available imaging method to look closely into these issues. Therefore, estimating the number of mitochondria and synapses from serial EM images is coming into prominence. Since previous studies have achieved satisfactory 2D segmentation performance, it holds great promise to obtain the 3D connection relationship from the 2D segmentation results...
2018: BioData Mining
M Arabnejad, B A Dawkins, W S Bush, B C White, A R Harkness, B A McKinney
Background: ReliefF is a nearest-neighbor based feature selection algorithm that efficiently detects variants that are important due to statistical interactions or epistasis. For categorical predictors, like genotypes, the standard metric used in ReliefF has been a simple (binary) mismatch difference. In this study, we develop new metrics of varying complexity that incorporate allele sharing, adjustment for allele frequency heterogeneity via the genetic relationship matrix (GRM), and physicochemical differences of variants via a new transition/transversion encoding...
2018: BioData Mining
Eleonora Cappelli, Giovanni Felici, Emanuel Weitschek
Background: In the Next Generation Sequencing (NGS) era a large amount of biological data is being sequenced, analyzed, and stored in many public databases, whose interoperability is often required to allow an enhanced accessibility. The combination of heterogeneous NGS genomic data is an open challenge: the analysis of data from different experiments is a fundamental practice for the study of diseases. In this work, we propose to combine DNA methylation and RNA sequencing NGS experiments at gene level for supervised knowledge extraction in cancer...
2018: BioData Mining
Moshe Sipper, Ryan J Urbanowicz, Jason H Moore
No abstract text is available yet for this article.
2018: BioData Mining
Aida Mrzic, Pieter Meysman, Wout Bittremieux, Pieter Moris, Boris Cule, Bart Goethals, Kris Laukens
Searching for interesting common subgraphs in graph data is a well-studied problem in data mining. Subgraph mining techniques focus on the discovery of patterns in graphs that exhibit a specific network structure that is deemed interesting within these data sets. The definition of which subgraphs are interesting and which are not is highly dependent on the application. These techniques have seen numerous applications and are able to tackle a range of biological research questions, spanning from the detection of common substructures in sets of biomolecular compounds, to the discovery of network motifs in large-scale molecular interaction networks...
2018: BioData Mining
Jing Zhang, Dan Wu, Yiqin Dai, Jianjiang Xu
Background: The genetic architecture underlying central cornea thickness (CCT) is far from understood. Most of the CCT-associated variants are located in non-coding regions, which complicates subsequent functional characterization. Thus, integrative functional analyses of CCT-associated loci might help overcome these issues by prioritizing the hub genes located at the center of the CCT genetic network. Methods: Integrative analyses including functional annotations, enrichment analysis, and protein-protein interaction analyses were performed on all reported CCT GWAS lead SNPs, together with their proxy variants...
2018: BioData Mining
Christina Brester, Jussi Kauhanen, Tomi-Pekka Tuomainen, Sari Voutilainen, Mauno Rönkkö, Kimmo Ronkainen, Eugene Semenkin, Mikko Kolehmainen
Background: The redundancy of information is becoming a critical issue for epidemiologists. High-dimensional datasets require new effective variable selection methods to be developed. This study implements an advanced evolutionary variable selection method which is applied for cardiovascular predictive modeling. The epidemiological follow-up study KIHD (Kuopio Ischemic Heart Disease Risk Factor Study) was used to compare the designed variable selection method based on an evolutionary search with conventional stepwise selection...
2018: BioData Mining
David Gutiérrez-Avilés, Raúl Giráldez, Francisco Javier Gil-Cumbreras, Cristina Rubio-Escudero
Background: Triclustering has been shown to be a valuable tool for the analysis of microarray data since its appearance as an improvement on classical clustering and biclustering techniques. The standard for validation of triclustering is based on three different measures: correlation, graphic similarity of the patterns and functional annotations for the genes extracted from the Gene Ontology project (GO). Results: We propose TRIQ, a single evaluation measure that combines the three measures previously described: correlation, graphic validation and functional annotation, providing a single value as the result of the validation of a tricluster solution and therefore simplifying the steps involved in comparing and selecting solutions...
2018: BioData Mining
Cheng-Hong Yang, Kuo-Chuan Wu, Yu-Shiun Lin, Li-Yeh Chuang, Hsueh-Wei Chang
Background: The function of a protein is determined by its native protein structure. Among many protein prediction methods, the Hydrophobic-Polar (HP) model, an ab initio method, simplifies the protein folding prediction process in order to reduce the prediction complexity. Results: In this study, the ions motion optimization (IMO) algorithm was combined with the greedy algorithm (namely IMOG) and applied to the HP model for protein folding prediction based on the 2D-triangular-lattice model...
2018: BioData Mining
Jorge Parraga-Alava, Marcio Dorn, Mario Inostroza-Ponta
Background: Biologists aim to understand the genetic background of diseases, metabolic disorders or any other genetic condition. Microarrays are one of the main high-throughput technologies for collecting information about the behaviour of genetic information under different conditions. To analyse this data, clustering is one of the main techniques used, and it aims at finding groups of genes that have some criterion in common, such as a similar expression profile. However, the problem of finding groups is normally multidimensional, making it necessary to approach the clustering as a multi-objective problem where various cluster validity indexes are simultaneously optimised...
2018: BioData Mining
Jens Dörpinghaus, Sebastian Schaaf, Marc Jacobs
Background: In text mining, document clustering describes the efforts to assign unstructured documents to clusters, which in turn usually refer to topics. Clustering is widely used in science for data retrieval and organisation. Results: In this paper we present and discuss a novel graph-theoretical approach for document clustering and its application on a real-world data set. We will show that the well-known graph partition to stable sets or cliques can be generalized to pseudostable sets or pseudocliques...
2018: BioData Mining
Amani Al-Ajlan, Achraf El Allali
Background: Computational approaches, specifically machine-learning techniques, play an important role in many metagenomic analysis algorithms, such as gene prediction. Due to the large feature space, current de novo gene prediction algorithms use different combinations of classification algorithms to distinguish between coding and non-coding sequences. Results: In this study, we apply a filter method to select relevant features from a large set of known features instead of combining them using linear classifiers or ignoring their individual coding potential...
2018: BioData Mining
Kathleen M Chen, Jie Tan, Gregory P Way, Georgia Doing, Deborah A Hogan, Casey S Greene
Background: Investigators often interpret genome-wide data by analyzing the expression levels of genes within pathways. While this within-pathway analysis is routine, the products of any one pathway can affect the activity of other pathways. Past efforts to identify relationships between biological processes have evaluated overlap in knowledge bases or evaluated changes that occur after specific treatments. Individual experiments can highlight condition-specific pathway-pathway relationships; however, constructing a complete network of such relationships across many conditions requires analyzing results from many studies...
2018: BioData Mining
Catherine Li, Juyon Lee, Jessica Ding, Shuying Sun
Background: The deadly costs of cancer and necessity for an accurate method of early cancer detection have demanded the identification of genetic and epigenetic factors associated with cancer. DNA methylation, an epigenetic event, plays an important role in cancer susceptibility. In this paper, we use DNA methylation and gene expression data integration and pathway analysis to further explore and understand the complex relationship between methylation and gene expression. Results: Through linear modeling and analysis of variance, we obtain genes that show a significant correlation between methylation and gene expression...
2018: BioData Mining
Laura Tipton, Karen T Cuenco, Laurence Huang, Ruth M Greenblatt, Eric Kleerup, Frank Sciurba, Steven R Duncan, Michael P Donahoe, Alison Morris, Elodie Ghedin
Background: Human microbiome studies in clinical settings generally focus on distinguishing the microbiota in health from that in disease at a specific point in time. However, microbiome samples may be associated with disease severity or continuous clinical health indicators that are often assessed at multiple time points. While the temporal data from clinical and microbiome samples may be informative, analysis of this type of data can be problematic for standard statistical methods. Results: To identify associations between microbiota and continuous clinical variables measured repeatedly in two studies of the respiratory tract, we adapted a statistical method, the lasso-penalized generalized linear mixed model (LassoGLMM)...
2018: BioData Mining
Kimberly T To, Rebecca C Fry, David M Reif
Background: The Toxicological Priority Index (ToxPi) is a method for prioritization and profiling of chemicals that integrates data from diverse sources. However, individual data sources ("assays"), such as in vitro bioassays or in vivo study endpoints, often feature sections of missing data, wherein subsets of chemicals have not been tested in all assays. In order to investigate the effects of missing data and recommend solutions, we designed simulation studies around high-throughput screening data generated by the ToxCast and Tox21 programs on chemicals highlighted by the Agency for Toxic Substances and Disease Registry's (ATSDR) Substance Priority List (SPL), which helps prioritize environmental research and remediation resources...
2018: BioData Mining
Ravi Mathur, Daniel Rotroff, Jun Ma, Ali Shojaie, Alison Motsinger-Reif
Background: Gene set analysis is a valuable tool to summarize high-dimensional gene expression data in terms of biologically relevant sets. This is an active area of research and numerous gene set analysis methods have been developed. Despite this popularity, systematic comparative studies have been limited in scope. Methods: In this study we present a semi-synthetic simulation study using real datasets in order to test and compare commonly used methods. Results: A software pipeline, Flexible Algorithm for Novel Gene set Simulation (FANGS), develops simulated data based on a prostate cancer dataset where the KRAS and TGF-β pathways were differentially expressed...
2018: BioData Mining
Enrico Ferrero, Pankaj Agarwal
Developing new drugs continues to be a highly inefficient and costly business. By repurposing an existing compound for a different indication, drug repositioning offers an attractive alternative to traditional drug discovery. Most of these approaches work by matching transcriptional disease signatures to anti-correlated gene expression profiles of drug perturbations. Genome-wide association studies (GWASs) are of great interest to researchers in the pharmaceutical industry because drug programmes with supporting genetic evidence are more likely to successfully progress through the drug discovery pipeline...
2018: BioData Mining
Elizabeth R Piette, Jason H Moore
Background: Machine learning methods and conventions are increasingly employed for the analysis of large, complex biomedical data sets, including genome-wide association studies (GWAS). Reproducibility of machine learning analyses of GWAS can be hampered by biological and statistical factors, particularly so for the investigation of non-additive genetic interactions. Application of traditional cross validation to a GWAS data set may result in poor consistency between the training and testing data set splits due to an imbalance of the interaction genotypes relative to the data as a whole...
2018: BioData Mining
Shefali S Verma, Anastasia Lucas, Xinyuan Zhang, Yogasudha Veturi, Scott Dudek, Binglan Li, Ruowang Li, Ryan Urbanowicz, Jason H Moore, Dokyoon Kim, Marylyn D Ritchie
Background: Machine learning methods have gained popularity and practicality in identifying linear and non-linear effects of variants associated with complex disease/traits. Detection of epistatic interactions still remains a challenge due to the large number of features and relatively small sample size as input, thus leading to the so-called "short fat data" problem. The efficiency of machine learning methods can be increased by limiting the number of input features. Thus, it is very important to perform variable selection before searching for epistasis...
2018: BioData Mining