Read by QxMD

BioData Mining

Wenyi Qin, Hui Lu
Motivation: Detecting differentially expressed (DE) genes between disease and normal control groups is one of the most common analyses in genome-wide transcriptomic data. Since most studies don't have a lot of samples, researchers have used meta-analysis to group different datasets for the same disease. Even then, in many cases the statistical power is still not enough. Taking into account the fact that many diseases share the same disease genes, it is desirable to design a statistical framework that can identify diseases' common and specific DE genes simultaneously to improve the identification power...
2018: BioData Mining
Moshe Sipper, Weixuan Fu, Karuna Ahuja, Jason H Moore
Evolutionary computation (EC) has been widely applied to biological and biomedical data. The practice of EC involves the tuning of many parameters, such as population size, generation count, selection size, and crossover and mutation rates. Through an extensive series of experiments over multiple evolutionary algorithm implementations and 25 problems we show that parameter space tends to be rife with viable parameters, at least for the problems studied herein. We discuss the implications of this finding in practice for the researcher employing EC...
2018: BioData Mining
Eunice Carrasquinha, André Veríssimo, Marta B Lopes, Susana Vinga
Background: Survival analysis is a statistical technique widely used in many fields of science, in particular in the medical area, and which studies the time until an event of interest occurs. Outlier detection in this context has gained great importance due to the fact that the identification of long or short-term survivors may lead to the detection of new prognostic factors. However, the results obtained using different outlier detection methods and residuals are seldom the same and are strongly dependent on the specific Cox proportional hazards model selected...
2018: BioData Mining
Andrej Čopar, Marinka Žitnik, Blaž Zupan
Background: Matrix factorization is a well established pattern discovery tool that has seen numerous applications in biomedical data analytics, such as gene expression co-clustering, patient stratification, and gene-disease association mining. Matrix factorization learns a latent data model that takes a data matrix and transforms it into a latent feature space enabling generalization, noise removal and feature discovery. However, factorization algorithms are numerically intensive, and hence there is a pressing challenge to scale current algorithms to work with large datasets...
2017: BioData Mining
Qiwei Xie, Xi Chen, Hao Deng, Danqian Liu, Yingyu Sun, Xiaojuan Zhou, Yang Yang, Hua Han
Background: In the nervous system, the neurons communicate through synapses. The size, morphology, and connectivity of these synapses are significant in determining the functional properties of the neural network. Therefore, they have always been a major focus of neuroscience research. Two-photon laser scanning microscopy allows the visualization of synaptic structures in vivo, leading to many important findings. However, the identification and quantification of structural imaging data currently rely heavily on manual annotation, a method that is both time-consuming and prone to bias...
2017: BioData Mining
Zhenqiu Liu, Fengzhu Sun, Dermot P McGovern
Background: Feature selection and prediction are the most important tasks for big data mining. The common strategies for feature selection in big data mining are L1, SCAD and MC+. However, none of the existing algorithms optimizes L0, which penalizes the number of nonzero features directly. Results: In this paper, we develop a novel sparse generalized linear model (GLM) with L0 approximation for feature selection and prediction with big omics data. The proposed approach approximates the L0 optimization directly...
2017: BioData Mining
J Grey Monroe, Zachariah A Allen, Paul Tanger, Jack L Mullen, John T Lovell, Brook T Moyers, Darrell Whitley, John K McKay
Background: Recent advances in nucleic acid sequencing technologies have led to a dramatic increase in the number of markers available to generate genetic linkage maps. This increased marker density can be used to improve genome assemblies as well as add much needed resolution for loci controlling variation in ecologically and agriculturally important traits. However, traditional genetic map construction methods from these large marker datasets can be computationally prohibitive and highly error prone...
2017: BioData Mining
Luluah Alhusain, Alaaeldin M Hafez
Background: Clustering plays a crucial role in several application domains, such as bioinformatics. In bioinformatics, clustering has been extensively used as an approach for detecting interesting patterns in genetic data. One application is population structure analysis, which aims to group individuals into subpopulations based on shared genetic variations, such as single nucleotide polymorphisms. Advances in DNA sequencing technology have facilitated the obtainment of genetic datasets with exceptional sizes...
2017: BioData Mining
Randal S Olson, William La Cava, Patryk Orzechowski, Ryan J Urbanowicz, Jason H Moore
Background: The selection, development, or comparison of machine learning methods in data mining can be a difficult task based on the target problem and goals of a particular study. Numerous publicly available real-world and simulated benchmark datasets have emerged from different sources, but their organization and adoption as standards have been inconsistent. As such, selecting and curating specific benchmarks remains an unnecessary burden on machine learning practitioners and data scientists...
2017: BioData Mining
Davide Chicco
Machine learning has become a pivotal tool for many projects in computational biology, bioinformatics, and health informatics. Nevertheless, beginners and biomedical researchers often do not have enough experience to run a data mining project effectively, and therefore can follow incorrect practices, that may lead to common mistakes or over-optimistic results. With this review, we present ten quick tips to take advantage of machine learning in any computational biology context, by avoiding some common errors that we observed hundreds of times in multiple bioinformatics projects...
2017: BioData Mining
Moshe Sipper, Jason H Moore
No abstract text is available yet for this article.
2017: BioData Mining
Indrani Ray, Anindya Bhattacharya, Rajat K De
Background: Obesity is a medical condition that is known for increased body mass index (BMI). It is also associated with chronic low level inflammation. Obesity disrupts the immune-metabolic homeostasis by changing the secretion of adipocytes. This affects the end-organs, and gives rise to several diseases including type 2 diabetes, asthma, non-alcoholic fatty liver diseases and cancers. These diseases are known as co-morbid diseases. Several studies have explored the underlying molecular mechanisms of developing obesity associated comorbid diseases...
2017: BioData Mining
Elpidio-Emmanuel Gonzalez-Valbuena, Víctor Treviño
Background: Detecting the differences in gene expression data is important for understanding the underlying molecular mechanisms. Although the differentially expressed genes are a large component, differences in correlation are becoming an interesting approach to achieving deeper insights. However, diverse metrics have been used to detect differential correlation, making selection and use of a single metric difficult. In addition, available implementations are metric-specific, complicating their use in different contexts...
2017: BioData Mining
Spiros Denaxas, Kenan Direk, Arturo Gonzalez-Izquierdo, Maria Pikoula, Aylin Cakiroglu, Jason Moore, Harry Hemingway, Liam Smeeth
BACKGROUND: The ability of external investigators to reproduce published scientific findings is critical for the evaluation and validation of biomedical research by the wider community. However, a substantial proportion of health research using electronic health records (EHR), data collected and generated during clinical care, is potentially not reproducible mainly due to the fact that the implementation details of most data preprocessing, cleaning, phenotyping and analysis approaches are not systematically made available or shared...
2017: BioData Mining
Bork A Berghoff, Torgny Karlsson, Thomas Källman, E Gerhart H Wagner, Manfred G Grabherr
BACKGROUND: Measuring how gene expression changes in the course of an experiment assesses how an organism responds on a molecular level. Sequencing of RNA molecules, and their subsequent quantification, aims to assess global gene expression changes on the RNA level (transcriptome). While advances in high-throughput RNA-sequencing (RNA-seq) technologies allow for inexpensive data generation, accurate post-processing and normalization across samples is required to eliminate any systematic noise introduced by the biochemical and/or technical processes...
2017: BioData Mining
Mina Moradi Kordmahalleh, Mohammad Gorji Sefidmazgi, Scott H Harrison, Abdollah Homaifar
BACKGROUND: The modeling of genetic interactions within a cell is crucial for a basic understanding of physiology and for applied areas such as drug design. Interactions in gene regulatory networks (GRNs) include effects of transcription factors, repressors, small metabolites, and microRNA species. In addition, the effects of regulatory interactions are not always simultaneous, but can occur after a finite time delay, or as a combined outcome of simultaneous and time delayed interactions...
2017: BioData Mining
W B Langdon, Brian Yee Hong Lam
BACKGROUND: BarraCUDA is an open source C program which uses the BWA algorithm in parallel with nVidia CUDA to align short next generation DNA sequences against a reference genome. Recently its source code was optimised using "Genetic Improvement". RESULTS: The genetically improved (GI) code is up to three times faster on short paired end reads from The 1000 Genomes Project and 60% more accurate on a short GCAT alignment benchmark. GPGPU BarraCUDA running on a single K80 Tesla GPU can align short paired end nextGen sequences up to ten times faster than bwa on a 12 core server...
2017: BioData Mining
Antonino Fiannaca, Massimo La Rosa, Laura La Paglia, Riccardo Rizzo, Alfonso Urso
MOTIVATION: Non-coding RNA (ncRNA) are small non-coding sequences involved in gene expression regulation of many biological processes and diseases. The recent discovery of a large set of different ncRNAs with biologically relevant roles has opened the way to develop methods able to discriminate between the different ncRNA classes. Moreover, the lack of knowledge about the complete mechanisms in regulative processes, together with the development of high-throughput technologies, has required the help of bioinformatics tools in addressing biologists and clinicians with a deeper comprehension of the functional roles of ncRNAs...
2017: BioData Mining
Emily R Holzinger, Shefali S Verma, Carrie B Moore, Molly Hall, Rishika De, Diane Gilbert-Diamond, Matthew B Lanktree, Nathan Pankratz, Antoinette Amuzu, Amber Burt, Caroline Dale, Scott Dudek, Clement E Furlong, Tom R Gaunt, Daniel Seung Kim, Helene Riess, Suthesh Sivapalaratnam, Vinicius Tragante, Erik P A van Iperen, Ariel Brautbar, David S Carrell, David R Crosslin, Gail P Jarvik, Helena Kuivaniemi, Iftikhar J Kullo, Eric B Larson, Laura J Rasmussen-Torvik, Gerard Tromp, Jens Baumert, Karen J Cruickshanks, Martin Farrall, Aroon D Hingorani, G K Hovingh, Marcus E Kleber, Barbara E Klein, Ronald Klein, Wolfgang Koenig, Leslie A Lange, Winfried Mӓrz, Kari E North, N Charlotte Onland-Moret, Alex P Reiner, Philippa J Talmud, Yvonne T van der Schouw, James G Wilson, Mika Kivimaki, Meena Kumari, Jason H Moore, Fotios Drenos, Folkert W Asselbergs, Brendan J Keating, Marylyn D Ritchie
BACKGROUND: The genetic etiology of human lipid quantitative traits is not fully elucidated, and interactions between variants may play a role. We performed a gene-centric interaction study for four different lipid traits: low-density lipoprotein cholesterol (LDL-C), high-density lipoprotein cholesterol (HDL-C), total cholesterol (TC), and triglycerides (TG). RESULTS: Our analysis consisted of a discovery phase using a merged dataset of five different cohorts (n = 12,853 to n = 16,849 depending on lipid phenotype) and a replication phase with ten independent cohorts totaling up to 36,938 additional samples...
2017: BioData Mining
Nelson Perdigão, Agostinho C Rosa, Seán I O'Donoghue
No abstract text is available yet for this article.
2017: BioData Mining
