Epistasis machine learning

Réka Howard, Alicia L Carriquiry, William D Beavis
An epistatic genetic architecture can have a significant impact on prediction accuracies of genomic prediction (GP) methods. Machine learning methods predict traits comprised of epistatic genetic architectures more accurately than statistical methods based on additive mixed linear models. The differences between these types of GP methods suggest a diagnostic for revealing genetic architectures underlying traits of interest. In addition to genetic architecture, the performance of GP methods may be influenced by the sample size of the training population, the number of QTL, and the proportion of phenotypic variability due to genotypic variability (heritability)...
September 7, 2017: G3: Genes—Genomes—Genetics
Jing Li, James D Malley, Angeline S Andrew, Margaret R Karagas, Jason H Moore
BACKGROUND: Identifying gene-gene interactions is essential to understand disease susceptibility and to detect genetic architectures underlying complex diseases. Here, we aimed at developing a permutation-based methodology relying on a machine learning method, random forest (RF), to detect gene-gene interactions. Our approach, called permuted random forest (pRF), identified the top interacting single nucleotide polymorphism (SNP) pairs by estimating how much the power of a random forest classification model is influenced by removing pairwise interactions...
2016: BioData Mining
Clément Niel, Christine Sinoquet, Christian Dina, Ghislain Rocheleau
During the past decade, findings of genome-wide association studies (GWAS) improved our knowledge and understanding of disease genetics. To date, thousands of SNPs have been associated with diseases and other complex traits. Statistical analysis typically looks for association between a phenotype and a SNP taken individually via single-locus tests. However, geneticists admit this is an oversimplified approach to tackle the complexity of underlying biological mechanisms. Interaction between SNPs, namely epistasis, must be considered...
2015: Frontiers in Genetics
H A Smith, B J White, P Kundert, C Cheng, J Romero-Severson, P Andolfatto, N J Besansky
Although freshwater (FW) is the ancestral habitat for larval mosquitoes, multiple species independently evolved the ability to survive in saltwater (SW). Here, we use quantitative trait locus (QTL) mapping to investigate the genetic architecture of osmoregulation in Anopheles mosquitoes, vectors of human malaria. We analyzed 1134 backcross progeny from a cross between the obligate FW species An. coluzzii, and its closely related euryhaline sibling species An. merus. Tests of 2387 markers with Bayesian interval mapping and machine learning (random forests) yielded six genomic regions associated with SW tolerance...
November 2015: Heredity
Caleb A Lareau, Bill C White, Ann L Oberg, Brett A McKinney
BACKGROUND: Biological insights into group differences, such as disease status, have been achieved through differential co-expression analysis of microarray data. Additional understanding of group differences may be achieved by integrating the connectivity structure of the differential co-expression network and per-gene differential expression between phenotypic groups. Such a global differential co-expression network strategy may increase sensitivity to detect gene-gene interactions (or expression epistasis) that may act as candidates for rewiring susceptibility co-expression networks...
2015: BioData Mining
Qing Li, Yoonhee Kim, Bhoom Suktitipat, Jacqueline B Hetmanski, Mary L Marazita, Priya Duggal, Terri H Beaty, Joan E Bailey-Wilson
Genome-wide association studies (GWAS) for nonsyndromic cleft lip with or without cleft palate (CL/P) have identified multiple genes as important in the etiology of this common birth defect. We performed a candidate gene/pathway analysis explicitly considering gene-gene (G × G) interaction to further explore the etiology of CL/P. Animal models have shown the WNT signaling pathway plays an important role in mid-facial development, and various genes in this pathway have been associated with nonsyndromic CL/P in previous studies...
July 2015: Genetic Epidemiology
Emily Rose Holzinger, Silke Szymczak, Abhijit Dasgupta, James Malley, Qing Li, Joan E Bailey-Wilson
Standard analysis methods for genome wide association studies (GWAS) are not robust to complex disease models, such as interactions between variables with small main effects. These types of effects likely contribute to the heritability of complex human traits. Machine learning methods that are capable of identifying interactions, such as Random Forests (RF), are an alternative analysis approach. One caveat to RF is that there is no standardized method of selecting variables so that false positives are reduced while retaining adequate power...
2015: Pacific Symposium on Biocomputing
Fabrízzio Condé de Oliveira, Carlos Cristiano Hasenclever Borges, Fernanda Nascimento Almeida, Fabyano Fonseca e Silva, Rui da Silva Verneque, Marcos Vinicius G B da Silva, Wagner Arbex
INTRODUCTION: This paper proposes a new methodology to simultaneously select the most relevant SNP markers for the characterization of any measurable phenotype described by a continuous variable, using Support Vector Regression with the Pearson Universal kernel as the fitness function of a binary genetic algorithm. The proposed methodology is multi-attribute, considering several markers simultaneously to explain the phenotype, and is based jointly on statistical tools, machine learning and computational intelligence...
2014: BMC Genomics
Jason H Moore
Here we introduce the ReliefF machine learning algorithm and some of its extensions for detecting and characterizing epistasis in genetic association studies. We provide a general overview of the method and then highlight some of the modifications that have greatly improved its power for genetic analysis. We end with a few examples of published studies of complex human diseases that have used ReliefF.
2015: Methods in Molecular Biology
Derrek P Hibar, Jason L Stein, Neda Jahanshad, Omid Kohannim, Xue Hua, Arthur W Toga, Katie L McMahon, Greig I de Zubicaray, Nicholas G Martin, Margaret J Wright, Michael W Weiner, Paul M Thompson
The discovery of several genes that affect the risk for Alzheimer's disease ignited a worldwide search for single-nucleotide polymorphisms (SNPs), common genetic variants that affect the brain. Genome-wide search of all possible SNP-SNP interactions is challenging and rarely attempted because of the complexity of conducting approximately 10^11 pairwise statistical tests. However, recent advances in machine learning, for example, iterative sure independence screening, make it possible to analyze data sets with vastly more predictors than observations...
January 2015: Neurobiology of Aging
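The scale of an exhaustive pairwise search follows from the number of unordered SNP pairs, n(n-1)/2. The sketch below is illustrative only: the panel size of 500,000 SNPs is an assumption chosen to show the order of magnitude, not a figure taken from the abstract, and `pairwise_tests` is a hypothetical helper name.

```python
import math


def pairwise_tests(n_snps: int) -> int:
    """Number of unordered SNP-SNP pairs, i.e. n choose 2."""
    return math.comb(n_snps, 2)


# Assumed panel size, for illustration only (not stated in the abstract).
n_snps = 500_000
n_pairs = pairwise_tests(n_snps)

# Scientific notation makes the order of magnitude easy to read.
print(f"{n_snps:,} SNPs -> {n_pairs:.3e} pairwise tests")
```

Under this assumption the count is on the order of 10^11, which is why screening methods that shrink the candidate set before testing interactions are attractive.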
Xia Jiang, Binghuang Cai, Diyang Xue, Xinghua Lu, Gregory F Cooper, Richard E Neapolitan
OBJECTIVE: The objective of this investigation is to evaluate binary prediction methods for predicting disease status using high-dimensional genomic data. The central hypothesis is that the Bayesian network (BN)-based method called efficient Bayesian multivariate classifier (EBMC) will do well at this task because EBMC builds on BN-based methods that have performed well at learning epistatic interactions. METHOD: We evaluate how well eight methods perform binary prediction using high-dimensional discrete genomic datasets containing epistatic interactions...
October 2014: Journal of the American Medical Informatics Association: JAMIA
Derrek P Hibar, Jason L Stein, Neda Jahanshad, Omid Kohannim, Arthur W Toga, Katie L McMahon, Greig I de Zubicaray, Grant W Montgomery, Nicholas G Martin, Margaret J Wright, Michael W Weiner, Paul M Thompson
The SNP-SNP interactome has rarely been explored in the context of neuroimaging genetics mainly due to the complexity of conducting approximately 10^11 pairwise statistical tests. However, recent advances in machine learning, specifically the iterative sure independence screening (SIS) method, have enabled the analysis of datasets where the number of predictors is much larger than the number of observations. Using an implementation of the SIS algorithm (called EPISIS), we used exhaustive search of the genome-wide, SNP-SNP interactome to identify and prioritize SNPs for interaction analysis...
2013: Medical Image Computing and Computer-assisted Intervention: MICCAI ..
Brett A McKinney, Bill C White, Diane E Grill, Peter W Li, Richard B Kennedy, Gregory A Poland, Ann L Oberg
Relief-F is a nonparametric, nearest-neighbor machine learning method that has been successfully used to identify relevant variables that may interact in complex multivariate models to explain phenotypic variation. While several tools have been developed for assessing differential expression in sequence-based transcriptomics, the detection of statistical interactions between transcripts has received less attention in the area of RNA-seq analysis. We describe a new extension and assessment of Relief-F for feature selection in RNA-seq data...
2013: PloS One
Ching Lee Koo, Mei Jing Liew, Mohd Saberi Mohamad, Abdul Hakim Mohamed Salleh
Recently, the greatest statistical and computational challenge in genetic epidemiology has been to identify and characterize the genes that interact with other genes and environmental factors to affect complex multifactorial diseases. These gene-gene interactions are also denoted as epistasis, a phenomenon that cannot be solved by traditional statistical methods due to the high dimensionality of the data and the occurrence of multiple polymorphisms. Hence, several machine learning methods are used to solve such problems by identifying susceptibility genes in common, multifactorial diseases: neural networks (NNs), support vector machines (SVMs), and random forests (RFs)...
2013: BioMed Research International
C Yao, D M Spurlock, L E Armentano, C D Page, M J VandeHaar, D M Bickhart, K A Weigel
Feed efficiency is an economically important trait in the beef and dairy cattle industries. Residual feed intake (RFI) is a measure of partial efficiency that is independent of production level per unit of body weight. The objective of this study was to identify significant associations between single nucleotide polymorphism (SNP) markers and RFI in dairy cattle using the Random Forests (RF) algorithm. Genomic data included 42,275 SNP genotypes for 395 Holstein cows, whereas phenotypic measurements were daily RFI from 50 to 150 d postpartum...
October 2013: Journal of Dairy Science
Li Xie, Clara Ng, Thahmina Ali, Raoul Valencia, Barbara L Ferreira, Vincent Xue, Maliha Tanweer, Dan Zhou, Gabriel G Haddad, Philip E Bourne, Lei Xie
BACKGROUND: It is a great challenge of modern biology to determine the functional roles of non-synonymous Single Nucleotide Polymorphisms (nsSNPs) on complex phenotypes. Statistical and machine learning techniques establish correlations between genotype and phenotype, but may fail to infer the biologically relevant mechanisms. The emerging paradigm of Network-based Association Studies aims to address this problem of statistical analysis. However, a mechanistic understanding of how individual molecular components work together in a system requires knowledge of molecular structures, and their interactions...
2013: BMC Genomics
Ronald M Nelson, Marcin Kierczak, Orjan Carlborg
Higher order interactions are known to affect many different phenotypic traits. The advent of large-scale genotyping has, however, shown that finding interactions is not a trivial task. Classical genome-wide association studies (GWAS) are a useful starting point for unraveling the genetic architecture of a phenotypic trait. However, to move beyond the additive model we need new analysis tools specifically developed to deal with high-dimensional genotypic data. Here we show that evolutionary algorithms are a useful tool in high-dimensional analyses designed to identify gene-gene interactions in current large-scale genotypic data...
2013: Methods in Molecular Biology
Xiang Zhang, Shunping Huang, Zhaojun Zhang, Wei Wang
Genome-wide association studies (GWAS) aim to discover genetic factors underlying phenotypic traits. The large number of genetic factors poses both computational and statistical challenges. Various computational approaches have been developed for large scale GWAS. In this chapter, we will discuss several widely used computational approaches in GWAS. The following topics will be covered: (1) An introduction to the background of GWAS. (2) The existing computational approaches that are widely used in GWAS. This will cover single-locus, epistasis detection, and machine learning methods that have been recently developed in the biology, statistics, and computer science communities...
2012: PLoS Computational Biology
Bing Han, Xue-wen Chen, Zohreh Talebizadeh, Hua Xu
BACKGROUND: Detecting epistatic interactions plays a significant role in improving pathogenesis, prevention, diagnosis, and treatment of complex human diseases. Applying machine learning or statistical methods to epistatic interaction detection will encounter some common problems, e.g., very limited number of samples, an extremely high search space, a large number of false positives, and ways to measure the association between disease markers and the phenotype. RESULTS: To address the problems of computational methods in epistatic interaction detection, we propose a score-based Bayesian network structure learning method, EpiBN, to detect epistatic interactions...
2012: BMC Systems Biology
A Pandey, N A Davis, B C White, N M Pajewski, J Savitz, W C Drevets, B A McKinney
Most pathway and gene-set enrichment methods prioritize genes by their main effect and do not account for variation due to interactions in the pathway. A portion of the presumed missing heritability in genome-wide association studies (GWAS) may be accounted for through gene-gene interactions and additive genetic variability. In this study, we prioritize genes for pathway enrichment in GWAS of bipolar disorder (BD) by aggregating gene-gene interaction information with main effect associations through a machine learning (evaporative cooling) feature selection and epistasis network centrality analysis...
2012: Translational Psychiatry