Read by QxMD

Epistasis machine learning

Elizabeth R Piette, Jason H Moore
Background: Machine learning methods and conventions are increasingly employed for the analysis of large, complex biomedical data sets, including genome-wide association studies (GWAS). Reproducibility of machine learning analyses of GWAS can be hampered by biological and statistical factors, particularly so for the investigation of non-additive genetic interactions. Application of traditional cross validation to a GWAS data set may result in poor consistency between the training and testing data set splits due to an imbalance of the interaction genotypes relative to the data as a whole...
2018: BioData Mining
Shefali S Verma, Anastasia Lucas, Xinyuan Zhang, Yogasudha Veturi, Scott Dudek, Binglan Li, Ruowang Li, Ryan Urbanowicz, Jason H Moore, Dokyoon Kim, Marylyn D Ritchie
Background: Machine learning methods have gained popularity and practicality in identifying linear and non-linear effects of variants associated with complex disease/traits. Detection of epistatic interactions still remains a challenge due to the large number of features and relatively small sample size as input, thus leading to the so-called "short fat data" problem. The efficiency of machine learning methods can be increased by limiting the number of input features. Thus, it is very important to perform variable selection before searching for epistasis...
2018: BioData Mining
C Robert Cloninger, Igor Zwir
There is fundamental doubt about whether the natural unit of measurement for temperament and personality corresponds to single traits or to multi-trait profiles that describe the functioning of a whole person. Biogenetic researchers of temperament usually assume they need to focus on individual traits that differ between individuals. Recent research indicates that a shift of emphasis to understand processes within the individual is crucial for identifying the natural building blocks of temperament. Evolution and development operate on adaptation of whole organisms or persons, not on individual traits or categories...
April 19, 2018: Philosophical Transactions of the Royal Society of London. Series B, Biological Sciences
Brian S Cole, Molly A Hall, Ryan J Urbanowicz, Diane Gilbert-Diamond, Jason H Moore
The goal of this unit is to introduce epistasis, or gene-gene interactions, as a significant contributor to the genetic architecture of complex traits, including disease susceptibility. This unit begins with an historical overview of the concept of epistasis and the challenges inherent in the identification of potential gene-gene interactions. Then, it reviews statistical and machine learning methods for discovering epistasis in the context of genetic studies of quantitative and categorical traits. This unit concludes with a discussion of meta-analysis, replication, and other topics of active research...
October 18, 2017: Current Protocols in Human Genetics
Réka Howard, Alicia L Carriquiry, William D Beavis
An epistatic genetic architecture can have a significant impact on prediction accuracies of genomic prediction (GP) methods. Machine learning methods predict traits comprised of epistatic genetic architectures more accurately than statistical methods based on additive mixed linear models. The differences between these types of GP methods suggest a diagnostic for revealing genetic architectures underlying traits of interest. In addition to genetic architecture, the performance of GP methods may be influenced by the sample size of the training population, the number of QTL, and the proportion of phenotypic variability due to genotypic variability (heritability)...
September 7, 2017: G3: Genes—Genomes—Genetics
Jing Li, James D Malley, Angeline S Andrew, Margaret R Karagas, Jason H Moore
BACKGROUND: Identifying gene-gene interactions is essential to understand disease susceptibility and to detect genetic architectures underlying complex diseases. Here, we aimed at developing a permutation-based methodology relying on a machine learning method, random forest (RF), to detect gene-gene interactions. Our approach, called permuted random forest (pRF), identified the top interacting single nucleotide polymorphism (SNP) pairs by estimating how much the power of a random forest classification model is influenced by removing pairwise interactions...
2016: BioData Mining
Clément Niel, Christine Sinoquet, Christian Dina, Ghislain Rocheleau
During the past decade, findings of genome-wide association studies (GWAS) improved our knowledge and understanding of disease genetics. To date, thousands of SNPs have been associated with diseases and other complex traits. Statistical analysis typically looks for association between a phenotype and a SNP taken individually via single-locus tests. However, geneticists admit this is an oversimplified approach to tackle the complexity of underlying biological mechanisms. Interaction between SNPs, namely epistasis, must be considered...
2015: Frontiers in Genetics
H A Smith, B J White, P Kundert, C Cheng, J Romero-Severson, P Andolfatto, N J Besansky
Although freshwater (FW) is the ancestral habitat for larval mosquitoes, multiple species independently evolved the ability to survive in saltwater (SW). Here, we use quantitative trait locus (QTL) mapping to investigate the genetic architecture of osmoregulation in Anopheles mosquitoes, vectors of human malaria. We analyzed 1134 backcross progeny from a cross between the obligate FW species An. coluzzii, and its closely related euryhaline sibling species An. merus. Tests of 2387 markers with Bayesian interval mapping and machine learning (random forests) yielded six genomic regions associated with SW tolerance...
November 2015: Heredity
Caleb A Lareau, Bill C White, Ann L Oberg, Brett A McKinney
BACKGROUND: Biological insights into group differences, such as disease status, have been achieved through differential co-expression analysis of microarray data. Additional understanding of group differences may be achieved by integrating the connectivity structure of the differential co-expression network and per-gene differential expression between phenotypic groups. Such a global differential co-expression network strategy may increase sensitivity to detect gene-gene interactions (or expression epistasis) that may act as candidates for rewiring susceptibility co-expression networks...
2015: BioData Mining
Qing Li, Yoonhee Kim, Bhoom Suktitipat, Jacqueline B Hetmanski, Mary L Marazita, Priya Duggal, Terri H Beaty, Joan E Bailey-Wilson
Genome-wide association studies (GWAS) for nonsyndromic cleft lip with or without cleft palate (CL/P) have identified multiple genes as important in the etiology of this common birth defect. We performed a candidate gene/pathway analysis explicitly considering gene-gene (G × G) interaction to further explore the etiology of CL/P. Animal models have shown the WNT signaling pathway plays an important role in mid-facial development, and various genes in this pathway have been associated with nonsyndromic CL/P in previous studies...
July 2015: Genetic Epidemiology
Emily Rose Holzinger, Silke Szymczak, Abhijit Dasgupta, James Malley, Qing Li, Joan E Bailey-Wilson
Standard analysis methods for genome wide association studies (GWAS) are not robust to complex disease models, such as interactions between variables with small main effects. These types of effects likely contribute to the heritability of complex human traits. Machine learning methods that are capable of identifying interactions, such as Random Forests (RF), are an alternative analysis approach. One caveat to RF is that there is no standardized method of selecting variables so that false positives are reduced while retaining adequate power...
2015: Pacific Symposium on Biocomputing
Fabrízzio Condé de Oliveira, Carlos Cristiano Hasenclever Borges, Fernanda Nascimento Almeida, Fabyano Fonseca e Silva, Rui da Silva Verneque, Marcos Vinicius G B da Silva, Wagner Arbex
INTRODUCTION: This paper proposes a new methodology to simultaneously select the most relevant SNP markers for the characterization of any measurable phenotype described by a continuous variable, using Support Vector Regression with the Pearson Universal kernel as the fitness function of a binary genetic algorithm. The proposed methodology is multi-attribute, considering several markers simultaneously to explain the phenotype, and is based jointly on statistical tools, machine learning and computational intelligence...
2014: BMC Genomics
Jason H Moore
Here we introduce the ReliefF machine learning algorithm and some of its extensions for detecting and characterizing epistasis in genetic association studies. We provide a general overview of the method and then highlight some of the modifications that have greatly improved its power for genetic analysis. We end with a few examples of published studies of complex human diseases that have used ReliefF.
2015: Methods in Molecular Biology
Derrek P Hibar, Jason L Stein, Neda Jahanshad, Omid Kohannim, Xue Hua, Arthur W Toga, Katie L McMahon, Greig I de Zubicaray, Nicholas G Martin, Margaret J Wright, Michael W Weiner, Paul M Thompson
The discovery of several genes that affect the risk for Alzheimer's disease ignited a worldwide search for single-nucleotide polymorphisms (SNPs), common genetic variants that affect the brain. Genome-wide search of all possible SNP-SNP interactions is challenging and rarely attempted because of the complexity of conducting approximately 10^11 pairwise statistical tests. However, recent advances in machine learning, for example, iterative sure independence screening, make it possible to analyze data sets with vastly more predictors than observations...
January 2015: Neurobiology of Aging
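The order of magnitude quoted in the entry above follows from counting unordered SNP pairs, n(n-1)/2. The sketch below is only an illustration: the panel size of 500,000 SNPs is a hypothetical value chosen for the example (the abstract does not state how many SNPs were tested), and `pairwise_tests` is a helper name introduced here.

```python
from math import comb


def pairwise_tests(n_snps: int) -> int:
    """Number of unordered SNP-SNP pairs, i.e. n choose 2."""
    return comb(n_snps, 2)


# Hypothetical panel size, assumed for illustration only.
n_snps = 500_000
n_pairs = pairwise_tests(n_snps)
print(f"{n_snps:,} SNPs -> {n_pairs:,} pairwise tests (~{n_pairs:.2e})")
```

With a panel of this assumed size the count lands on the order of 10^11, which is why exhaustive pairwise testing is usually preceded by some form of screening.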
Xia Jiang, Binghuang Cai, Diyang Xue, Xinghua Lu, Gregory F Cooper, Richard E Neapolitan
OBJECTIVE: The objective of this investigation is to evaluate binary prediction methods for predicting disease status using high-dimensional genomic data. The central hypothesis is that the Bayesian network (BN)-based method called efficient Bayesian multivariate classifier (EBMC) will do well at this task because EBMC builds on BN-based methods that have performed well at learning epistatic interactions. METHOD: We evaluate how well eight methods perform binary prediction using high-dimensional discrete genomic datasets containing epistatic interactions...
October 2014: Journal of the American Medical Informatics Association: JAMIA
Derrek P Hibar, Jason L Stein, Neda Jahanshad, Omid Kohannim, Arthur W Toga, Katie L McMahon, Greig I de Zubicaray, Grant W Montgomery, Nicholas G Martin, Margaret J Wright, Michael W Weiner, Paul M Thompson
The SNP-SNP interactome has rarely been explored in the context of neuroimaging genetics mainly due to the complexity of conducting approximately 10^11 pairwise statistical tests. However, recent advances in machine learning, specifically the iterative sure independence screening (SIS) method, have enabled the analysis of datasets where the number of predictors is much larger than the number of observations. Using an implementation of the SIS algorithm (called EPISIS), we used exhaustive search of the genome-wide, SNP-SNP interactome to identify and prioritize SNPs for interaction analysis...
2013: Medical Image Computing and Computer-assisted Intervention: MICCAI ..
Brett A McKinney, Bill C White, Diane E Grill, Peter W Li, Richard B Kennedy, Gregory A Poland, Ann L Oberg
Relief-F is a nonparametric, nearest-neighbor machine learning method that has been successfully used to identify relevant variables that may interact in complex multivariate models to explain phenotypic variation. While several tools have been developed for assessing differential expression in sequence-based transcriptomics, the detection of statistical interactions between transcripts has received less attention in the area of RNA-seq analysis. We describe a new extension and assessment of Relief-F for feature selection in RNA-seq data...
2013: PloS One
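The nearest-neighbor idea behind Relief-style methods can be sketched as follows. This is the basic Relief update with a single nearest hit and a single nearest miss per instance, not Relief-F itself and not the extension described in the entry above. The function name `relief_weights`, the Manhattan distance, and the toy data (two binary features whose XOR determines the class, plus one noise feature) are assumptions made for illustration.

```python
def relief_weights(X, y):
    """Basic Relief: one nearest hit and one nearest miss per instance.

    Assumes every feature is already scaled to the range [0, 1].
    """
    n_samples = len(X)
    n_features = len(X[0])
    weights = [0.0] * n_features

    def distance(a, b):
        # Manhattan distance between two feature vectors.
        return sum(abs(p - q) for p, q in zip(a, b))

    for i in range(n_samples):
        hits = [j for j in range(n_samples) if j != i and y[j] == y[i]]
        misses = [j for j in range(n_samples) if y[j] != y[i]]
        nearest_hit = min(hits, key=lambda j: distance(X[i], X[j]))
        nearest_miss = min(misses, key=lambda j: distance(X[i], X[j]))
        for f in range(n_features):
            # Penalize features that differ from the nearest same-class neighbor,
            # reward features that differ from the nearest other-class neighbor.
            weights[f] -= abs(X[i][f] - X[nearest_hit][f]) / n_samples
            weights[f] += abs(X[i][f] - X[nearest_miss][f]) / n_samples
    return weights


# Toy data (hypothetical): class is the XOR of features 0 and 1; feature 2 is noise.
X = [[a, b, n] for a in (0, 1) for b in (0, 1) for n in (0, 1)]
y = [a ^ b for a, b, n in X]
weights = relief_weights(X, y)
print(weights)
```

On this toy data neither interacting feature has a main effect on its own, yet both end up with a higher weight than the noise feature, which is the property that makes this family of methods attractive for interaction screening.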
Ching Lee Koo, Mei Jing Liew, Mohd Saberi Mohamad, Abdul Hakim Mohamed Salleh
Recently, the greatest statistical and computational challenge in genetic epidemiology has been to identify and characterize the genes that interact with other genes and with environmental factors to affect complex multifactorial diseases. These gene-gene interactions are also known as epistasis, a phenomenon that cannot be resolved by traditional statistical methods because of the high dimensionality of the data and the occurrence of multiple polymorphisms. Hence, several machine learning methods, namely neural networks (NNs), support vector machines (SVMs), and random forests (RFs), have been applied to identify susceptibility genes in such common, multifactorial diseases...
2013: BioMed Research International
C Yao, D M Spurlock, L E Armentano, C D Page, M J VandeHaar, D M Bickhart, K A Weigel
Feed efficiency is an economically important trait in the beef and dairy cattle industries. Residual feed intake (RFI) is a measure of partial efficiency that is independent of production level per unit of body weight. The objective of this study was to identify significant associations between single nucleotide polymorphism (SNP) markers and RFI in dairy cattle using the Random Forests (RF) algorithm. Genomic data included 42,275 SNP genotypes for 395 Holstein cows, whereas phenotypic measurements were daily RFI from 50 to 150 d postpartum...
October 2013: Journal of Dairy Science
Li Xie, Clara Ng, Thahmina Ali, Raoul Valencia, Barbara L Ferreira, Vincent Xue, Maliha Tanweer, Dan Zhou, Gabriel G Haddad, Philip E Bourne, Lei Xie
BACKGROUND: It is a great challenge of modern biology to determine the functional roles of non-synonymous Single Nucleotide Polymorphisms (nsSNPs) on complex phenotypes. Statistical and machine learning techniques establish correlations between genotype and phenotype, but may fail to infer the biologically relevant mechanisms. The emerging paradigm of Network-based Association Studies aims to address this problem of statistical analysis. However, a mechanistic understanding of how individual molecular components work together in a system requires knowledge of molecular structures, and their interactions...
2013: BMC Genomics