machine learning and snp-snp

Robert J MacInnis, Daniel F Schmidt, Enes Makalic, Gianluca Severi, Liesel FitzGerald, Matthias Reumann, Miroslav K Kapuscinski, Adam Kowalczyk, Zeyu Zhou, Benjamin W Goudey, Guoqi Qian, Minh Bui, Daniel J Park, Adam Freeman, Melissa C Southey, Ali Amin Al Olama, Zsofia Kote-Jarai, Rosalind A Eeles, John L Hopper, Graham G Giles
BACKGROUND: We have developed a GWAS analysis method called DEPTH (DEPendency of association on the number of Top Hits) to identify genomic regions potentially associated with disease by considering overlapping groups of contiguous markers (e.g. single nucleotide polymorphisms, SNPs) across the genome. DEPTH is a machine learning algorithm for feature ranking of ultra-high dimensional datasets, built from well-established statistical tools such as bootstrapping, penalised regression and decision trees...
August 18, 2016: Cancer Epidemiology, Biomarkers & Prevention
Qianchuan He, Tianxi Cai, Yang Liu, Ni Zhao, Quaker E Harmon, Lynn M Almli, Elisabeth B Binder, Stephanie M Engel, Kerry J Ressler, Karen N Conneely, Xihong Lin, Michael C Wu
Kernel machine learning methods, such as the SNP-set kernel association test (SKAT), have been widely used to test associations between traits and genetic polymorphisms. In contrast to traditional single-SNP analysis methods, these methods are designed to examine the joint effect of a set of related SNPs (such as a group of SNPs within a gene or a pathway) and are able to identify sets of SNPs that are associated with the trait of interest. However, as with many multi-SNP testing approaches, kernel machine testing can draw conclusions only at the SNP-set level, and does not directly inform on which SNP(s) in the identified set are actually driving the associations...
August 3, 2016: Genetic Epidemiology
Robersy Sanchez, Sally A Mackenzie
Cytosine DNA methylation (CDM) is a highly abundant, heritable but reversible chemical modification to the genome. Herein, a machine learning approach was applied to analyze the accumulation of epigenetic marks in methylomes of 152 ecotypes and 85 silencing mutants of Arabidopsis thaliana. In an information-thermodynamics framework, two measurements were used: (1) the amount of information gained/lost with the CDM changes, I_R, and (2) the uncertainty of not observing a SNP, L_CR. We hypothesize that epigenetic marks are chromosomal footprints accounting for different ontogenetic and phylogenetic histories of individual populations...
2016: International Journal of Molecular Sciences
A Dix, S Vlaic, R Guthke, J Linde
In systems biology, researchers aim to understand complex biological systems as a whole, which is often achieved by mathematical modelling and the analyses of high-throughput data. In this review, we give an overview of medical applications of systems biology approaches with special focus on host-pathogen interactions. After introducing general ideas of systems biology, we focus on (1) the detection of putative biomarkers for improved diagnosis and support of therapeutic decisions, (2) network modelling for the identification of regulatory interactions between cellular molecules to reveal putative drug targets and (3) module discovery for the detection of phenotype-specific modules in molecular interaction networks...
July 2016: Clinical Microbiology and Infection
Isaiah Tolo, Jonathan C Thomas, Rebecca S B Fischer, Eric L Brown, Barry M Gray, D Ashley Robinson
Staphylococcus epidermidis is a ubiquitous colonizer of human skin and a common cause of medical device-associated infections. The extent to which the population genetic structure of S. epidermidis distinguishes commensal from pathogenic isolates is unclear. Previously, Bayesian clustering of 437 multilocus sequence types (STs) in the international database revealed a population structure of six genetic clusters (GCs) that may reflect the species' ecology. Here, we first verified the presence of six GCs, including two (GC3 and GC5) with significant admixture, in an updated database of 578 STs...
July 2016: Journal of Clinical Microbiology
M M Judge, J F Kearney, M C McClure, R D Sleator, D P Berry
The objective of this study was to develop, using alternative algorithms, low-density SNP genotyping panels (384 to 12,000 SNP), which can be accurately imputed to higher-density panels across independent cattle populations. Single nucleotide polymorphisms were selected based on genomic characteristics (i.e., linkage disequilibrium [LD], minor allele frequency [MAF], and genomic distance) in a population of 1,267 Holstein-Friesian animals genotyped on the Illumina Bovine50 Beadchip (54,001 SNP). Single nucleotide polymorphism selection methods included 1) random; 2) equidistant location; 3) combination of SNP MAF and LD structure while maintaining relatively equal genomic distance between adjacent SNP; 4) a combination of high MAF, genomic distance between selected and candidate SNP, and correlation between genotypes of selected and candidate SNP; and 5) a machine learning algorithm...
March 2016: Journal of Animal Science
Jing Li, James D Malley, Angeline S Andrew, Margaret R Karagas, Jason H Moore
BACKGROUND: Identifying gene-gene interactions is essential to understand disease susceptibility and to detect genetic architectures underlying complex diseases. Here, we aimed at developing a permutation-based methodology relying on a machine learning method, random forest (RF), to detect gene-gene interactions. Our approach, called permuted random forest (pRF), identified the top interacting single nucleotide polymorphism (SNP) pairs by estimating how much the power of a random forest classification model is influenced by removing pairwise interactions...
2016: BioData Mining
Andrew Dahl, Valentina Iotchkova, Amelie Baud, Åsa Johansson, Ulf Gyllensten, Nicole Soranzo, Richard Mott, Andreas Kranis, Jonathan Marchini
Genetic association studies have yielded a wealth of biological discoveries. However, these studies have mostly analyzed one trait and one SNP at a time, thus failing to capture the underlying complexity of the data sets. Joint genotype-phenotype analyses of complex, high-dimensional data sets represent an important way to move beyond simple genome-wide association studies (GWAS) with great potential. The move to high-dimensional phenotypes will raise many new statistical problems. Here we address the central issue of missing phenotypes in studies with any level of relatedness between samples...
April 2016: Nature Genetics
Joanna Peloquin, Gautam Goel, Hailiang Huang, Talin Haritunians, Ryan Sartor, Mark Daly, Rodney Newberry, Dermot McGovern, Sergio Lira, Ramnik Xavier
BACKGROUND: Genome-wide association studies have linked single nucleotide polymorphisms (SNPs) to risk of inflammatory bowel disease (IBD). Yet, the majority of IBD-associated risk SNPs tag non-coding regions of the genome, with more than 1000 genes encoded within the risk loci. In addition to ongoing fine mapping of risk loci and exome sequencing studies, the study of gene expression and characterization of expression quantitative trait loci (eQTL) help to refine candidate genes in risk loci...
March 2016: Inflammatory Bowel Diseases
Silke Szymczak, Emily Holzinger, Abhijit Dasgupta, James D Malley, Anne M Molloy, James L Mills, Lawrence C Brody, Dwight Stambolian, Joan E Bailey-Wilson
BACKGROUND: Machine learning methods and in particular random forests (RFs) are a promising alternative to standard single SNP analyses in genome-wide association studies (GWAS). RFs provide variable importance measures (VIMs) to rank SNPs according to their predictive power. However, in contrast to the established genome-wide significance threshold, no clear criteria exist to determine how many SNPs should be selected for downstream analyses. RESULTS: We propose a new variable selection approach, recurrent relative variable importance measure (r2VIM)...
2016: BioData Mining
Xiaoyong Pan, Kai Xiong
Recently, circular RNA (circularRNA) has been discovered as an increasingly important type of long non-coding RNA (lncRNA), playing an important role in gene regulation, such as functioning as miRNA sponges. It is therefore very promising to identify circularRNA transcripts from de novo assembled transcripts obtained by high-throughput sequencing, such as RNA-seq data. In this study, we presented a machine learning approach, named PredcircRNA, focused on distinguishing circularRNA from other lncRNAs using multiple kernel learning...
August 2015: Molecular BioSystems
Fei Lu, Maria C Romay, Jeffrey C Glaubitz, Peter J Bradbury, Robert J Elshire, Tianyu Wang, Yu Li, Yongxiang Li, Kassa Semagn, Xuecai Zhang, Alvaro G Hernandez, Mark A Mikel, Ilya Soifer, Omer Barad, Edward S Buckler
In addition to single-nucleotide polymorphisms, structural variation is abundant in many plant genomes. The structural variation across a species can be represented by a 'pan-genome', which is essential to fully understand the genetic control of phenotypes. However, the pan-genome's complexity hinders its accurate assembly via sequence alignment. Here we demonstrate an approach to facilitate pan-genome construction in maize. By performing 18 trillion association tests we map 26 million tags generated by reduced representation sequencing of 14,129 maize inbred lines...
2015: Nature Communications
Li Li, Yi Xiong, Zhuo-Yu Zhang, Quan Guo, Qin Xu, Hien-Haw Liow, Yong-Hong Zhang, Dong-Qing Wei
Single nucleotide polymorphisms (SNPs) make up the most common form of mutation in the human cytochrome P450 enzyme family, and have the potential to cause different drug responses or specific diseases in individual patients. Here, based on machine learning technology, we aim to explore an effective set of sequence-based features for improving prediction of SNPs by using support vector machine algorithms. The features are derived from the target residues and flanking protein sequences, such as amino acid types, sequence composition, physicochemical properties, position-specific scoring matrix, phylogenetic entropy and the number of possible codons of target residues...
March 2015: Interdisciplinary Sciences, Computational Life Sciences
Alexander Kautzky, Pia Baldinger, Daniel Souery, Stuart Montgomery, Julien Mendlewicz, Joseph Zohar, Alessandro Serretti, Rupert Lanzenberger, Siegfried Kasper
For over a decade, the European Group for the Study of Resistant Depression (GSRD) has examined single nucleotide polymorphisms (SNP) and clinical parameters in regard to treatment outcome. However, an interaction-based model combining these factors has not yet been established. Given the low effect of individual SNPs, a model investigating the interactive role of SNPs and clinical variables in treatment-resistant depression (TRD) seems promising. Thus, 225 patients featured in previous work of the GSRD were enrolled in this investigation...
April 2015: European Neuropsychopharmacology: the Journal of the European College of Neuropsychopharmacology
Thanh-Tung Nguyen, Joshua Huang, Qingyao Wu, Thuy Nguyen, Mark Li
BACKGROUND: Single-nucleotide polymorphism (SNP) selection and identification are the most important tasks in genome-wide association data analysis. The problem is difficult because genome-wide association data are very high dimensional and a large portion of the SNPs in the data are irrelevant to the disease. Advanced machine learning methods have been successfully used in genome-wide association studies (GWAS) for identification of genetic variants that have relatively big effects in some common, complex diseases...
2015: BMC Genomics
Rahul C Deo, Gabriel Musso, Murat Tasan, Paul Tang, Annie Poon, Christiana Yuan, Janine F Felix, Ramachandran S Vasan, Rameen Beroukhim, Teresa De Marco, Pui-Yan Kwok, Calum A MacRae, Frederick P Roth
BACKGROUND: Cardiovascular disease (CVD) is the leading cause of death in the developed world. Human genetic studies, including genome-wide sequencing and SNP-array approaches, promise to reveal disease genes and mechanisms representing new therapeutic targets. In practice, however, identification of the actual genes contributing to disease pathogenesis has lagged behind identification of associated loci, thus limiting the clinical benefits. RESULTS: To aid in localizing causal genes, we develop a machine learning approach, Objective Prioritization for Enhanced Novelty (OPEN), which quantitatively prioritizes gene-disease associations based on a diverse group of genomic features...
2014: Genome Biology
Giulietta Minozzi, Andrea Pedretti, Stefano Biffani, Ezequiel Luis Nicolazzi, Alessandra Stella
BACKGROUND: Genome-wide association studies are now widely used in the livestock sector to estimate the association between single nucleotide polymorphisms (SNPs) distributed across the whole genome and one or more traits. As computational power increases, the use of machine learning techniques to analyze large genome-wide datasets becomes possible. METHODS: The objective of this study was to identify SNPs associated with the three traits simulated in the 16th MAS-QTL workshop dataset using the Random Forest (RF) approach...
2014: BMC Proceedings
A Ehret, D Hochstuhl, N Krattenmacher, J Tetens, M S Klein, W Gronwald, G Thaller
Subclinical ketosis is one of the most prevalent metabolic disorders in high-producing dairy cows during early lactation. This renders its early detection and prevention important for both economical and animal-welfare reasons. Construction of reliable predictive models is challenging, because traits like ketosis are commonly affected by multiple factors. In this context, machine learning methods offer great advantages because of their universal learning ability and flexibility in integrating various sorts of data...
January 2015: Journal of Dairy Science
Ziming Zhang, Heng Huang, Dinggang Shen
In this paper, we explore the effects of integrating multi-dimensional imaging genomics data for Alzheimer's disease (AD) prediction using machine learning approaches. Specifically, we compare three of our recently proposed feature selection methods [i.e., multiple kernel learning (MKL), high-order graph matching based feature selection (HGM-FS), sparse multimodal learning (SMML)] using four widely-used modalities [i.e., magnetic resonance imaging (MRI), positron emission tomography (PET), cerebrospinal fluid (CSF), and genetic modality single-nucleotide polymorphism (SNP)]...
2014: Frontiers in Aging Neuroscience
Magnus Lekman, Ola Hössjer, Peter Andrews, Henrik Källberg, Daniel Uvehag, Dennis Charney, Husseini Manji, John A Rush, Francis J McMahon, Jason H Moore, Ingrid Kockum
BACKGROUND: Genetic contributions to major depressive disorder (MDD) are thought to result from multiple genes interacting with each other. Different procedures have been proposed to detect such interactions. Which approach is best for explaining the risk of developing disease is unclear. This study sought to elucidate the genetic interaction landscape in candidate genes for MDD by conducting a SNP-SNP interaction analysis using an exhaustive search through 3,704 SNP markers in 1,732 cases and 1,783 controls provided by the GAIN MDD study...
2014: BioData Mining