FIFS: A data mining method for informative marker selection in high dimensional population genomic data.

Ioannis Kavakiotis, Patroklos Samaras, Alexandros Triantafyllidis, Ioannis Vlahavas

Computers in Biology and Medicine 2017 November 2

BACKGROUND AND OBJECTIVE: Single Nucleotide Polymorphisms (SNPs) are becoming the marker of choice for biological analyses across a wide range of applications of great medical, biological, economic and environmental interest. Classification tasks, i.e. the assignment of individuals to groups of origin based on their (multi-locus) genotypes, are performed in many fields, such as forensic investigations and discrimination between wild and/or farmed populations. These tasks should be performed with a small number of loci, for computational as well as biological reasons. Thus, feature selection should precede classification, especially for SNP datasets, where the number of features can reach hundreds of thousands or millions.

METHODS: In this paper, we present a novel data mining approach called FIFS (Frequent Item Feature Selection), which uses frequent items to select the most informative markers from population genomic data. It is a modular method consisting of two main components. The first identifies the most frequent and unique genotypes for each sampled population. The second selects the most appropriate among them to create the informative SNP subsets that are returned.
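The two components can be illustrated with a minimal Python sketch. This is not the authors' implementation: the abstract does not specify how "frequent", "unique" or "most appropriate" are defined, so the thresholds `min_support` and `max_other`, the score (support inside the population minus the highest support in any other population), the `per_pop` cut-off, and the function names are all assumptions made for illustration, as is the toy data.

```python
from collections import Counter


def frequent_unique_items(genotypes, labels, min_support=0.8, max_other=0.2):
    """Component 1 (assumed form): for each population, keep the
    (snp_index, genotype) items that are frequent inside it and rare
    in every other population."""
    pops = sorted(set(labels))
    sizes = Counter(labels)
    # counts[pop][(snp, genotype)] = individuals in pop carrying that genotype
    counts = {p: Counter() for p in pops}
    for row, pop in zip(genotypes, labels):
        for snp, genotype in enumerate(row):
            counts[pop][(snp, genotype)] += 1

    result = {}
    for p in pops:
        kept = []
        for item, c in counts[p].items():
            support = c / sizes[p]
            if support < min_support:
                continue  # not frequent enough in this population
            other = max(
                (counts[q][item] / sizes[q] for q in pops if q != p),
                default=0.0,
            )
            if other <= max_other:  # "unique" under the assumed threshold
                kept.append((item, support - other))
        # Highest score first; ties broken by item for determinism.
        kept.sort(key=lambda t: (-t[1], t[0]))
        result[p] = kept
    return result


def select_snps(items_by_pop, per_pop=2):
    """Component 2 (assumed form): take the top-scoring items of each
    population and return the sorted set of SNP indices they refer to."""
    chosen = set()
    for items in items_by_pop.values():
        for (snp, _genotype), _score in items[:per_pop]:
            chosen.add(snp)
    return sorted(chosen)


# Toy data (hypothetical): 6 individuals, 3 SNPs, 2 populations.
genotypes = [
    ["AA", "CT", "GG"],
    ["AA", "CC", "GG"],
    ["AA", "CT", "GG"],
    ["GG", "CT", "GG"],
    ["GG", "CC", "GG"],
    ["GG", "CT", "GG"],
]
labels = ["A", "A", "A", "B", "B", "B"]

items = frequent_unique_items(genotypes, labels)
selected = select_snps(items)
print(selected)  # [0]
```

In this toy example only SNP 0 is selected: SNP 1 has no genotype that is frequent enough in either population, and the "GG" genotype at SNP 2 is frequent but shared by both populations, so it carries no discriminating information.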

RESULTS: The proposed method (FIFS) was tested on a real dataset providing comprehensive coverage of the pig breed types present in Britain. This dataset consisted of 446 individuals divided into 14 sub-populations, genotyped at 59,436 SNPs. Our method outperforms the state-of-the-art and baseline methods in every case. More specifically, it surpassed the assignment accuracy threshold of 95% with fewer than half the number of SNPs selected by other methods (FIFS: 28 SNPs, Delta: 70 SNPs, Pairwise FST: 70 SNPs, In: 100 SNPs).

CONCLUSION: Our approach successfully deals with the problem of informative marker selection in high dimensional genomic datasets. It offers better results than existing approaches and can aid biologists in selecting the most informative markers with maximum discrimination power, for the optimization of cost-effective panels with applications in, e.g., species identification, wildlife management, and forensics.
