Add like
Add dislike
Add to saved papers

Estimating error models for whole genome sequencing using mixtures of Dirichlet-multinomial distributions.

Bioinformatics 2017 August 2
Motivation: Accurate identification of genotypes is an essential part of the analysis of genomic data, including in identification of sequence polymorphisms, linking mutations with disease and determining mutation rates. Biological and technical processes that adversely affect genotyping include copy-number-variation, paralogous sequences, library preparation, sequencing error and reference-mapping biases, among others.

Results: We modeled the read depth for all data as a mixture of Dirichlet-multinomial distributions, resulting in significant improvements over previously used models. In most cases the best model was comprised of two distributions. The major-component distribution is similar to a binomial distribution with low error and low reference bias. The minor-component distribution is overdispersed with higher error and reference bias. We also found that sites fitting the minor component are enriched for copy number variants and low complexity regions, which can produce erroneous genotype calls. By removing sites that do not fit the major component, we can improve the accuracy of genotype calls.

Availability and Implementation: Methods and data files are available at https://github.com/CartwrightLab/WuEtAl2017/ (doi:10.5281/zenodo.256858).

Contact: [email protected].

Supplementary information: Supplementary data is available at Bioinformatics online.

Full text links

We have located links that may give you full text access.
Can't access the paper?
Try logging in through your university/institutional subscription. For a smoother one-click institutional access experience, please use our mobile app.

Related Resources

For the best experience, use the Read mobile app

Mobile app image

Get seemless 1-tap access through your institution/university

For the best experience, use the Read mobile app

All material on this website is protected by copyright, Copyright © 1994-2024 by WebMD LLC.
This website also contains material copyrighted by 3rd parties.

By using this service, you agree to our terms of use and privacy policy.

Your Privacy Choices Toggle icon

You can now claim free CME credits for this literature searchClaim now

Get seemless 1-tap access through your institution/university

For the best experience, use the Read mobile app