Journal Article
Research Support, N.I.H., Extramural

Calculating the statistical significance of rare variants causal for Mendelian and complex disorders.

BACKGROUND: With the expanding use of next-generation sequencing (NGS) to diagnose the thousands of rare Mendelian genetic diseases, it is critical to be able to interpret individual DNA variation. To calculate the significance of finding a rare protein-altering variant in a given gene, one must know how frequently a variant at least as damaging as the variant in question is seen in the general population.

METHODS: We developed a general method to better interpret the likelihood that a rare variant is disease causing if observed in a given gene or genic region mapping to a described protein domain, using genome-wide information from a large control sample. Based on data from 2504 individuals in the 1000 Genomes Project dataset, we calculated the number of individuals who have a rare variant in a given gene for numerous filtering threshold scenarios, which may be used for calculating the significance of an observed rare variant being causal for disease. Additionally, we calculated mutational burden data on the number of individuals with rare variants in genic regions mapping to protein domains.
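As a rough illustration of the counting step described above, the following sketch tallies how many distinct individuals carry at least one rare protein-altering variant in each gene under a chosen allele-frequency threshold. The `VariantCall` record, its field names, and the `burden_counts` function are hypothetical and are not taken from the SORVA implementation; the actual filtering criteria used by the authors may differ.

```python
from collections import defaultdict
from typing import Dict, Iterable, NamedTuple


class VariantCall(NamedTuple):
    """Hypothetical record: one variant observed in one control individual."""
    individual: str
    gene: str
    allele_freq: float       # population allele frequency of the variant
    protein_altering: bool   # True if the variant changes the protein


def burden_counts(calls: Iterable[VariantCall], max_af: float) -> Dict[str, int]:
    """Count distinct individuals per gene with >= 1 variant passing the filter.

    A variant passes if it is protein-altering and its allele frequency is
    at or below max_af (an assumed filtering rule for this sketch).
    """
    carriers = defaultdict(set)
    for call in calls:
        if call.protein_altering and call.allele_freq <= max_af:
            # A set ensures each individual is counted once per gene.
            carriers[call.gene].add(call.individual)
    return {gene: len(ids) for gene, ids in carriers.items()}
```

Running the function once per threshold (for example `max_af=0.05`, `0.005`, `0.0005`) would give one count table per filtering scenario; genes with no passing variants are simply absent from the result and would be treated as a count of zero.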

RESULTS: We describe methods to use the mutational burden data for calculating the significance of observing rare variants in a given proportion of sequenced individuals. We present SORVA, an implementation of these methods as a web tool, and we demonstrate its application to 20 relevant but diverse next-generation sequencing studies. Specifically, we calculate the statistical significance of findings involving multi-family studies with rare Mendelian disease and a large-scale study of a complex disorder, autism spectrum disorder. If we use the frequency counts to rank genes based on intolerance to variation, the ranking correlates well with pLI scores derived from the Exome Aggregation Consortium (ExAC) dataset (ρ = 0.515), with the benefit that the scores are directly interpretable.
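One simple way to turn a control-sample carrier count into a significance estimate is a one-sided binomial tail probability: given the fraction of control individuals carrying a qualifying variant in a gene, how likely is it to see at least k carriers among n sequenced cases by chance? The sketch below assumes this binomial model and treats cases as independent; the abstract does not state that SORVA uses exactly this test, and the function names are illustrative only.

```python
from math import comb


def binomial_tail(k: int, n: int, p: float) -> float:
    """Return P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def carrier_p_value(carriers_in_controls: int, n_controls: int,
                    k_cases: int, n_cases: int) -> float:
    """Probability of seeing >= k_cases carriers among n_cases by chance.

    The background carrier rate is estimated as the fraction of control
    individuals with a qualifying variant in the gene (assumed model).
    """
    p = carriers_in_controls / n_controls
    return binomial_tail(k_cases, n_cases, p)
```

For instance, `carrier_p_value(5, 2504, 3, 10)` would estimate how surprising it is for 3 of 10 unrelated cases to carry a qualifying variant in a gene where 5 of the 2504 control individuals do. Related individuals within a family are not independent observations, so a multi-family analysis would typically count each family once under this model.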

CONCLUSIONS: We have presented a strategy that is useful for vetting candidate genes from NGS studies and allows researchers to calculate the significance of seeing a variant in a given gene or protein domain. This approach is an important step towards developing a quantitative, statistics-based approach for presenting clinical findings.
