Locality-sensitive bucketing functions for the edit distance.

Ke Chen, Mingfu Shao

Algorithms for Molecular Biology : AMB 2023 July 25

BACKGROUND: Many bioinformatics applications involve bucketing a set of sequences where each sequence is allowed to be assigned into multiple buckets. To achieve both high sensitivity and precision, bucketing methods are desired to assign similar sequences into the same bucket while assigning dissimilar sequences into distinct buckets. Existing k-mer-based bucketing methods have been efficient in processing sequencing data with low error rates, but encounter much reduced sensitivity on data with high error rates. Locality-sensitive hashing (LSH) schemes are able to mitigate this issue through tolerating the edits in similar sequences, but state-of-the-art methods still have large gaps.

RESULTS: In this paper, we generalize the LSH function by allowing it to hash one sequence into multiple buckets. Formally, a bucketing function, which maps a sequence (of fixed length) into a subset of buckets, is defined to be [Formula: see text]-sensitive if any two sequences within an edit distance of [Formula: see text] are mapped into at least one shared bucket, and any two sequences with distance at least [Formula: see text] are mapped into disjoint subsets of buckets. We construct locality-sensitive bucketing (LSB) functions with a variety of values of [Formula: see text] and analyze their efficiency with respect to the total number of buckets needed as well as the number of buckets that a specific sequence is mapped to. We also prove lower bounds of these two parameters in different settings and show that some of our constructed LSB functions are optimal.

CONCLUSION: These results lay the theoretical foundations for their practical use in analyzing sequences with high error rates while also providing insights for the hardness of designing ungapped LSH functions.

Full text links

We have located links that may give you full text access.

Show additional links to paperHide additional links to paper

PubMed

Add to Saved Papers

Get 1-tap access

Related Resources

Renin-Angiotensin-Aldosterone System: From History to Practice of a Secular Topic.Sara H Ksiazek et al.International Journal of Molecular Sciences 2024 April 5

Prevention and treatment of ischaemic and haemorrhagic stroke in people with diabetes mellitus: a focus on glucose control and comorbidities.Simona Sacco et al.Diabetologia 2024 April 17

British Society for Rheumatology guideline on management of adult and juvenile onset Sjögren disease.Elizabeth J Price et al.Rheumatology 2024 April 17

For the best experience, use the Read mobile app

Get seemless 1-tap access through your institution/university

For the best experience, use the Read mobile app

All material on this website is protected by copyright, Copyright © 1994-2024 by WebMD LLC.
This website also contains material copyrighted by 3rd parties.

By using this service, you agree to our terms of use and privacy policy.

Your Privacy Choices

You can now claim free CME credits for this literature searchClaim now

Get seemless 1-tap access through your institution/university

For the best experience, use the Read mobile app

Locality-sensitive bucketing functions for the edit distance.

Full text links

Related Resources

Trending Papers

For the best experience, use the Read mobile app