Efficient gene orthology inference via large-scale rearrangements.

Diego P Rubert, Marília D V Braga

Algorithms for Molecular Biology : AMB 2023 September 29

BACKGROUND: Recently we developed a gene orthology inference tool based on genome rearrangements (Journal of Bioinformatics and Computational Biology 19:6, 2021). Given a set of genomes, our method first computes all pairwise gene similarities. It then runs pairwise integer linear programming (ILP) comparisons to compute optimal gene matchings, which take the similarities into account and minimize the weighted rearrangement distance between the analyzed genomes (an NP-hard problem). In the final step, the gene matchings are integrated into gene families. The ILP includes an optimal capping that connects each end of a linear segment of one genome to an end of a linear segment in the other genome, which produces an exponential increase in the search space.
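
The final integration step can be pictured with a small example. The following is a minimal sketch, assuming that a gene family is a connected component of the graph whose edges are the matched gene pairs from all pairwise comparisons; the abstract does not state the integration rule, so this choice, the function name integrate_matchings, and the (genome, gene) identifiers are all hypothetical.

```python
from collections import defaultdict


def integrate_matchings(matched_pairs):
    """Group genes into families as connected components of matched pairs.

    Assumption (not from the paper): matched_pairs is an iterable of
    (gene, gene) pairs, one per matched gene pair of any pairwise comparison,
    where each gene is a hashable identifier such as (genome, gene_name).
    """
    parent = {}

    def find(x):
        # Union-find lookup with path halving.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in matched_pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_a] = root_b

    families = defaultdict(set)
    for gene in list(parent):
        families[find(gene)].add(gene)
    # Sort for a deterministic result.
    return sorted(sorted(family) for family in families.values())


# Hypothetical matchings from three genomes A, B and C.
pairs = [
    (("A", "a1"), ("B", "b1")),
    (("B", "b1"), ("C", "c1")),
    (("A", "a2"), ("B", "b2")),
]
families = integrate_matchings(pairs)
```

Under this assumption, a family is "ambiguous" in the sense used in the results when it contains more than one gene from the same genome.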

RESULTS: In this work, we design and implement a heuristic capping algorithm that replaces the optimal capping by clustering the linear segments, based on their gene content intersections, into m subsets whose ends are capped independently. Furthermore, within each subset, instead of allowing all possible connections, we allow only the ends of content-related segments to be connected. Although there is no guarantee that m is much bigger than one, and although the heuristic may yield sub-optimal rather than optimal gene matchings, it works very well in practice in terms of both speed and the quality of the computed solutions. Our experiments on primate and fruit fly genomes show two positive results. First, for complete assemblies of five primates, the version with heuristic capping reports orthologies that are very similar to those computed by the version of our tool with optimal capping. Second, we were able to efficiently analyze fruit fly genomes with incomplete assemblies distributed in hundreds or even thousands of contigs, obtaining gene families that are very similar to [Formula: see text] families. Indeed, our tool inferred a higher number of complete cliques, with a higher intersection with [Formula: see text], when compared to gene families computed by other inference tools. We added a post-processing step that refines, with the aid of the [Formula: see text] algorithm, our ambiguous families (those with more than one gene per genome), further improving the accuracy of our results. Our approach is implemented as a pipeline incorporating the pre-computation of gene similarities and the post-processing refinement of ambiguous families with [Formula: see text].
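
The clustering idea behind the heuristic capping can be illustrated as follows. This is a minimal sketch under stated assumptions, not the implementation in the tool: each linear segment is represented by a set of gene labels, two segments from different genomes count as content-related when these sets intersect, and the m subsets are taken to be the connected components of the resulting graph. The function name cluster_segments and all segment and gene labels are hypothetical.

```python
def cluster_segments(content_a, content_b):
    """Cluster linear segments of two genomes by shared gene content.

    Assumption (not from the paper): content_a and content_b map a segment
    name (chromosome or contig) to the set of gene labels it contains.
    Returns (subsets, related), where subsets is a list of connected
    components and related maps each segment to its content-related segments.
    """
    related = {("A", s): set() for s in content_a}
    related.update({("B", t): set() for t in content_b})
    for s, genes_s in content_a.items():
        for t, genes_t in content_b.items():
            if genes_s & genes_t:  # non-empty gene content intersection
                related[("A", s)].add(("B", t))
                related[("B", t)].add(("A", s))

    subsets, seen = [], set()
    for start in sorted(related):
        if start in seen:
            continue
        # Depth-first search collects one connected component.
        component, stack = [], [start]
        seen.add(start)
        while stack:
            node = stack.pop()
            component.append(node)
            for neighbour in related[node]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    stack.append(neighbour)
        subsets.append(sorted(component))
    return subsets, related


# Hypothetical example: genome A has two chromosomes, genome B three contigs.
content_a = {"chr1": {"f1", "f2"}, "chr2": {"f7"}}
content_b = {"ctg1": {"f2", "f3"}, "ctg2": {"f1"}, "ctg3": {"f7", "f8"}}
subsets, related = cluster_segments(content_a, content_b)
m = len(subsets)
```

In this example the ends of chr1 would only be considered for connection with the ends of ctg1 and ctg2, and those of chr2 only with ctg3, so each subset can be capped without enumerating connections across subsets. A segment that shares no genes with the other genome forms a subset of its own in this sketch.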
Both the original version with optimal capping and the new version with heuristic capping can be downloaded, together with their detailed documentation, at https://gitlab.ub.uni-bielefeld.de/gi/FFGC or as a Conda package at https://anaconda.org/bioconda/ffgc.
