Most recent papers with the keyword Bioinformatics python

#1

JOURNAL ARTICLE

SynGenes: a Python class for standardizing nomenclatures of mitochondrial and chloroplast genes and a web form for enhancing searches for evolutionary analyses.

Luan Pinto Rabelo, Davidson Sodré, Rodrigo Petry Corrêa de Sousa, Luciana Watanabe, Grazielle Gomes, Iracilda Sampaio, Marcelo Vallinoto

BACKGROUND: The reconstruction of the evolutionary history of organisms has been greatly influenced by the advent of molecular techniques, leading to a significant increase in studies utilizing genomic data from different species. However, the lack of standardization in gene nomenclature poses a challenge in database searches and evolutionary analyses, impacting the accuracy of results obtained. RESULTS: To address this issue, a Python class for standardizing gene nomenclatures, SynGenes, has been developed...

38649820

April 22, 2024: BMC Bioinformatics

#2

JOURNAL ARTICLE

Noisecut: a python package for noise-tolerant classification of binary data using prior knowledge integration and max-cut solutions.

Moein E Samadi, Hedieh Mirzaieazar, Alexander Mitsos, Andreas Schuppert

BACKGROUND: Classification of binary data arises naturally in many clinical applications, such as patient risk stratification through ICD codes. One of the key practical challenges in data classification using machine learning is to avoid overfitting. Overfitting in supervised learning primarily occurs when a model learns random variations from noisy labels in training data rather than the underlying patterns. While traditional methods such as regularization and early stopping have demonstrated effectiveness in interpolation tasks, addressing overfitting in the classification of binary data, in which predictions always amount to extrapolation, demands extrapolation-enhanced strategies...

38641616

April 20, 2024: BMC Bioinformatics

#3

JOURNAL ARTICLE

TrieDedup: a fast trie-based deduplication algorithm to handle ambiguous bases in high-throughput sequencing.

Jianqiao Hu, Sai Luo, Ming Tian, Adam Yongxin Ye

BACKGROUND: High-throughput sequencing is a powerful tool that is extensively applied in biological studies. However, sequencers may produce low-quality bases, leading to ambiguous bases, 'N's. PCR duplicates introduced in library preparation are conventionally removed in genomics studies, and several deduplication tools have been developed for this purpose. Two identical reads may appear different due to ambiguous bases and the existing tools cannot address 'N's correctly or efficiently...

38637756

April 18, 2024: BMC Bioinformatics
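
The problem the TrieDedup abstract describes, two identical reads appearing different because of ambiguous 'N' bases, can be illustrated with a minimal sketch. This is not the paper's trie-based algorithm: it is a naive quadratic baseline, and the function names `reads_match` and `naive_dedup` are hypothetical. It assumes equal-length reads over the alphabet A, C, G, T, N.

```python
def reads_match(a: str, b: str) -> bool:
    """Return True if two reads agree at every position,
    treating 'N' in either read as compatible with any base."""
    if len(a) != len(b):
        return False
    return all(x == y or x == "N" or y == "N" for x, y in zip(a, b))


def naive_dedup(reads: list[str]) -> list[str]:
    """Quadratic-time baseline: keep a read only if it matches no read kept so far."""
    kept: list[str] = []
    for read in reads:
        if not any(reads_match(read, k) for k in kept):
            kept.append(read)
    return kept


if __name__ == "__main__":
    # "ACNT" is compatible with "ACGT" and is dropped; "ACCT" differs at a called base.
    print(naive_dedup(["ACGT", "ACNT", "ACCT"]))
```

Because 'N'-compatibility is not transitive ("ACNT" matches both "ACGT" and "ACCT", which do not match each other), the result of this baseline depends on input order. A dedicated tool has to handle that, and has to avoid the pairwise comparisons.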

#4

JOURNAL ARTICLE

Efficient cytometry analysis with FlowSOM in python boosts interoperability with other single-cell tools.

Artuur Couckuyt, Benjamin Rombaut, Yvan Saeys, Sofie Van Gassen

MOTIVATION: We describe a new Python implementation of FlowSOM, a clustering method for cytometry data. RESULTS: This implementation is faster than the original version in R, better adapted to work with single-cell omics data including integration with current single-cell data structures and includes all the original visualizations, such as the star and pie plot. AVAILABILITY: The FlowSOM Python implementation is freely available on GitHub: https://github...

38632080

April 17, 2024: Bioinformatics

#5

JOURNAL ARTICLE

Protocol for constructing glycan biosynthetic networks using glycowork.

Jon Lundstrøm, Luc Thomès, Daniel Bojar

Glycans, present across all domains of life, comprise a wide range of monosaccharides assembled into complex, branching structures. Here, we present an in silico protocol to construct biosynthetic networks from a list of observed glycans using the Python package glycowork. We describe steps for data preparation, network construction, feature analysis, and data export. This protocol is implemented in Python using example data and can be adapted for use with customized datasets. For complete details on the use and execution of this protocol, please refer to Thomès et al...

38630592

April 15, 2024: STAR protocols

#6

JOURNAL ARTICLE

PyPop: a mature open-source software pipeline for population genomics.

Alexander K Lancaster, Richard M Single, Steven J Mack, Vanessa Sochat, Michael P Mariani, Gordon D Webster

Python for Population Genomics (PyPop) is a software package that processes genotype and allele data and performs large-scale population genetic analyses on highly polymorphic multi-locus genotype data. In particular, PyPop tests data conformity to Hardy-Weinberg equilibrium expectations, performs Ewens-Watterson tests for selection, estimates haplotype frequencies, measures linkage disequilibrium, and tests significance. Standardized means of performing these tests is key for contemporary studies of evolutionary biology and population genetics, and these tests are central to genetic studies of disease association as well...

38629078

2024: Frontiers in Immunology
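
To make the Hardy-Weinberg test mentioned in the PyPop abstract concrete, here is a minimal textbook sketch for a single biallelic locus. It does not use PyPop's API and may differ from the methods PyPop applies to highly polymorphic multi-locus data. The function name `hwe_chi_square` and the genotype counts are assumptions for illustration.

```python
def hwe_chi_square(n_aa: int, n_ab: int, n_bb: int) -> float:
    """Chi-square statistic comparing observed genotype counts at a
    biallelic locus with Hardy-Weinberg expectations (p^2, 2pq, q^2)."""
    n = n_aa + n_ab + n_bb
    if n == 0:
        raise ValueError("at least one genotype must be observed")
    # Allele frequency of A estimated from the genotype counts.
    p = (2 * n_aa + n_ab) / (2 * n)
    q = 1.0 - p
    expected = (p * p * n, 2 * p * q * n, q * q * n)
    observed = (n_aa, n_ab, n_bb)
    # Skip classes with zero expectation (a monomorphic locus).
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)


if __name__ == "__main__":
    print(hwe_chi_square(25, 50, 25))  # counts exactly at HWE proportions
    print(hwe_chi_square(50, 0, 50))   # complete heterozygote deficit
```

For a biallelic locus the statistic has one degree of freedom, so values above roughly 3.84 indicate departure from equilibrium at the 0.05 level.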

#7

JOURNAL ARTICLE

MarkerScan: Separation and assembly of cobionts sequenced alongside target species in biodiversity genomics projects.

Emmelien Vancaester, Mark L Blaxter

Contamination of public databases by mislabelled sequences has been highlighted for many years and the avalanche of novel sequencing data now being deposited has the potential to make databases difficult to use effectively. It is therefore crucial that sequencing projects and database curators perform pre-submission checks to remove obvious contamination and avoid propagating erroneous taxonomic relationships. However, it is important also to recognise that biological contamination of a target sample with unexpected species' DNA can also lead to the discovery of fascinating biological phenomena through the identification of environmental organisms or endosymbionts...

38617467

2024: Wellcome Open Research

#8

JOURNAL ARTICLE

Protocol to explain support vector machine predictions via exact Shapley value computation.

Andrea Mastropietro, Jürgen Bajorath

Shapley values from cooperative game theory are adapted for explaining machine learning predictions. For large feature sets used in machine learning, Shapley values are approximated. We present a protocol for two techniques for explaining support vector machine predictions with exact Shapley value computation. We detail the application of these algorithms and provide ready-to-use Python scripts and custom code. The final output of the protocol includes quantitative feature analysis and mapping of important features for visualization...

38607924

April 11, 2024: STAR protocols
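
The abstract's contrast between exact and approximated Shapley values can be seen in a generic sketch of the cooperative-game definition. This enumerates every coalition, so its cost grows exponentially with the number of features. It is not the protocol's SVM-specific algorithms, and the function name `exact_shapley` and the toy value function are hypothetical.

```python
from itertools import combinations
from math import factorial


def exact_shapley(players, value):
    """Exact Shapley values by enumerating all coalitions.

    players: list of hashable player (feature) identifiers.
    value:   function mapping a frozenset of players to a number.
    """
    n = len(players)
    phi = {}
    for i in players:
        others = [p for p in players if p != i]
        total = 0.0
        for k in range(n):
            # Weight of a coalition of size k that excludes player i.
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                s = frozenset(subset)
                total += weight * (value(s | {i}) - value(s))
        phi[i] = total
    return phi


if __name__ == "__main__":
    # Toy additive game: each player's Shapley value equals its own weight.
    weights = {"a": 1.0, "b": 2.0, "c": 3.0}
    print(exact_shapley(list(weights), lambda s: sum(weights[p] for p in s)))
```

With n players this makes on the order of n * 2^(n-1) value-function evaluations, which is why large feature sets are usually handled by approximation.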

#9

JOURNAL ARTICLE

scDAC: deep adaptive clustering of single-cell transcriptomic data with coupled autoencoder and dirichlet process mixture model.

Sijing An, Jinhui Shi, Runyan Liu, Yaowen Chen, Jing Wang, Shuofeng Hu, Xinyu Xia, Guohua Dong, Xiaochen Bo, Zhen He, Xiaomin Ying

MOTIVATION: Clustering analysis for single-cell RNA sequencing (scRNA-seq) data is an important step in revealing cellular heterogeneity. Many clustering methods have been proposed to discover heterogenous cell types from scRNA-seq data. However, adaptive clustering with accurate cluster number reflecting intrinsic biology nature from large-scale scRNA-seq data remains quite challenging. RESULTS: Here we propose a single-cell Deep Adaptive Clustering (scDAC) model by coupling the Autoencoder (AE) and the Dirichlet Process Mixture Model (DPMM)...

38603616

April 11, 2024: Bioinformatics

#10

JOURNAL ARTICLE

pyaging: a Python-based compendium of GPU-optimized aging clocks.

Lucas Paulo de Lima Camillo

MOTIVATION: Aging is intricately linked to diseases and mortality. It is reflected in molecular changes across various tissues which can be leveraged for the development of biomarkers of aging using machine learning models, known as aging clocks. Despite advancements in the field, a significant challenge remains: the lack of robust, Python-based software tools for integrating and comparing these diverse models. This gap highlights the need for comprehensive solutions that can handle the complexity and variety of data in aging research...

38603598

April 11, 2024: Bioinformatics

#11

JOURNAL ARTICLE

Protocol for unsupervised inference of cell-cell communication using matrix decomposition.

Yi Liu, Xiao Chang, Xiaoping Liu

Exploring cell-cell communication is pivotal for understanding biological processes in multicellular life forms. Here, we present a protocol that details the use of matrix decomposition to infer cell-cell communication (MDIC3) for unsupervised cell-cell communication inference. We describe steps for using the MDIC3 Python scripts to deduce cell-cell communication and identify key ligand-receptor pairs between a specific cell type pair from a single-cell gene expression dataset. This protocol has potential application in cell-cell communication inference on any species...

38602871

April 10, 2024: STAR protocols

#12

JOURNAL ARTICLE

VirusPredictor: XGBoost-based software to predict virus-related sequences in human data.

Guangchen Liu, Xun Chen, Yihui Luan, Dawei Li

MOTIVATION: Discovering disease causative pathogens, particularly viruses without reference genomes, poses a technical challenge as they are often unidentifiable through sequence alignment. Machine learning prediction of patient high-throughput sequences unmappable to human and pathogen genomes may reveal sequences originating from uncharacterized viruses. Currently, there is a lack of software specifically designed for accurately predicting such viral sequences in human data. RESULTS: We developed a fast XGBoost method and software VirusPredictor leveraging an in-house viral genome database...

38597887

April 10, 2024: Bioinformatics

#13

JOURNAL ARTICLE

EpiCarousel: memory- and time-efficient identification of metacells for atlas-level single-cell chromatin accessibility data.

Sijie Li, Yuxi Li, Yu Sun, Yaru Li, Xiaoyang Chen, Songming Tang, Shengquan Chen

SUMMARY: Recent technical advancements in single-cell chromatin accessibility sequencing (scCAS) have brought new insights to the characterization of epigenetic heterogeneity. As single-cell genomics experiments scale up to hundreds of thousands of cells, the demand for computational resources of downstream analysis grows intractably large and exceeds the capabilities of most researchers. Here, we propose EpiCarousel, a tailored Python package based on lazy loading, parallel processing, and community detection for memory- and time-efficient identification of metacells, i...

38588573

April 8, 2024: Bioinformatics

#14

JOURNAL ARTICLE

Pacybara: Accurate long-read sequencing for barcoded mutagenized allelic libraries.

Jochen Weile, Gabrielle Ferra, Gabriel Boyle, Sriram Pendyala, Clara Amorosi, Chiann-Ling Yeh, Atina G Cote, Nishka Kishore, Daniel Tabet, Warren van Loggerenberg, Ashyad Rayhan, Douglas M Fowler, Maitreya J Dunham, Frederick P Roth

MOTIVATION: Long read sequencing technologies, an attractive solution for many applications, often suffer from higher error rates. Alignment of multiple reads can improve base-calling accuracy, but some applications, e.g. sequencing mutagenized libraries where multiple distinct clones differ by one or few variants, require the use of barcodes or unique molecular identifiers. Unfortunately, sequencing errors can interfere with correct barcode identification, and a given barcode sequence may be linked to multiple independent clones within a given library...

38569896

April 3, 2024: Bioinformatics

#15

JOURNAL ARTICLE

3Dmapper: a command line tool for BioBank-scale mapping of variants to protein structures.

Victoria Ruiz-Serra, Samuel Valentini, Sergi Madroñero, Alfonso Valencia, Eduard Porta-Pardo

MOTIVATION: The interpretation of genomic data is crucial to understand the molecular mechanisms of biological processes. Protein structures play a vital role in facilitating this interpretation by providing functional context to genetic coding variants. However, mapping genes to proteins is a tedious and error-prone task due to inconsistencies in data formats. Over the past two decades, numerous tools and databases have been developed to automatically map annotated positions and variants to protein structures...

38565273

April 2, 2024: Bioinformatics

#16

JOURNAL ARTICLE

SEAOP: a statistical ensemble approach for outlier detection in quantitative proteomics data.

Jinze Huang, Yang Zhao, Bo Meng, Ao Lu, Yaoguang Wei, Lianhua Dong, Xiang Fang, Dong An, Xinhua Dai

Quality control in quantitative proteomics is a persistent challenge, particularly in identifying and managing outliers. Unsupervised learning models, which rely on data structure rather than predefined labels, offer potential solutions. However, without clear labels, their effectiveness might be compromised. Single models are susceptible to the randomness of parameters and initialization, which can result in a high rate of false positives. Ensemble models, on the other hand, have shown capabilities in effectively mitigating the impacts of such randomness and assisting in accurately detecting true outliers...

38557674

March 27, 2024: Briefings in Bioinformatics

#17

JOURNAL ARTICLE

MotGen: a closed-loop bacterial motility control framework using generative adversarial networks.

BoGeum Seo, DoHee Lee, Heungjin Jeon, Junhyoung Ha, SeungBeum Suh

MOTIVATION: Many organisms' survival and behavior hinge on their responses to environmental signals. While research on bacteria-directed therapeutic agents has increased, systematic exploration of real-time modulation of bacterial motility remains limited. Current studies often focus on permanent motility changes through genetic alterations, restricting the ability to modulate bacterial motility dynamically on a large scale. To address this gap, we propose a novel real-time control framework for systematically modulating bacterial motility dynamics...

38552318

March 29, 2024: Bioinformatics

#18

JOURNAL ARTICLE

PyCoM: a python library for large-scale analysis of residue-residue coevolution data.

Philipp Bibik, Sabriyeh Alibai, Alessandro Pandini, Sarath Chandra Dantu

MOTIVATION: Computational methods to detect correlated amino acid positions in proteins have become a valuable tool to predict intra and inter-residue protein contacts, protein structures, and effects of mutation on protein stability and function. While there are many tools and webservers to compute coevolution scoring matrices, there is no central repository of alignments and coevolution matrices for large-scale studies and pattern detection leveraging on structural and biological annotation already available in UniProt...

38532297

March 26, 2024: Bioinformatics

#19

JOURNAL ARTICLE

PyCoMo: a python package for community metabolic model creation and analysis.

Michael Predl, Marianne Mießkes, Thomas Rattei, Jürgen Zanghellini

SUMMARY: PyCoMo is a python package for quick and easy generation of genome-scale compartmentalised community metabolic models that are compliant with current openCOBRA file formats. The resulting models can be used to predict (i) the maximum growth rate at a given abundance profile, (ii) the feasible community compositions at a given growth rate, and (iii) all exchange metabolites and cross-feeding interactions in a community metabolic model independent of the abundance profile; we demonstrate PyCoMo's capability by analysing methane production in a previously published simplified biogas community metabolic model (Koch et al...

38532295

March 26, 2024: Bioinformatics

#20

JOURNAL ARTICLE

NIEND: Neuronal image enhancement through noise disentanglement.

Zuo-Han Zhao, Yufeng Liu

MOTIVATION: The full automation of digital neuronal reconstruction from light microscopic images has long been impeded by noisy neuronal images. Previous endeavors to improve image quality can hardly get a good compromise between robustness and computational efficiency. RESULTS: We present the image enhancement pipeline named Neuronal Image Enhancement through Noise Disentanglement (NIEND). Through extensive benchmarking on 863 mouse neuronal images with manually annotated gold standards, NIEND achieves remarkable improvements in image quality such as signal-background contrast (40-fold) and background uniformity (10-fold), compared to raw images...

38530800

March 26, 2024: Bioinformatics
