https://read.qxmd.com/read/38649820/syngenes-a-python-class-for-standardizing-nomenclatures-of-mitochondrial-and-chloroplast-genes-and-a-web-form-for-enhancing-searches-for-evolutionary-analyses
#1
JOURNAL ARTICLE
Luan Pinto Rabelo, Davidson Sodré, Rodrigo Petry Corrêa de Sousa, Luciana Watanabe, Grazielle Gomes, Iracilda Sampaio, Marcelo Vallinoto
BACKGROUND: The reconstruction of the evolutionary history of organisms has been greatly influenced by the advent of molecular techniques, leading to a significant increase in studies utilizing genomic data from different species. However, the lack of standardization in gene nomenclature poses a challenge in database searches and evolutionary analyses, impacting the accuracy of results obtained. RESULTS: To address this issue, a Python class for standardizing gene nomenclatures, SynGenes, has been developed...
April 22, 2024: BMC Bioinformatics
https://read.qxmd.com/read/38641616/noisecut-a-python-package-for-noise-tolerant-classification-of-binary-data-using-prior-knowledge-integration-and-max-cut-solutions
#2
JOURNAL ARTICLE
Moein E Samadi, Hedieh Mirzaieazar, Alexander Mitsos, Andreas Schuppert
BACKGROUND: Classification of binary data arises naturally in many clinical applications, such as patient risk stratification through ICD codes. One of the key practical challenges in data classification using machine learning is to avoid overfitting. Overfitting in supervised learning primarily occurs when a model learns random variations from noisy labels in training data rather than the underlying patterns. While traditional methods such as regularization and early stopping have demonstrated effectiveness in interpolation tasks, addressing overfitting in the classification of binary data, in which predictions always amount to extrapolation, demands extrapolation-enhanced strategies...
April 20, 2024: BMC Bioinformatics
https://read.qxmd.com/read/38637756/triededup-a-fast-trie-based-deduplication-algorithm-to-handle-ambiguous-bases-in-high-throughput-sequencing
#3
JOURNAL ARTICLE
Jianqiao Hu, Sai Luo, Ming Tian, Adam Yongxin Ye
BACKGROUND: High-throughput sequencing is a powerful tool that is extensively applied in biological studies. However, sequencers may produce low-quality bases, leading to ambiguous bases, 'N's. PCR duplicates introduced in library preparation are conventionally removed in genomics studies, and several deduplication tools have been developed for this purpose. Two identical reads may appear different due to ambiguous bases and the existing tools cannot address 'N's correctly or efficiently...
April 18, 2024: BMC Bioinformatics
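The TrieDedup abstract names the core difficulty: two copies of the same molecule can look different only because one read carries an ambiguous 'N'. A minimal sketch of the idea stores kept reads in a trie and searches it with 'N' treated as a wildcard on either side. It assumes that 'N' is compatible with any base and that only equal-length reads can be duplicates. The names `dedup`, `_compatible_exists` and `_insert`, and the rule of processing reads with the fewest 'N's first, are hypothetical choices made for illustration. They are not the published TrieDedup algorithm.

```python
from typing import Dict, List


class TrieNode:
    __slots__ = ("children", "terminal")

    def __init__(self) -> None:
        self.children: Dict[str, "TrieNode"] = {}
        self.terminal = False  # True if a kept read ends at this node


def _compatible_exists(node: TrieNode, read: str, i: int) -> bool:
    """Is some stored read of the same length compatible with read[i:]?"""
    if i == len(read):
        return node.terminal
    base = read[i]
    for child_base, child in node.children.items():
        # Assumption: 'N' (in the query or in a stored read) matches any base
        if base == "N" or child_base == "N" or child_base == base:
            if _compatible_exists(child, read, i + 1):
                return True
    return False


def _insert(root: TrieNode, read: str) -> None:
    node = root
    for base in read:
        node = node.children.setdefault(base, TrieNode())
    node.terminal = True


def dedup(reads: List[str]) -> List[str]:
    """Greedy ambiguity-aware deduplication (hypothetical helper)."""
    root = TrieNode()
    kept: List[str] = []
    # Illustrative order rule: reads with fewer 'N's are treated as more informative
    for read in sorted(reads, key=lambda r: r.count("N")):
        if not _compatible_exists(root, read, 0):
            _insert(root, read)
            kept.append(read)
    return kept


if __name__ == "__main__":
    print(dedup(["ACGT", "ANGT", "ACGA", "NNNN"]))
```

'N'-compatibility is not transitive: ACGT and AGGT are each compatible with ANGT but not with each other. So the output of a greedy pass like this depends on processing order, which is why the sketch fixes the order explicitly. The recursive search is also only practical for short reads, since Python limits recursion depth.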
https://read.qxmd.com/read/38632080/efficient-cytometry-analysis-with-flowsom-in-python-boosts-interoperability-with-other-single-cell-tools
#4
JOURNAL ARTICLE
Artuur Couckuyt, Benjamin Rombaut, Yvan Saeys, Sofie Van Gassen
MOTIVATION: We describe a new Python implementation of FlowSOM, a clustering method for cytometry data. RESULTS: This implementation is faster than the original version in R, better adapted to work with single-cell omics data including integration with current single-cell data structures and includes all the original visualizations, such as the star and pie plot. AVAILABILITY: The FlowSOM Python implementation is freely available on GitHub: https://github...
April 17, 2024: Bioinformatics
https://read.qxmd.com/read/38630592/protocol-for-constructing-glycan-biosynthetic-networks-using-glycowork
#5
JOURNAL ARTICLE
Jon Lundstrøm, Luc Thomès, Daniel Bojar
Glycans, present across all domains of life, comprise a wide range of monosaccharides assembled into complex, branching structures. Here, we present an in silico protocol to construct biosynthetic networks from a list of observed glycans using the Python package glycowork. We describe steps for data preparation, network construction, feature analysis, and data export. This protocol is implemented in Python using example data and can be adapted for use with customized datasets. For complete details on the use and execution of this protocol, please refer to Thomès et al...
April 15, 2024: STAR protocols
https://read.qxmd.com/read/38629078/pypop-a-mature-open-source-software-pipeline-for-population-genomics
#6
JOURNAL ARTICLE
Alexander K Lancaster, Richard M Single, Steven J Mack, Vanessa Sochat, Michael P Mariani, Gordon D Webster
Python for Population Genomics (PyPop) is a software package that processes genotype and allele data and performs large-scale population genetic analyses on highly polymorphic multi-locus genotype data. In particular, PyPop tests data conformity to Hardy-Weinberg equilibrium expectations, performs Ewens-Watterson tests for selection, estimates haplotype frequencies, measures linkage disequilibrium, and tests significance. Standardized means of performing these tests is key for contemporary studies of evolutionary biology and population genetics, and these tests are central to genetic studies of disease association as well...
2024: Frontiers in Immunology
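One of the tests the PyPop abstract lists is conformity to Hardy-Weinberg equilibrium. A minimal sketch for the simplest case is a chi-square goodness-of-fit test on one biallelic locus. It assumes genotype counts are given as AA, Aa and aa, and it uses 1 degree of freedom. PyPop itself handles highly polymorphic multi-locus data, so this is a textbook illustration, not PyPop's API. The name `hwe_chi_square` is hypothetical.

```python
import math
from typing import Tuple


def hwe_chi_square(n_AA: int, n_Aa: int, n_aa: int) -> Tuple[float, float]:
    """Chi-square HWE test for one biallelic locus (hypothetical helper).

    Returns (chi-square statistic, p-value with 1 degree of freedom).
    """
    n = n_AA + n_Aa + n_aa
    if n == 0:
        raise ValueError("no genotypes supplied")
    p = (2 * n_AA + n_Aa) / (2 * n)  # frequency of allele A
    q = 1.0 - p
    observed = (n_AA, n_Aa, n_aa)
    expected = (n * p * p, 2 * n * p * q, n * q * q)  # HWE genotype expectations
    # Skip empty expected classes (monomorphic locus) to avoid dividing by zero
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    # For 1 degree of freedom, P(X > chi2) = erfc(sqrt(chi2 / 2))
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


if __name__ == "__main__":
    print(hwe_chi_square(50, 0, 50))  # no heterozygotes: a strong departure from HWE
```

Exact tests are usually preferred over the chi-square approximation for small samples or rare alleles. The chi-square version is used here only because it is short to state.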
https://read.qxmd.com/read/38617467/markerscan-separation-and-assembly-of-cobionts-sequenced-alongside-target-species-in-biodiversity-genomics-projects
#7
JOURNAL ARTICLE
Emmelien Vancaester, Mark L Blaxter
Contamination of public databases by mislabelled sequences has been highlighted for many years and the avalanche of novel sequencing data now being deposited has the potential to make databases difficult to use effectively. It is therefore crucial that sequencing projects and database curators perform pre-submission checks to remove obvious contamination and avoid propagating erroneous taxonomic relationships. However, it is important also to recognise that biological contamination of a target sample with unexpected species' DNA can also lead to the discovery of fascinating biological phenomena through the identification of environmental organisms or endosymbionts...
2024: Wellcome Open Research
https://read.qxmd.com/read/38607924/protocol-to-explain-support-vector-machine-predictions-via-exact-shapley-value-computation
#8
JOURNAL ARTICLE
Andrea Mastropietro, Jürgen Bajorath
Shapley values from cooperative game theory are adapted for explaining machine learning predictions. For large feature sets used in machine learning, Shapley values are approximated. We present a protocol for two techniques for explaining support vector machine predictions with exact Shapley value computation. We detail the application of these algorithms and provide ready-to-use Python scripts and custom code. The final output of the protocol includes quantitative feature analysis and mapping of important features for visualization...
April 11, 2024: STAR protocols
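For reference, the exact Shapley value of feature i averages its marginal contribution v(S ∪ {i}) − v(S) over every coalition S of the other features, weighted by |S|!(n − |S| − 1)!/n!. The sketch below enumerates all coalitions by brute force, which is only feasible for a handful of features. It is the generic definition, not the SVM-specific techniques the protocol describes. The value-function signature and the name `exact_shapley` are assumptions made for illustration.

```python
from itertools import combinations
from math import factorial
from typing import Callable, FrozenSet, List

# Assumed signature: value(coalition) is the model output when only the features
# in `coalition` are "present" (how absent features are modelled is an assumption).
ValueFn = Callable[[FrozenSet[int]], float]


def exact_shapley(n_features: int, value: ValueFn) -> List[float]:
    """Brute-force exact Shapley values; needs on the order of n * 2^n value calls."""
    n = n_features
    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):  # coalition sizes 0 .. n-1 drawn from the other features
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                coalition = frozenset(subset)
                # Weighted marginal contribution of feature i to this coalition
                phi[i] += weight * (value(coalition | {i}) - value(coalition))
    return phi


if __name__ == "__main__":
    # Toy additive "model": each present feature adds a fixed weight.
    weights = [1.0, 2.0, 3.0]
    print(exact_shapley(3, lambda s: sum(weights[j] for j in s)))
```

For an additive toy model the Shapley values recover the per-feature weights up to floating-point rounding. In general the values satisfy efficiency: they sum to v(all features) − v(no features).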
https://read.qxmd.com/read/38603616/scdac-deep-adaptive-clustering-of-single-cell-transcriptomic-data-with-coupled-autoencoder-and-dirichlet-process-mixture-model
#9
JOURNAL ARTICLE
Sijing An, Jinhui Shi, Runyan Liu, Yaowen Chen, Jing Wang, Shuofeng Hu, Xinyu Xia, Guohua Dong, Xiaochen Bo, Zhen He, Xiaomin Ying
MOTIVATION: Clustering analysis for single-cell RNA sequencing (scRNA-seq) data is an important step in revealing cellular heterogeneity. Many clustering methods have been proposed to discover heterogeneous cell types from scRNA-seq data. However, adaptive clustering with accurate cluster number reflecting intrinsic biological nature from large-scale scRNA-seq data remains quite challenging. RESULTS: Here we propose a single-cell Deep Adaptive Clustering (scDAC) model by coupling the Autoencoder (AE) and the Dirichlet Process Mixture Model (DPMM)...
April 11, 2024: Bioinformatics
https://read.qxmd.com/read/38603598/pyaging-a-python-based-compendium-of-gpu-optimized-aging-clocks
#10
JOURNAL ARTICLE
Lucas Paulo de Lima Camillo
MOTIVATION: Aging is intricately linked to diseases and mortality. It is reflected in molecular changes across various tissues which can be leveraged for the development of biomarkers of aging using machine learning models, known as aging clocks. Despite advancements in the field, a significant challenge remains: the lack of robust, Python-based software tools for integrating and comparing these diverse models. This gap highlights the need for comprehensive solutions that can handle the complexity and variety of data in aging research...
April 11, 2024: Bioinformatics
https://read.qxmd.com/read/38602871/protocol-for-unsupervised-inference-of-cell-cell-communication-using-matrix-decomposition
#11
JOURNAL ARTICLE
Yi Liu, Xiao Chang, Xiaoping Liu
Exploring cell-cell communication is pivotal for understanding biological processes in multicellular life forms. Here, we present a protocol that details the use of matrix decomposition to infer cell-cell communication (MDIC3) for unsupervised cell-cell communication inference. We describe steps for using the MDIC3 Python scripts to deduce cell-cell communication and identify key ligand-receptor pairs between a specific cell type pair from a single-cell gene expression dataset. This protocol has potential application in cell-cell communication inference on any species...
April 10, 2024: STAR protocols
https://read.qxmd.com/read/38597887/viruspredictor-xgboost-based-software-to-predict-virus-related-sequences-in-human-data
#12
JOURNAL ARTICLE
Guangchen Liu, Xun Chen, Yihui Luan, Dawei Li
MOTIVATION: Discovering disease causative pathogens, particularly viruses without reference genomes, poses a technical challenge as they are often unidentifiable through sequence alignment. Machine learning prediction of patient high-throughput sequences unmappable to human and pathogen genomes may reveal sequences originating from uncharacterized viruses. Currently, there is a lack of software specifically designed for accurately predicting such viral sequences in human data. RESULTS: We developed a fast XGBoost method and software VirusPredictor leveraging an in-house viral genome database...
April 10, 2024: Bioinformatics
https://read.qxmd.com/read/38588573/epicarousel-memory-and-time-efficient-identification-of-metacells-for-atlas-level-single-cell-chromatin-accessibility-data
#13
JOURNAL ARTICLE
Sijie Li, Yuxi Li, Yu Sun, Yaru Li, Xiaoyang Chen, Songming Tang, Shengquan Chen
SUMMARY: Recent technical advancements in single-cell chromatin accessibility sequencing (scCAS) have brought new insights to the characterization of epigenetic heterogeneity. As single-cell genomics experiments scale up to hundreds of thousands of cells, the demand for computational resources of downstream analysis grows intractably large and exceeds the capabilities of most researchers. Here, we propose EpiCarousel, a tailored Python package based on lazy loading, parallel processing, and community detection for memory- and time-efficient identification of metacells, i...
April 8, 2024: Bioinformatics
https://read.qxmd.com/read/38569896/pacybara-accurate-long-read-sequencing-for-barcoded-mutagenized-allelic-libraries
#14
JOURNAL ARTICLE
Jochen Weile, Gabrielle Ferra, Gabriel Boyle, Sriram Pendyala, Clara Amorosi, Chiann-Ling Yeh, Atina G Cote, Nishka Kishore, Daniel Tabet, Warren van Loggerenberg, Ashyad Rayhan, Douglas M Fowler, Maitreya J Dunham, Frederick P Roth
MOTIVATION: Long read sequencing technologies, an attractive solution for many applications, often suffer from higher error rates. Alignment of multiple reads can improve base-calling accuracy, but some applications, e.g. sequencing mutagenized libraries where multiple distinct clones differ by one or few variants, require the use of barcodes or unique molecular identifiers. Unfortunately, sequencing errors can interfere with correct barcode identification, and a given barcode sequence may be linked to multiple independent clones within a given library...
April 3, 2024: Bioinformatics
https://read.qxmd.com/read/38565273/3dmapper-a-command-line-tool-for-biobank-scale-mapping-of-variants-to-protein-structures
#15
JOURNAL ARTICLE
Victoria Ruiz-Serra, Samuel Valentini, Sergi Madroñero, Alfonso Valencia, Eduard Porta-Pardo
MOTIVATION: The interpretation of genomic data is crucial to understand the molecular mechanisms of biological processes. Protein structures play a vital role in facilitating this interpretation by providing functional context to genetic coding variants. However, mapping genes to proteins is a tedious and error-prone task due to inconsistencies in data formats. Over the past two decades, numerous tools and databases have been developed to automatically map annotated positions and variants to protein structures...
April 2, 2024: Bioinformatics
https://read.qxmd.com/read/38557674/seaop-a-statistical-ensemble-approach-for-outlier-detection-in-quantitative-proteomics-data
#16
JOURNAL ARTICLE
Jinze Huang, Yang Zhao, Bo Meng, Ao Lu, Yaoguang Wei, Lianhua Dong, Xiang Fang, Dong An, Xinhua Dai
Quality control in quantitative proteomics is a persistent challenge, particularly in identifying and managing outliers. Unsupervised learning models, which rely on data structure rather than predefined labels, offer potential solutions. However, without clear labels, their effectiveness might be compromised. Single models are susceptible to the randomness of parameters and initialization, which can result in a high rate of false positives. Ensemble models, on the other hand, have shown capabilities in effectively mitigating the impacts of such randomness and assisting in accurately detecting true outliers...
March 27, 2024: Briefings in Bioinformatics
https://read.qxmd.com/read/38552318/motgen-a-closed-loop-bacterial-motility-control-framework-using-generative-adversarial-networks
#17
JOURNAL ARTICLE
BoGeum Seo, DoHee Lee, Heungjin Jeon, Junhyoung Ha, SeungBeum Suh
MOTIVATION: Many organisms' survival and behavior hinge on their responses to environmental signals. While research on bacteria-directed therapeutic agents has increased, systematic exploration of real-time modulation of bacterial motility remains limited. Current studies often focus on permanent motility changes through genetic alterations, restricting the ability to modulate bacterial motility dynamically on a large scale. To address this gap, we propose a novel real-time control framework for systematically modulating bacterial motility dynamics...
March 29, 2024: Bioinformatics
https://read.qxmd.com/read/38532297/pycom-a-python-library-for-large-scale-analysis-of-residue-residue-coevolution-data
#18
JOURNAL ARTICLE
Philipp Bibik, Sabriyeh Alibai, Alessandro Pandini, Sarath Chandra Dantu
MOTIVATION: Computational methods to detect correlated amino acid positions in proteins have become a valuable tool to predict intra and inter-residue protein contacts, protein structures, and effects of mutation on protein stability and function. While there are many tools and webservers to compute coevolution scoring matrices, there is no central repository of alignments and coevolution matrices for large-scale studies and pattern detection leveraging on structural and biological annotation already available in UniProt...
March 26, 2024: Bioinformatics
https://read.qxmd.com/read/38532295/pycomo-a-python-package-for-community-metabolic-model-creation-and-analysis
#19
JOURNAL ARTICLE
Michael Predl, Marianne Mießkes, Thomas Rattei, Jürgen Zanghellini
SUMMARY: PyCoMo is a python package for quick and easy generation of genome-scale compartmentalised community metabolic models that are compliant with current openCOBRA file formats. The resulting models can be used to predict (i) the maximum growth rate at a given abundance profile, (ii) the feasible community compositions at a given growth rate, and (iii) all exchange metabolites and cross-feeding interactions in a community metabolic model independent of the abundance profile; we demonstrate PyCoMo's capability by analysing methane production in a previously published simplified biogas community metabolic model (Koch et al...
March 26, 2024: Bioinformatics
https://read.qxmd.com/read/38530800/niend-neuronal-image-enhancement-through-noise-disentanglement
#20
JOURNAL ARTICLE
Zuo-Han Zhao, Yufeng Liu
MOTIVATION: The full automation of digital neuronal reconstruction from light microscopic images has long been impeded by noisy neuronal images. Previous endeavors to improve image quality can hardly get a good compromise between robustness and computational efficiency. RESULTS: We present the image enhancement pipeline named Neuronal Image Enhancement through Noise Disentanglement (NIEND). Through extensive benchmarking on 863 mouse neuronal images with manually annotated gold standards, NIEND achieves remarkable improvements in image quality such as signal-background contrast (40-fold) and background uniformity (10-fold), compared to raw images...
March 26, 2024: Bioinformatics