Most recent papers in the journal BMC Bioinformatics

#1

JOURNAL ARTICLE

Mahira Kirmani, Gagandeep Kour, Mudasir Mohd, Nasrullah Sheikh, Dawood Ashraf Khan, Zahid Maqbool, Mohsin Altaf Wani, Abid Hussain Wani

BACKGROUND: Text summarization is a challenging problem in Natural Language Processing, which involves condensing the content of textual documents without losing their overall meaning and information content. In the domain of bio-medical research, summaries are critical for efficient data analysis and information retrieval. While several bio-medical text summarizers exist in the literature, they often miss out on an essential text aspect: text semantics. RESULTS: This paper proposes a novel extractive summarizer that preserves text semantics by utilizing bio-semantic models...

38627652

April 16, 2024: BMC Bioinformatics

#2

JOURNAL ARTICLE

Inference of genomic landscapes using ordered Hidden Markov Models with emission densities (oHMMed).

Claus Vogl, Mariia Karapetiants, Burçin Yıldırım, Hrönn Kjartansdóttir, Carolin Kosiol, Juraj Bergman, Michal Majka, Lynette Caitlin Mikula

BACKGROUND: Genomes are inherently inhomogeneous, with features such as base composition, recombination, gene density, and gene expression varying along chromosomes. Evolutionary, biological, and biomedical analyses aim to quantify this variation, account for it during inference procedures, and ultimately determine the causal processes behind it. Since sequential observations along chromosomes are not independent, it is unsurprising that autocorrelation patterns have been observed e.g...

38627634

April 16, 2024: BMC Bioinformatics

#3

JOURNAL ARTICLE

MetageNN: a memory-efficient neural network taxonomic classifier robust to sequencing errors and missing genomes.

Rafael Peres da Silva, Chayaporn Suphavilai, Niranjan Nagarajan

BACKGROUND: With the rapid increase in throughput of long-read sequencing technologies, recent studies have explored their potential for taxonomic classification by using alignment-based approaches to reduce the impact of higher sequencing error rates. While alignment-based methods are generally slower, k-mer-based taxonomic classifiers can overcome this limitation, potentially at the expense of lower sensitivity for strains and species that are not in the database. RESULTS: We present MetageNN, a memory-efficient long-read taxonomic classifier that is robust to sequencing errors and missing genomes...

38627615

April 16, 2024: BMC Bioinformatics

#4

JOURNAL ARTICLE

Designing and delivering bioinformatics project-based learning in East Africa.

Caleb K Kibet, Jean-Baka Domelevo Entfellner, Daudi Jjingo, Etienne Pierre de Villiers, Santie de Villiers, Karen Wambui, Sam Kinyanjui, Daniel Masiga

BACKGROUND: The Eastern Africa Network for Bioinformatics Training (EANBiT) has matured through continuous evaluation, feedback, and codesign. We highlight how the program has evolved to meet challenges and achieve its goals and how experiential learning through mini projects enhances the acquisition of skills and collaboration. We continued to learn and grow through honest feedback and evaluation of the program, trainers, and modules, enabling us to provide robust training even during the Coronavirus disease 2019 (COVID-19) pandemic, when we had to redesign the program due to restricted travel and in person group meetings...

38616247

April 14, 2024: BMC Bioinformatics

#5

JOURNAL ARTICLE

MultiToxPred 1.0: a novel comprehensive tool for predicting 27 classes of protein toxins using an ensemble machine learning approach.

Jorge F Beltrán, Lisandra Herrera-Belén, Fernanda Parraguez-Contreras, Jorge G Farías, Jorge Machuca-Sepúlveda, Stefania Short

Protein toxins are defense mechanisms and adaptations found in various organisms and microorganisms, and their use in scientific research as therapeutic candidates is gaining relevance due to their effectiveness and specificity against cellular targets. However, discovering these toxins is time-consuming and expensive. In silico tools, particularly those based on machine learning and deep learning, have emerged as valuable resources to address this challenge. Existing tools primarily focus on binary classification, determining whether a protein is a toxin or not, and occasionally identifying specific types of toxins...

38609877

April 12, 2024: BMC Bioinformatics

#6

JOURNAL ARTICLE

Biomarker discovery with quantum neural networks: a case-study in CTLA4-activation pathways.

Phuong-Nam Nguyen

BACKGROUND: Biomarker discovery is a challenging task due to the massive search space. Quantum computing and quantum Artificial Intelligence (quantum AI) can be used to address the computational problem of biomarker discovery from genetic data. METHOD: We propose a Quantum Neural Networks architecture to discover genetic biomarkers for input activation pathways. The Maximum Relevance-Minimum Redundancy criteria score biomarker candidate sets. Our proposed model is economical since the neural solution can be delivered on constrained hardware...

38609844

April 12, 2024: BMC Bioinformatics

#7

JOURNAL ARTICLE

Control of false discoveries in grouped hypothesis testing for eQTL data.

Pratyaydipta Rudra, Yi-Hui Zhou, Andrew Nobel, Fred A Wright

BACKGROUND: Expression quantitative trait locus (eQTL) analysis aims to detect the genetic variants that influence the expression of one or more genes. Gene-level eQTL testing forms a natural grouped-hypothesis testing strategy with clear biological importance. Methods to control family-wise error rate or false discovery rate for group testing have been proposed earlier, but may not be powerful or easily apply to eQTL data, for which certain structured alternatives may be defensible and may enable the researcher to avoid overly conservative approaches...

38605284

April 11, 2024: BMC Bioinformatics

#8

JOURNAL ARTICLE

KEGG orthology prediction of bacterial proteins using natural language processing.

Jing Chen, Haoyu Wu, Ning Wang

BACKGROUND: The advent of high-throughput technologies has led to an exponential increase in uncharacterized bacterial protein sequences, surpassing the capacity of manual curation. A large number of bacterial protein sequences remain unannotated by Kyoto Encyclopedia of Genes and Genomes (KEGG) orthology, making it necessary to use auto annotation tools. These tools are now indispensable in the biological research landscape, bridging the gap between the vastness of unannotated sequences and meaningful biological insights...

38600441

April 11, 2024: BMC Bioinformatics

#9

JOURNAL ARTICLE

DPI_CDF: druggable protein identifier using cascade deep forest.

Muhammad Arif, Ge Fang, Ali Ghulam, Saleh Musleh, Tanvir Alam

BACKGROUND: Drug targets in living beings perform pivotal roles in the discovery of potential drugs. Conventional wet-lab characterization of drug targets, although accurate, is generally expensive, slow, and resource intensive. Therefore, computational methods are highly desirable as an alternative to expedite the large-scale identification of druggable proteins (DPs); however, the existing in silico predictor's performance is still not satisfactory. METHODS: In this study, we developed a novel deep learning-based model DPI_CDF for predicting DPs based on protein sequence only...

38580921

April 5, 2024: BMC Bioinformatics

#10

JOURNAL ARTICLE

Multiple phenotype association tests based on sliced inverse regression.

Wenyuan Sun, Kyongson Jon, Wensheng Zhu

BACKGROUND: Joint analysis of multiple phenotypes in studies of biological systems such as Genome-Wide Association Studies is critical to revealing the functional interactions between various traits and genetic variants, but growth of data in dimensionality has become a very challenging problem in the widespread use of joint analysis. To handle the excessiveness of variables, we consider the sliced inverse regression (SIR) method. Specifically, we propose a novel SIR-based association test that is robust and powerful in testing the association between multiple predictors and multiple outcomes...

38575890

April 4, 2024: BMC Bioinformatics

#11

JOURNAL ARTICLE

Predicting condensate formation of protein and RNA under various environmental conditions.

Ka Yin Chin, Shoichi Ishida, Yukio Sasaki, Kei Terayama

BACKGROUND: Liquid-liquid phase separation (LLPS) by biomolecules plays a central role in various biological phenomena and has garnered significant attention. The behavior of LLPS is strongly influenced by the characteristics of RNAs and environmental factors such as pH and temperature, as well as the properties of proteins. Recently, several databases recording LLPS-related biomolecules have been established, and prediction models of LLPS-related phenomena have been explored using these databases...

38566033

April 2, 2024: BMC Bioinformatics

#12

JOURNAL ARTICLE

CITEViz: interactively classify cell populations in CITE-Seq via a flow cytometry-like gating workflow using R-Shiny.

Garth L Kong, Thai T Nguyen, Wesley K Rosales, Anjali D Panikar, John H W Cheney, Theresa A Lusardi, William M Yashar, Brittany M Curtiss, Sarah A Carratt, Theodore P Braun, Julia E Maxson

BACKGROUND: The rapid advancement of new genomic sequencing technology has enabled the development of multi-omic single-cell sequencing assays. These assays profile multiple modalities in the same cell and can often yield new insights not revealed with a single modality. For example, Cellular Indexing of Transcriptomes and Epitopes by Sequencing (CITE-Seq) simultaneously profiles the RNA transcriptome and the surface protein expression. The surface protein markers in CITE-Seq can be used to identify cell populations similar to the iterative filtration process in flow cytometry, also called "gating", and is an essential step for downstream analyses and data interpretation...

38566005

April 2, 2024: BMC Bioinformatics

#13

JOURNAL ARTICLE

CAT-DTI: cross-attention and Transformer network with domain adaptation for drug-target interaction prediction.

Xiaoting Zeng, Weilin Chen, Baiying Lei

Accurate and efficient prediction of drug-target interaction (DTI) is critical to advance drug development and reduce the cost of drug discovery. Recently, the employment of deep learning methods has enhanced DTI prediction precision and efficacy, but it still encounters several challenges. The first challenge lies in the efficient learning of drug and protein feature representations alongside their interaction features to enhance DTI prediction. Another important challenge is to improve the generalization capability of the DTI model within real-world scenarios...

38566002

April 2, 2024: BMC Bioinformatics

#14

JOURNAL ARTICLE

MFSynDCP: multi-source feature collaborative interactive learning for drug combination synergy prediction.

Yunyun Dong, Yunqing Chang, Yuxiang Wang, Qixuan Han, Xiaoyuan Wen, Ziting Yang, Yan Zhang, Yan Qiang, Kun Wu, Xiaole Fan, Xiaoqiang Ren

Drug combination therapy is generally more effective than monotherapy in the field of cancer treatment. However, screening for effective synergistic combinations from a wide range of drug combinations is particularly important given the increase in the number of available drug classes and potential drug-drug interactions. Existing methods for predicting the synergistic effects of drug combinations primarily focus on extracting structural features of drug molecules and cell lines, but neglect the interaction mechanisms between cell lines and drug combinations...

38561679

April 1, 2024: BMC Bioinformatics

#15

JOURNAL ARTICLE

DAE-CFR: detecting microRNA-disease associations using deep autoencoder and combined feature representation.

Yanling Liu, Ruiyan Zhang, Xiaojing Dong, Hong Yang, Jing Li, Hongyan Cao, Jing Tian, Yanbo Zhang

BACKGROUND: MicroRNA (miRNA) has been shown to play a key role in the occurrence and progression of diseases, making uncovering miRNA-disease associations vital for disease prevention and therapy. However, traditional laboratory methods for detecting these associations are slow, strenuous, expensive, and uncertain. Although numerous advanced algorithms have emerged, it is still a challenge to develop more effective methods to explore underlying miRNA-disease associations. RESULTS: In the study, we designed a novel approach on the basis of deep autoencoder and combined feature representation (DAE-CFR) to predict possible miRNA-disease associations...

38553698

March 29, 2024: BMC Bioinformatics

#16

JOURNAL ARTICLE

Curare and GenExVis: a versatile toolkit for analyzing and visualizing RNA-Seq data.

Patrick Blumenkamp, Max Pfister, Sonja Diedrich, Karina Brinkrolf, Sebastian Jaenicke, Alexander Goesmann

Even though high-throughput transcriptome sequencing is routinely performed in many laboratories, computational analysis of such data remains a cumbersome process often executed manually, hence error-prone and lacking reproducibility. For corresponding data processing, we introduce Curare, an easy-to-use yet versatile workflow builder for analyzing high-throughput RNA-Seq data focusing on differential gene expression experiments. Data analysis with Curare is customizable and subdivided into preprocessing, quality control, mapping, and downstream analysis stages, providing multiple options for each step while ensuring the reproducibility of the workflow...

38553675

March 29, 2024: BMC Bioinformatics

#17

JOURNAL ARTICLE

Towards a unified medical microbiome ecology of the OMU for metagenomes and the OTU for microbes.

Zhanshan Sam Ma

BACKGROUND: Metagenomic sequencing technologies offered unprecedented opportunities and also challenges to microbiology and microbial ecology particularly. The technology has revolutionized the studies of microbes and enabled the high-profile human microbiome and earth microbiome projects. The terminology-change from microbes to microbiomes signals that our capability to count and classify microbes (microbiomes) has achieved the same or similar level as we can for the biomes (macrobiomes) of plants and animals (macrobes)...

38553666

March 29, 2024: BMC Bioinformatics

#18

JOURNAL ARTICLE

Feature-specific quantile normalization and feature-specific mean-variance normalization deliver robust bi-directional classification and feature selection performance between microarray and RNAseq data.

Daniel Skubleny, Sunita Ghosh, Jennifer Spratlin, Daniel E Schiller, Gina R Rayat

BACKGROUND: Cross-platform normalization seeks to minimize technological bias between microarray and RNAseq whole-transcriptome data. Incorporating multiple gene expression platforms permits external validation of experimental findings, and augments training sets for machine learning models. Here, we compare the performance of Feature Specific Quantile Normalization (FSQN) to a previously used but unvalidated and uncharacterized method we label as Feature Specific Mean Variance Normalization (FSMVN)...

38549046

March 29, 2024: BMC Bioinformatics

#19

JOURNAL ARTICLE

GraphKM: machine and deep learning for KM prediction of wildtype and mutant enzymes.

Xiao He, Ming Yan

Michaelis constant (KM) is one of essential parameters for enzymes kinetics in the fields of protein engineering, enzyme engineering, and synthetic biology. As overwhelming experimental measurements of KM are difficult and time-consuming, prediction of the KM values from machine and deep learning models would increase the pace of the enzymes kinetics studies. Existing machine and deep learning models are limited to the specific enzymes, i.e., a minority of enzymes or wildtype enzymes. Here, we used a deep learning framework PaddlePaddle to implement a machine and deep learning approach (GraphKM) for KM prediction of wildtype and mutant enzymes...

38549073

March 28, 2024: BMC Bioinformatics

#20

JOURNAL ARTICLE

SurvConvMixer: robust and interpretable cancer survival prediction based on ConvMixer using pathway-level gene expression images.

Shuo Wang, Yuanning Liu, Hao Zhang, Zhen Liu

Cancer is one of the leading causes of deaths worldwide. Survival analysis and prediction of cancer patients is of great significance for their precision medicine. The robustness and interpretability of the survival prediction models are important, where robustness tells whether a model has learned the knowledge, and interpretability means if a model can show human what it has learned. In this paper, we propose a robust and interpretable model SurvConvMixer, which uses pathways customized gene expression images and ConvMixer for cancer short-term, mid-term and long-term overall survival prediction...

38539106

March 27, 2024: BMC Bioinformatics
