IEEE/ACM Transactions on Computational Biology and Bioinformatics

Ivo Hedtke, Ioana Lemnian, Ivo Grosse, Matthias Muller-Hannemann
Read trimming is a fundamental first step of the analysis of next generation sequencing (NGS) data. Traditionally, it is performed heuristically, and algorithmic work in this area has been neglected. Here, we address this topic and formulate three optimization problems for block-based trimming (truncating the same low-quality positions at both ends for all reads and removing low-quality truncated reads). We find that all problems are NP-hard. Hence, we investigate the approximability of the problems. Two of them are NP-hard to approximate...
April 24, 2017: IEEE/ACM Transactions on Computational Biology and Bioinformatics
Carl Poirier, Benoit Gosselin, Paul Fortier
This paper presents an FPGA implementation of a DNA assembly algorithm, called Ray, initially developed to run on parallel CPUs. The OpenCL language is used and the focus is placed on modifying and optimizing the original algorithm to better suit the new parallelization tool and the radically different hardware architecture. The results show that the execution time is roughly one fourth that of the CPU, and factoring in energy consumption yields a tenfold saving.
April 24, 2017: IEEE/ACM Transactions on Computational Biology and Bioinformatics
Najmul Ikram, Muhammad Qadir, Muhammad Afzal
Sequence similarity is a commonly used measure to compare proteins. With the increasing use of ontologies, semantic (function) similarity is gaining importance. The correlation between these measures has been applied in the evaluation of new semantic similarity methods, and in protein function prediction. In this research, we investigate the relationship between the two similarity methods. The results suggest the absence of a strong correlation between sequence and semantic similarities. There is a large number of proteins with low sequence similarity and high semantic similarity...
April 18, 2017: IEEE/ACM Transactions on Computational Biology and Bioinformatics
Davide Chicco, Fernando Palluzzi, Marco Masseroli
Biomolecular controlled annotations have become pivotal in computational biology, because they allow scientists to analyze large amounts of biological data to better understand test results, and to infer new knowledge. Yet, biomolecular annotation databases are incomplete by definition, like our knowledge of biology, and might contain errors and inconsistent information. In this context, machine-learning algorithms able to predict and prioritize new annotations are both effective and efficient, especially if compared with time-consuming trials of biological validation...
April 18, 2017: IEEE/ACM Transactions on Computational Biology and Bioinformatics
Jie Zhou, Yuan-Yuan Shi
Phenotypes and diseases are often determined by the complex interactions between genetic factors and environmental factors (EFs). However, compared with protein-coding genes and microRNAs, there is a paucity of computational methods for understanding the associations between long non-coding RNAs (lncRNAs) and EFs. In this study, we focused on the associations between lncRNA and EFs. By using the common miRNA partners of any pair of lncRNA and EF, based on the competing endogenous RNA (ceRNA) hypothesis and the technique of resources transfer within the experimentally-supported lncRNA-miRNA and miRNA-EF association bipartite networks, we propose an algorithm for predicting new lncRNA-EF associations...
April 18, 2017: IEEE/ACM Transactions on Computational Biology and Bioinformatics
Pedro Alves, Shuang Liu, Daifeng Wang, Mark Gerstein
Machine learning is an integral part of computational biology, and has already shown its use in various applications, such as prognostic tests. In the last few years in the non-biological machine learning community, ensembling techniques have shown their power in data mining competitions such as the Netflix challenge; however, such methods have not found wide use in computational biology. In this work we endeavor to show how ensembling techniques can be applied to practical problems, including problems in the field of bioinformatics, and how they often outperform other machine learning techniques in both predictive power and robustness...
April 5, 2017: IEEE/ACM Transactions on Computational Biology and Bioinformatics
Lin Zhu, Hongbo Zhang, De-Shuang Huang
Although discriminative motif discovery (DMD) methods are promising for eliciting motifs from high-throughput experimental data, they usually have to sacrifice accuracy and may fail to fully leverage the potential of large datasets. Recently, it has been demonstrated that the motifs identified by DMDs can be significantly improved by maximizing the area under the receiver-operating characteristic curve (AUC) metric, which has been widely used in the literature to rank the performance of elicited motifs. However, existing approaches for motif refinement choose to directly maximize the non-convex and discontinuous AUC itself, which is known to be difficult and may lead to suboptimal solutions...
April 5, 2017: IEEE/ACM Transactions on Computational Biology and Bioinformatics
Somaya Hashem, Gamal Esmat, Wafaa Elakel, Shahira Habashy, Safaa Abdel Raouf, Mohamed Elhefnawi, Mohamed Eladawy, Mahmoud Elhefnawi
BACKGROUND/AIM: Machine learning approaches have recently been used as non-invasive alternatives for staging chronic liver diseases, avoiding the drawbacks of biopsy. This study aims to evaluate different machine learning techniques in predicting advanced fibrosis by combining serum bio-markers and clinical information to develop the classification models. METHODS: A prospective cohort of 39,567 patients with chronic hepatitis C was divided into two sets - one categorized as mild to moderate fibrosis (F0-F2), and the other categorized as advanced fibrosis (F3-F4) according to METAVIR score...
April 4, 2017: IEEE/ACM Transactions on Computational Biology and Bioinformatics
Xichuan Zhou, Fan Yang, Yujie Feng, Qin Li, Fang Tang, Shengdong Hu, Zhi Lin, Lei Zhang
The 2009 influenza pandemic showed how fast the influenza virus can spread globally. To address the challenge of timely global influenza surveillance, this paper presents a spatial-temporal method that incorporates heterogeneous data collected from the Internet to detect influenza epidemics in real time. Specifically, the influenza morbidity data, the influenza-related Google query data and news data, and the international air transportation data are integrated in a multivariate hidden Markov model, which is designed to describe the intrinsic temporal-geographical correlation of influenza transmission for surveillance purposes...
April 4, 2017: IEEE/ACM Transactions on Computational Biology and Bioinformatics
Xianpeng Liang, Lin Zhu, De-Shuang Huang
Gene set enrichment (GSE) is a useful tool for analyzing and interpreting large molecular datasets generated by modern biomedical science. The accuracy and reproducibility of GSE analysis are heavily affected by the quality and integrity of gene set annotations. In this paper, we propose a novel method, robust trace-norm multitask learning, to solve the optimization problem of gene set annotations. Inspired by the binary nature of annotations, we convert the optimization of gene set annotations into a weakly supervised classification problem and use discriminative logistic regression to fit these datasets...
April 3, 2017: IEEE/ACM Transactions on Computational Biology and Bioinformatics
Xi Yang, Guoqiang Han, Hongmin Cai, Yan Song
Revealing data with intrinsically diagonal block structures is particularly useful for analyzing groups of highly correlated variables. Earlier research based on non-negative matrix factorization (NMF) has been shown to be effective in representing such data by decomposing the observed data into two factors, where one factor is considered to be the feature and the other the expansion loading from a linear algebra perspective. If the data are sampled from multiple independent subspaces, the loading factor would possess a diagonal structure under an ideal matrix decomposition...
March 31, 2017: IEEE/ACM Transactions on Computational Biology and Bioinformatics
Pritha Dutta, Subhadip Basu, Mahantapas Kundu
The semantic similarity between two interacting proteins can be estimated by combining the similarity scores of the GO terms associated with the proteins. A greater number of similar GO annotations between two proteins indicates greater interaction affinity. Existing semantic similarity measures make use of the GO graph structure, the information content of GO terms, or a combination of both. In this paper, we present a hybrid approach which utilizes both the topological features of the GO graph and the information content of the GO terms...
March 31, 2017: IEEE/ACM Transactions on Computational Biology and Bioinformatics
Andres Iglesias, Akemi Galvez, Andreina Avila
Curve reconstruction from data points is an important issue for advanced medical imaging techniques, such as computer tomography (CT) and magnetic resonance imaging (MRI). The most powerful fitting functions for this purpose are the NURBS (non-uniform rational B-splines). Solving the general reconstruction problem with NURBS requires computing all free variables of the problem (data parameters, breakpoints, control points and their weights). This leads to a very difficult non-convex, nonlinear, high-dimensional, multimodal, continuous optimization problem...
March 28, 2017: IEEE/ACM Transactions on Computational Biology and Bioinformatics
Dan He, Zhanyong Wang, Laxmi Parida, Eleazar Eskin
Reconstruction of family trees, or pedigree reconstruction, for a group of individuals is a fundamental problem in genetics. The problem is known to be NP-hard even for datasets known to contain only siblings. Some recent methods have been developed to accurately and efficiently reconstruct pedigrees. These methods, however, still consider relatively simple pedigrees; for example, they are not able to handle half-sibling situations where a pair of individuals share only one parent. In this work, we propose an efficient method, IPED2, based on our previous work, which specifically targets reconstruction of complicated pedigrees that include half-siblings...
March 28, 2017: IEEE/ACM Transactions on Computational Biology and Bioinformatics
Aurelie Pirayre, Camille Couprie, Laurent Duval, Jean-Christophe Pesquet
Discovering meaningful gene interactions is crucial for the identification of novel regulatory processes in cells. Accurately building the related graphs remains challenging due to the large number of possible solutions from available data. Nonetheless, enforcing a priori knowledge on the graph structure, such as modularity, may reduce network indeterminacy issues. BRANE Clust (Biologically-Related A priori Network Enhancement with Clustering) refines gene regulatory network (GRN) inference thanks to cluster information...
March 28, 2017: IEEE/ACM Transactions on Computational Biology and Bioinformatics
Abolfazl Doostparast Torshizi, Linda Petzold
In many complex diseases, the transition process from healthy stage to catastrophic stage does not occur gradually. Recent studies indicate that the initiation and progression of such diseases comprise three stages: the healthy stage, the pre-disease stage, and the disease stage. It has been demonstrated that a certain set of trajectories can be observed in the genetic signatures at the molecular level, which might be used to detect the pre-disease stage and to take necessary medical interventions. In this paper, we propose two optimization-based algorithms for extracting the dynamic network biomarkers responsible for catastrophic transition into the disease stage, and to open new horizons to reverse the disease progression at an early stage through pinpointing molecular signatures provided by high-throughput microarray data...
March 27, 2017: IEEE/ACM Transactions on Computational Biology and Bioinformatics
Ying Li, Ye He, Siyu Han, Yanchun Liang
Gastric cancer is one of the leading causes of cancer mortality worldwide, especially in China. In recent years, some long non-coding RNAs (lncRNAs) have been discovered to be dysregulated in many cancers, and the study of the relationship between lncRNAs and cancers has attracted increasing attention. The molecular mechanism of gastric cancer remains largely unclear, especially for lncRNAs. Experiments can provide the related information; however, experimental identification of cancer-related lncRNAs is usually time-consuming and costly...
March 24, 2017: IEEE/ACM Transactions on Computational Biology and Bioinformatics
Yi-Nan Guo, Jian Cheng, Sha Luo, Dun-Wei Gong
For dynamic multi-objective vehicle routing problems, the waiting time of vehicles, the number of serving vehicles, and the total distance of routes are normally considered as the optimization objectives. In addition to these objectives, this paper focuses on fuel consumption, which leads to environmental pollution and energy consumption. Considering the vehicles' load and the driving distance, a corresponding carbon emission model was built and set as an optimization objective. Dynamic multi-objective vehicle routing problems with hard time windows and randomly appearing dynamic customers were subsequently modeled...
March 21, 2017: IEEE/ACM Transactions on Computational Biology and Bioinformatics
Xiangtao Li, Ka-Chun Wong
In recent years, high-throughput sequencing technologies have enabled massive insights into genomic annotations. In contrast, the full-scale three-dimensional arrangements of genomic regions are relatively unknown. Thanks to the recent breakthroughs in High-throughput Chromosome Conformation Capture (Hi-C) techniques, non-negative matrix factorization (NMF) has been adopted to identify local spatial clusters of genomic regions from Hi-C data. However, such non-negative matrix factorization entails a high-dimensional non-convex objective function to be optimized with non-negative constraints...
March 20, 2017: IEEE/ACM Transactions on Computational Biology and Bioinformatics
Marco Frasca, Nicolo Cesa-Bianchi
Automated protein function prediction is a challenging problem with distinctive features, such as the hierarchical organization of protein functions and the scarcity of annotated proteins for most biological functions. We propose a multitask learning algorithm addressing both issues. Unlike standard multitask algorithms, which use task (protein functions) similarity information as a bias to speed up learning, we show that dissimilarity information enforces separation of rare class labels from frequent class labels, and for this reason is better suited for solving unbalanced protein function prediction problems...
March 17, 2017: IEEE/ACM Transactions on Computational Biology and Bioinformatics