Read by QxMD


Chris-André Leimeister, Salma Sohrabi-Jahromi, Burkhard Morgenstern
MOTIVATION: Word-based or 'alignment-free' algorithms are increasingly used for phylogeny reconstruction and genome comparison, since they are much faster than traditional approaches that are based on full sequence alignments. Existing alignment-free programs, however, are less accurate than alignment-based methods. RESULTS: We propose Filtered Spaced Word Matches (FSWM), a fast alignment-free approach to estimate phylogenetic distances between large genomic sequences...
January 10, 2017: Bioinformatics
Valentin Wucher, Fabrice Legeai, Benoît Hédan, Guillaume Rizk, Lætitia Lagoutte, Tosso Leeb, Vidhya Jagannathan, Edouard Cadieu, Audrey David, Hannes Lohi, Susanna Cirera, Merete Fredholm, Nadine Botherel, Peter A J Leegwater, Céline Le Béguec, Hille Fieten, Jeremy Johnson, Jessica Alföldi, Catherine André, Kerstin Lindblad-Toh, Christophe Hitte, Thomas Derrien
Whole transcriptome sequencing (RNA-seq) has become a standard for cataloguing and monitoring RNA populations. One of the main bottlenecks, however, is to correctly identify the different classes of RNAs among the plethora of reconstructed transcripts, particularly those that will be translated (mRNAs) from the class of long non-coding RNAs (lncRNAs). Here, we present FEELnc (FlExible Extraction of LncRNAs), an alignment-free program that accurately annotates lncRNAs based on a Random Forest model trained with general features such as multi k-mer frequencies and relaxed open reading frames...
January 3, 2017: Nucleic Acids Research
Justine Rudewicz, Hayssam Soueidan, Raluca Uricaru, Hervé Bonnefoi, Richard Iggo, Jonas Bergh, Macha Nikolski
Targeted sequencing is commonly used in clinical applications of NGS technology since it enables generation of sufficient sequencing depth in the targeted genes of interest and thus ensures the best possible downstream analysis. This notwithstanding, the accurate discovery and annotation of disease-causing mutations remains a challenging problem even in such a favorable context. The difficulty is particularly salient in the case of third generation sequencing technology, such as PacBio. We present MICADo, a de Bruijn graph based method, implemented in Python, that makes it possible to distinguish between patient-specific mutations and other alterations for targeted sequencing of a cohort of patients...
2016: Frontiers in Genetics
Philipp Muller, Marc-Andre Begin, Thomas Schauer, Thomas Seel
Due to their relative ease of handling and low cost, inertial measurement unit (IMU)-based joint angle measurements are used for a widespread range of applications. These include sports performance, gait analysis and rehabilitation (e.g. Parkinson's disease monitoring or post-stroke assessment). However, a major downside of current algorithms, recomposing human kinematics from IMU data, is that they require calibration motions and/or the careful alignment of the IMUs with respect to the body segments. In this article, we propose a new method, which is alignment-free and self-calibrating using arbitrary movements of the user and an initial zero reference arm pose...
December 14, 2016: IEEE Journal of Biomedical and Health Informatics
Lianping Yang, Weilin Zhang
How to describe the similarity relationship between biological sequences is a basic but important problem in bioinformatics. This article proposes the first graphical representation method for the similarity relationship itself, rather than for a single sequence, which makes the similarity intuitive. Some properties of the similarity, such as sensitivity and continuity, are proved theoretically, which indicates that the similarity descriptor has the advantages of both alignment and alignment-free methods. With the aid of multiresolution analysis tools, we can exhibit the similarity's different profiles, from high resolution to low resolution...
December 19, 2016: Journal of Computational Biology: a Journal of Computational Molecular Cell Biology
Yongkun Li, Lily He, Rong Lucy He, Stephen S-T Yau
Zika virus (ZIKV) is a mosquito-borne flavivirus. It was first isolated in Uganda in 1947 and has been an emerging threat since 2007. However, because of the inconsistency of alignment methods, the evolution of ZIKV remains poorly understood. In this study, we first use the complete protein and an alignment-free method to build a phylogenetic tree of 87 Zika strains in which Asian, East African, and West African lineages are characterized. We also use the NS5 protein to construct the genetic relationship among 44 Zika strains...
December 15, 2016: DNA and Cell Biology
Tsukasa Nakamura, Kentaro Tomii
Comprehensive analysis and comparison of protein ligand-binding pockets are important to predict the ligands which bind to parts of putative ligand binding pockets. Because of the recent increase of protein structure information, such analysis demands a fast and efficient method for comparing ligand binding pockets. Previously we proposed a fast alignment-free method based on a simple representation of a ligand binding pocket with one 11-dimensional vector, which is suitable for such analysis. Based on that method, we conducted this study to expand and revise similarity measures of binding pockets and to investigate the effects of those modifications with two datasets for improving the ability to detect similar binding pockets...
2016: Biophysics and Physicobiology
Yushuang Li, Tian Song, Jiasheng Yang, Yi Zhang, Jialiang Yang
In this paper, we propose a novel alignment-free method for comparing the similarity of protein sequences. We first encode a protein sequence into a 440-dimensional feature vector consisting of a 400-dimensional Pseudo-Markov transition probability vector among the 20 amino acids, a 20-dimensional content ratio vector, and a 20-dimensional position ratio vector of the amino acids in the sequence. By evaluating the Euclidean distances among the representing vectors, we compare the similarity of protein sequences...
2016: PloS One
Nathan A Ahlgren, Jie Ren, Yang Young Lu, Jed A Fuhrman, Fengzhu Sun
Viruses and their host genomes often share similar oligonucleotide frequency (ONF) patterns, which can be used to predict the host of a given virus by finding the host with the greatest ONF similarity. We comprehensively compared 11 ONF metrics using several k-mer lengths for predicting host taxonomy from among ∼32 000 prokaryotic genomes for 1427 virus isolate genomes whose true hosts are known. The background-subtracting measure [Formula: see text] at k = 6 gave the highest host prediction accuracy (33%, genus level) with reasonable computational times...
November 28, 2016: Nucleic Acids Research
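The idea of matching a virus to the host with the most similar oligonucleotide frequency profile can be illustrated with a minimal sketch. The function names, the toy sequences, and the choice of plain Euclidean distance are assumptions made for illustration only; this is not the background-subtracting measure referred to in the abstract, and the sketch is far simpler than the metrics the authors compared.

```python
from collections import Counter
from itertools import product
import math


def kmer_frequencies(seq, k):
    """Relative frequencies of all 4**k DNA k-mers in seq (overlapping windows)."""
    seq = seq.upper()
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    total = sum(counts[m] for m in kmers)  # ignores windows with non-ACGT letters
    if total == 0:
        return [0.0] * len(kmers)
    return [counts[m] / total for m in kmers]


def euclidean(u, v):
    """Euclidean distance between two equal-length frequency vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def predict_host(virus_seq, hosts, k=2):
    """Return the name of the host whose k-mer profile is closest to the virus.

    hosts is a hypothetical dict mapping host name -> genome sequence.
    """
    virus_profile = kmer_frequencies(virus_seq, k)
    return min(
        hosts,
        key=lambda name: euclidean(virus_profile, kmer_frequencies(hosts[name], k)),
    )


# Toy data, invented for this example.
hosts = {"at_rich": "ATATATATTTAAATAT", "gc_rich": "GCGCGGCCGCGCCGGC"}
best = predict_host("ATTATAATATTA", hosts, k=2)
```

In practice the profiles of the host genomes would be computed once and cached rather than recomputed for every query.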
Yasser B Ruiz-Blanco, Yovani Marrero-Ponce, Enrique García-Hernández, James Green
N-Glycosylation is a common post-translational modification that plays an important role in the proper folding and function of many proteins. This modification is largely dependent on the presence of a sequence motif called a "sequon" defined as Asn-Xxx-Ser/Thr. However, evidence has shown that the presence of such a "sequon" is insufficient to determine the occurrence of N-glycosylation with high precision. This study aims to elucidate patterns that can more accurately predict N-glycosylation sites in human proteins...
November 28, 2016: Amino Acids
Weinan Liao, Jie Ren, Kun Wang, Shun Wang, Feng Zeng, Ying Wang, Fengzhu Sun
The comparison between microbial sequencing data is critical to understand the dynamics of microbial communities. The alignment-based tools analyzing metagenomic datasets require reference sequences and read alignments. The available alignment-free dissimilarity approaches model the background sequences with Fixed Order Markov Chain (FOMC) yielding promising results for the comparison of microbial communities. However, in FOMC, the number of parameters grows exponentially with the increase of the order of Markov Chain (MC)...
November 23, 2016: Scientific Reports
Mirjana Domazet-Lošo, Tomislav Domazet-Lošo
Prokaryotic and viral genomes are often altered by recombination and horizontal gene transfer. The existing methods for detecting recombination are primarily aimed at viral genomes or sets of loci, since the expensive computation of underlying statistical models often hinders the comparison of complete prokaryotic genomes. As an alternative, alignment-free solutions are more efficient, but cannot map (align) a query to subject genomes. To address this problem, we have developed gmos (Genome MOsaic Structure), a new program that determines the mosaic structure of query genomes when compared to a set of closely related subject genomes...
2016: PloS One
David Pellow, Darya Filippova, Carl Kingsford
Using a sequence's k-mer content rather than the full sequence directly has enabled significant performance improvements in several sequencing applications, such as metagenomic species identification, estimation of transcript abundances, and alignment-free comparison of sequencing data. As k-mer sets often reach hundreds of millions of elements, traditional data structures are often impractical for k-mer set storage, and Bloom filters (BFs) and their variants are used instead. BFs reduce the memory footprint required to store millions of k-mers while allowing for fast set containment queries, at the cost of a low false positive rate (FPR)...
November 9, 2016: Journal of Computational Biology: a Journal of Computational Molecular Cell Biology
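As a rough illustration of how a Bloom filter stores a k-mer set and answers containment queries, here is a minimal sketch. The class name, the bit-array size, the number of hash functions, and the use of seeded SHA-256 digests are all assumptions chosen for readability; they are not taken from the paper, and a production implementation would use faster hashing and tuned parameters.

```python
import hashlib


class KmerBloomFilter:
    """Toy Bloom filter: no false negatives, occasional false positives."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, kmer):
        # Derive num_hashes bit positions from seeded SHA-256 digests.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{kmer}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, kmer):
        for pos in self._positions(kmer):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, kmer):
        # True if every bit for this k-mer is set; may be a false positive.
        return all(
            self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(kmer)
        )


def kmers(seq, k):
    """Yield all overlapping k-mers of seq."""
    return (seq[i:i + k] for i in range(len(seq) - k + 1))


bloom = KmerBloomFilter()
for kmer in kmers("ACGTACGTGA", 4):
    bloom.add(kmer)
```

Every inserted k-mer is guaranteed to be reported as present; a k-mer that was never inserted is usually, but not always, reported as absent, which is the false positive rate the abstract mentions.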
Lars Hahn, Chris-André Leimeister, Rachid Ounit, Stefano Lonardi, Burkhard Morgenstern
Many algorithms for sequence analysis rely on word matching or word statistics. Often, these approaches can be improved if binary patterns representing match and don't-care positions are used as a filter, such that only those positions of words are considered that correspond to the match positions of the patterns. The performance of these approaches, however, depends on the underlying patterns. Herein, we show that the overlap complexity of a pattern set that was introduced by Ilie and Ilie is closely related to the variance of the number of matches between two evolutionarily related sequences with respect to this pattern set...
October 2016: PLoS Computational Biology
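The filtering role of a binary pattern of match and don't-care positions can be sketched as follows. The function names and the convention of writing patterns as strings of '1' (match) and '0' (don't care) are assumptions for illustration; the sketch only finds spaced-word matches for a single pattern and does not compute overlap complexity or optimize pattern sets.

```python
def spaced_word(seq, start, pattern):
    """Characters of seq at the match positions ('1') of pattern, from start."""
    return "".join(seq[start + i] for i, c in enumerate(pattern) if c == "1")


def spaced_word_matches(seq_a, seq_b, pattern):
    """Return (i, j) pairs where the windows at i in seq_a and j in seq_b
    agree at every match position of pattern; don't-care positions may differ."""
    width = len(pattern)
    index = {}
    for i in range(len(seq_a) - width + 1):
        index.setdefault(spaced_word(seq_a, i, pattern), []).append(i)
    matches = []
    for j in range(len(seq_b) - width + 1):
        for i in index.get(spaced_word(seq_b, j, pattern), []):
            matches.append((i, j))
    return matches


# Toy sequences that differ at a single position (index 2).
with_gap = spaced_word_matches("ACGTAC", "ACCTAC", "1101")
contiguous = spaced_word_matches("ACGTAC", "ACCTAC", "1111")
```

With the don't-care position covering the mismatch, the pattern "1101" still finds a match at (0, 0), whereas the contiguous word "1111" finds none in this toy case.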
Jie Zhou, Pianyu Zhong, Tinghui Zhang
Determination of sequence similarity is one of the major steps in computational phylogenetic studies. One of the major tasks of computational biologists is to develop novel mathematical descriptors for similarity analysis. DNA clustering is an important technology that automatically identifies inherent relationships among large-scale DNA sequences. The comparison between the DNA sequences of different species helps determine phylogenetic relationships among species. Alignment-free approaches have continuously gained interest in various sequence analysis applications such as phylogenetic inference and metagenomic classification/clustering, particularly for large-scale sequence datasets...
2016: Evolutionary Bioinformatics Online
Jian Zhao, Xiaofeng Song, Kai Wang
RNA-Seq based transcriptome assembly has been widely used to identify novel lncRNAs. However, the best-performing transcript reconstruction methods identified merely 21% of full-length protein-coding transcripts from H. sapiens. Those partial-length protein-coding transcripts are more likely to be classified as lncRNAs due to their incomplete CDS, leading to a higher false positive rate for lncRNA identification. Furthermore, potential sequencing or assembly errors that gain or abolish stop codons also complicate ORF-based prediction of lncRNAs...
October 6, 2016: Scientific Reports
Haohua Tu, Yuan Liu, Dmitry Turchinovich, Marina Marjanovic, Jens Lyngsø, Jesper Lægsgaard, Eric J Chaney, Youbo Zhao, Sixian You, William L Wilson, Bingwei Xu, Marcos Dantus, Stephen A Boppart
The preparation, staining, visualization, and interpretation of histological images of tissue is well-accepted as the gold standard process for the diagnosis of disease. These methods were developed historically, and are used ubiquitously in pathology, despite being highly time and labor intensive. Here we introduce a unique optical imaging platform and methodology for label-free multimodal multiphoton microscopy that uses a novel photonic crystal fiber source to generate tailored chemical contrast based on programmable supercontinuum pulses...
August 2016: Nature Photonics
Marcin Bogusz, Simon Whelan
Phylogenetic tree inference is a critical component of many systematic and evolutionary studies. The majority of these studies are based on the two-step process of multiple sequence alignment followed by tree inference, despite persistent evidence that the alignment step can lead to biased results. Here we present a two-part study that first presents PaHMM-Tree, a novel neighbor joining-based method that estimates pairwise distances without assuming a single alignment. We then use simulations to benchmark its performance against a wide-range of other phylogenetic tree inference methods, including the first comparison of alignment-free distance-based methods against more conventional tree estimation methods...
September 14, 2016: Systematic Biology
Anuj Gupta, I King Jordan, Lavanya Rishishwar
Rapid and accurate identification of the sequence type (ST) of bacterial pathogens is critical for epidemiological surveillance and outbreak control. Cheaper and faster next-generation sequencing (NGS) technologies have taken preference over the traditional method of amplicon sequencing for multilocus sequence typing (MLST). But data generated by NGS platforms necessitate quality control, genome assembly and sequence similarity searching before an isolate's ST can be determined. These are computationally intensive and time consuming steps, which are not ideally suited for real-time molecular epidemiology...
January 1, 2017: Bioinformatics
Armen Abnousi, Shira L Broschat, Ananth Kalyanaraman
BACKGROUND: Identifying conserved regions in protein sequences is a fundamental operation, occurring in numerous sequence-driven analysis pipelines. It is used as a way to decode domain-rich regions within proteins, to compute protein clusters, to annotate sequence function, and to compute evolutionary relationships among protein sequences. A number of approaches exist for identifying and characterizing protein families based on their domains, and because domains represent conserved portions of a protein sequence, the primary computation involved in protein family characterization is identification of such conserved regions...
2016: PloS One
