Read by QxMD icon Read


Lars Hahn, Chris-André Leimeister, Rachid Ounit, Stefano Lonardi, Burkhard Morgenstern
Many algorithms for sequence analysis rely on word matching or word statistics. Often, these approaches can be improved if binary patterns representing match and don't-care positions are used as a filter, such that only those positions of words are considered that correspond to the match positions of the patterns. The performance of these approaches, however, depends on the underlying patterns. Herein, we show that the overlap complexity of a pattern set that was introduced by Ilie and Ilie is closely related to the variance of the number of matches between two evolutionarily related sequences with respect to this pattern set...
October 2016: PLoS Computational Biology
Jie Zhou, Pianyu Zhong, Tinghui Zhang
Determination of sequence similarity is one of the major steps in computational phylogenetic studies. One of the major tasks of computational biologists is to develop novel mathematical descriptors for similarity analysis. DNA clustering is an important technology that automatically identifies inherent relationships among large-scale DNA sequences. The comparison between the DNA sequences of different species helps determine phylogenetic relationships among species. Alignment-free approaches have continuously gained interest in various sequence analysis applications such as phylogenetic inference and metagenomic classification/clustering, particularly for large-scale sequence datasets...
2016: Evolutionary Bioinformatics Online
Jian Zhao, Xiaofeng Song, Kai Wang
RNA-Seq based transcriptome assembly has been widely used to identify novel lncRNAs. However, the best-performing transcript reconstruction methods merely identified 21% of full-length protein-coding transcripts from H. sapiens. Those partial-length protein-coding transcripts are more likely to be classified as lncRNAs due to their incomplete CDS, leading to higher false positive rate for lncRNA identification. Furthermore, potential sequencing or assembly error that gain or abolish stop codons also complicates ORF-based prediction of lncRNAs...
October 6, 2016: Scientific Reports
Haohua Tu, Yuan Liu, Dmitry Turchinovich, Marina Marjanovic, Jens Lyngsø, Jesper Lægsgaard, Eric J Chaney, Youbo Zhao, Sixian You, William L Wilson, Bingwei Xu, Marcos Dantus, Stephen A Boppart
The preparation, staining, visualization, and interpretation of histological images of tissue is well-accepted as the gold standard process for the diagnosis of disease. These methods were developed historically, and are used ubiquitously in pathology, despite being highly time and labor intensive. Here we introduce a unique optical imaging platform and methodology for label-free multimodal multiphoton microscopy that uses a novel photonic crystal fiber source to generate tailored chemical contrast based on programmable supercontinuum pulses...
August 2016: Nature Photonics
Marcin Bogusz, Simon Whelan
Phylogenetic tree inference is a critical component of many systematic and evolutionary studies. The majority of these studies are based on the two-step process of multiple sequence alignment followed by tree inference, despite persistent evidence that the alignment step can lead to biased results. Here we present a two-part study that first presents PaHMM-Tree, a novel neighbor joining-based method that estimates pairwise distances without assuming a single alignment. We then use simulations to benchmark its performance against a wide-range of other phylogenetic tree inference methods, including the first comparison of alignment-free distance-based methods against more conventional tree estimation methods...
September 14, 2016: Systematic Biology
Anuj Gupta, I King Jordan, Lavanya Rishishwar
: Rapid and accurate identification of the sequence type (ST) of bacterial pathogens is critical for epidemiological surveillance and outbreak control. Cheaper and faster next-generation sequencing (NGS) technologies have taken preference over the traditional method of amplicon sequencing for multilocus sequence typing (MLST). But data generated by NGS platforms necessitate quality control, genome assembly and sequence similarity searching before an isolate's ST can be determined. These are computationally intensive and time consuming steps, which are not ideally suited for real-time molecular epidemiology...
September 7, 2016: Bioinformatics
Armen Abnousi, Shira L Broschat, Ananth Kalyanaraman
BACKGROUND: Identifying conserved regions in protein sequences is a fundamental operation, occurring in numerous sequence-driven analysis pipelines. It is used as a way to decode domain-rich regions within proteins, to compute protein clusters, to annotate sequence function, and to compute evolutionary relationships among protein sequences. A number of approaches exist for identifying and characterizing protein families based on their domains, and because domains represent conserved portions of a protein sequence, the primary computation involved in protein family characterization is identification of such conserved regions...
2016: PloS One
Anestis Gkanogiannis, Stéphane Gazut, Marcel Salanoubat, Sawsan Kanj, Thomas Brüls
BACKGROUND: Metagenomics holds great promises for deepening our knowledge of key bacterial driven processes, but metagenome assembly remains problematic, typically resulting in representation biases and discarding significant amounts of non-redundant sequence information. In order to alleviate constraints assembly can impose on downstream analyses, and/or to increase the fraction of raw reads assembled via targeted assemblies relying on pre-assembly binning steps, we developed a set of binning modules and evaluated their combination in a new "assembly-free" binning protocol...
2016: BMC Bioinformatics
Matteo Comin, Michele Schimd
BACKGROUND: Sequencing technologies are generating enormous amounts of read data, however assembly of genomes and metagenomes remain among the most challenging tasks. In this paper we study the comparison of genomes and metagenomes only based on read data, using word counts statistics called alignment-free thus not requiring reference genomes or assemblies. Quality scores produced by sequencing platforms are fundamental for various analyses, moreover future-generation sequencing platforms, will produce longer reads but with error rate around 15 %...
2016: BMC Medical Genomics
Luisa Delgado-Serrano, Silvia Restrepo, Jose Ricardo Bustos, Maria Mercedes Zambrano, Juan Manuel Anzola
BACKGROUND: The taxonomic and phylogenetic classification based on sequence analysis of the ITS1 genomic region has become a crucial component of fungal ecology and diversity studies. Nowadays, there is no accurate alignment-free classification tool for fungal ITS1 sequences for large environmental surveys. This study describes the development of a machine learning-based classifier for the taxonomical assignment of fungal ITS1 sequences at the genus level. RESULTS: A fungal ITS1 sequence database was built using curated data...
2016: BMC Research Notes
Saeid Amiri, Ivo D Dinov
Sequence comparison has become an essential tool in bioinformatics, because highly homologous sequences usually imply significant functional or structural similarity. Traditional sequence analysis techniques are based on preprocessing and alignment, which facilitate measuring and quantitative characterization of genetic differences, variability and complexity. However, recent developments of next generation and whole genome sequencing technologies give rise to new challenges that are related to measuring similarity and capturing rearrangements of large segments contained in the genome...
October 21, 2016: Journal of Theoretical Biology
Yingnan Cong, Yao-Ban Chan, Mark A Ragan
Lateral genetic transfer (LGT) plays an important role in the evolution of microbes. Existing computational methods for detecting genomic regions of putative lateral origin scale poorly to large data. Here, we propose a novel method based on TF-IDF (Term Frequency-Inverse Document Frequency) statistics to detect not only regions of lateral origin, but also their origin and direction of transfer, in sets of hierarchically structured nucleotide or protein sequences. This approach is based on the frequency distributions of k-mers in the sequences...
2016: Scientific Reports
Yingnan Cong, Yao-Ban Chan, Mark A Ragan
Many microbes can acquire genetic material from their environment and incorporate it into their genome, a process known as lateral genetic transfer (LGT). Computational approaches have been developed to detect genomic regions of lateral origin, but typically lack sensitivity, ability to distinguish donor from recipient, and scalability to very large datasets. To address these issues we have introduced an alignment-free method based on ideas from document analysis, term frequency-inverse document frequency (TF-IDF)...
2016: Scientific Reports
Ying Wang, Haiyan Hu, Xiaoman Li
Metagenomics is a next-generation omics field currently impacting postgenomic life sciences and medicine. Binning metagenomic reads is essential for the understanding of microbial function, compositions, and interactions in given environments. Despite the existence of dozens of computational methods for metagenomic read binning, it is still very challenging to bin reads. This is especially true for reads from unknown species, from species with similar abundance, and/or from low-abundance species in environmental samples...
August 2016: Omics: a Journal of Integrative Biology
Troy Hernandez, Jie Yang
The typical process for classifying and submitting a newly sequenced virus to the NCBI database involves two steps. First, a BLAST search is performed to determine likely family candidates. That is followed by checking the candidate families with the pairwise sequence alignment tool for similar species. The submitter's judgment is then used to determine the most likely species classification. The aim of this article is to show that this process can be automated into a fast, accurate, one-step process using the proposed alignment-free method and properly implemented machine learning techniques...
October 2016: Journal of Computational Biology: a Journal of Computational Molecular Cell Biology
Damon H May, Emma Timmins-Schiffman, Molly P Mikan, H Rodger Harvey, Elhanan Borenstein, Brook L Nunn, William S Noble
In principle, tandem mass spectrometry can be used to detect and quantify the peptides present in a microbiome sample, enabling functional and taxonomic insight into microbiome metabolic activity. However, the phylogenetic diversity constituting a particular microbiome is often unknown, and many of the organisms present may not have assembled genomes. In ocean microbiome samples, with particularly diverse and uncultured bacterial communities, it is difficult to construct protein databases that contain the bulk of the peptides in the sample without losing detection sensitivity due to the overwhelming number of candidate peptides for each tandem mass spectrum...
August 5, 2016: Journal of Proteome Research
Mitchell J Brittnacher, Sonya L Heltshe, Hillary S Hayden, Matthew C Radey, Eli J Weiss, Christopher J Damman, Timothy L Zisman, David L Suskind, Samuel I Miller
BACKGROUND: Comparative analysis of gut microbiomes in clinical studies of human diseases typically rely on identification and quantification of species or genes. In addition to exploring specific functional characteristics of the microbiome and potential significance of species diversity or expansion, microbiome similarity is also calculated to study change in response to therapies directed at altering the microbiome. Established ecological measures of similarity can be constructed from species abundances, however methods for calculating these commonly used ecological measures of similarity directly from whole genome shotgun (WGS) metagenomic sequence are lacking...
2016: PloS One
Waqar Ali, Anatol E Wegner, Robert E Gaunt, Charlotte M Deane, Gesine Reinert
Networks are routinely used to represent large data sets, making the comparison of networks a tantalizing research question in many areas. Techniques for such analysis vary from simply comparing network summary statistics to sophisticated but computationally expensive alignment-based approaches. Most existing methods either do not generalize well to different types of networks or do not provide a quantitative similarity score between networks. In contrast, alignment-free topology based network similarity scores empower us to analyse large sets of networks containing different types and sizes of data...
2016: Scientific Reports
Hsin-Hsiung Huang, Chenglong Yu
The alignment-free n-gram based method with the out-of-place measures as the distance has been successfully applied to automatic text or natural languages categorization in real time. However, it is not clear about its performance and the selection of n for comparing genome sequences. Here we propose a symmetric version of the out-of-place measure and a new approach for finding the optimal range of n to construct a phylogenetic tree with the symmetric out-of-place measures. Our method is then applied to real genome sequence datasets...
October 7, 2016: Journal of Theoretical Biology
Guillaume Bernard, Cheong Xin Chan, Mark A Ragan
Alignment-free (AF) approaches have recently been highlighted as alternatives to methods based on multiple sequence alignment in phylogenetic inference. However, the sensitivity of AF methods to genome-scale evolutionary scenarios is little known. Here, using simulated microbial genome data we systematically assess the sensitivity of nine AF methods to three important evolutionary scenarios: sequence divergence, lateral genetic transfer (LGT) and genome rearrangement. Among these, AF methods are most sensitive to the extent of sequence divergence, less sensitive to low and moderate frequencies of LGT, and most robust against genome rearrangement...
2016: Scientific Reports
Fetch more papers »
Fetching more papers... Fetching...
Read by QxMD. Sign in or create an account to discover new knowledge that matter to you.
Remove bar
Read by QxMD icon Read

Search Tips

Use Boolean operators: AND/OR

diabetic AND foot
diabetes OR diabetic

Exclude a word using the 'minus' sign

Virchow -triad

Use Parentheses

water AND (cup OR glass)

Add an asterisk (*) at end of a word to include word stems

Neuro* will search for Neurology, Neuroscientist, Neurological, and so on

Use quotes to search for an exact phrase

"primary prevention of cancer"
(heart or cardiac or cardio*) AND arrest -"American Heart Association"