Current Protocols in Bioinformatics

P Romano, A Profumo, A Facchiano
Geena 2 is a tool for filtering, averaging, and aligning MALDI/TOF mass spectra, designed to assist scientists in the analysis of high volumes of data and to support them in comparative studies. Three web interfaces are available with different levels of complexity. In this manuscript, we explain how to use Geena 2 with these three interfaces to perform analyses of one's own data. Two support protocols showing how to check the example input file and how to create an input file with one's own data are also presented...
November 13, 2018: Current Protocols in Bioinformatics
Wiebke Feindt, Sara J Oppenheim, Robert DeSalle, Shaadi Mehr
The analysis of transcriptome data from non-model organisms contributes to our understanding of diverse aspects of evolutionary biology, including developmental processes, speciation, adaptation, and extinction. Underlying this diversity is one shared feature, the generation of enormous amounts of sequence data. Data availability requirements in most journals oblige researchers to make their raw transcriptome data publicly available, and the databases housed at the National Center for Biotechnology Information (NCBI) are a popular choice for data deposition...
November 13, 2018: Current Protocols in Bioinformatics
Yanyan Ju, Jing Gong, Yucheng T Yang, Qiangfeng Cliff Zhang
RNA-RNA interactions (RRIs) are essential to understanding the regulatory mechanisms of RNAs. Mapping RRIs in vivo in a transcriptome-wide manner remained challenging until the recent development of several sequencing-based technologies. However, RRIs generated from large-scale studies had not been systematically collected and analyzed before. This article introduces RISE, a database of the RNA Interactome from Sequencing Experiments. RISE provides a comprehensive collection of RRIs in human, mouse, and yeast, derived from transcriptome-wide sequencing experiments, as well as targeted sequencing studies and other public databases/datasets...
November 8, 2018: Current Protocols in Bioinformatics
Tyler Alioto, Enrique Blanco, Genís Parra, Roderic Guigó
This unit describes the usage of geneid, an efficient gene-finding program that allows for the analysis of large genomic sequences, including whole mammalian chromosomes. These sequences can be partially annotated, and geneid can be used to refine this initial annotation. Training geneid is relatively easy, and parameter configurations exist for a number of eukaryotic species. geneid produces output in a variety of standard formats. The results, thus, can be processed by a variety of software tools, including visualization programs...
October 17, 2018: Current Protocols in Bioinformatics
Istvan Ladunga
The Basic Local Alignment Search Tool (BLAST) is the first resource to computationally characterize a novel amino acid or nucleic acid sequence. BLAST plays important roles in genomics, transcriptomics, and protein science. For numerous academic and commercial researchers, neither BLAST Web servers nor cloud resources satisfy the requirements of high-throughput comparative genomic pipelines or company policies. For such users, this unit describes how to install BLAST locally, either on a standalone workstation, or preferably on a compute cluster...
September 2018: Current Protocols in Bioinformatics
John Salamon, Ivan H Goenawan, David J Lynn
Biological processes are regulated at a cellular level by tightly controlled molecular interaction networks, which are collectively known as the interactome. The interactome is not a static entity, but instead is dynamically reorganized or "rewired" under varying temporal, spatial, and environmental conditions. Most network analysis and visualization tools have, to date, been developed for static representations of molecular interaction data. Here, we describe a protocol that provides a step-by-step guide to DyNet, a Cytoscape 3 application that facilitates the visualization and analysis of dynamic molecular interaction networks...
September 2018: Current Protocols in Bioinformatics
Kapeel M Chougule, Liya Wang, Joshua C Stein, Xiaofei Wang, Upendra Kumar Devisetty, Robert R Klein, Doreen Ware
RNA-seq is a vital method for understanding gene structure and expression patterns. Typical RNA-seq analysis protocols use sequencing reads of length 50 to 150 nucleotides for alignment to the reference genome and assembly of transcripts. The resultant transcripts are quantified and used for differential expression and visualization. Existing tools and protocols for RNA-seq are vast and diverse; given their differences in performance, it is critical to select an analysis protocol that is scalable, accurate, and easy to use...
September 2018: Current Protocols in Bioinformatics
Klemens Pichler, Kate Warner, Michele Magrane
Public availability of biological sequences is essential for their widespread access and use by the research community. The Universal Protein Resource (UniProt) is a comprehensive resource for protein sequence and functional data. While most protein sequences entering UniProt are imported from other source databases containing nucleotide or 3-D structure data, protein sequences determined at the protein level can be submitted directly to UniProt. To this end, UniProt provides a Web interface called SPIN. This service enables researchers to make their de novo-sequenced proteins available to the scientific community and acquire UniProt accession numbers for use in publications...
June 2018: Current Protocols in Bioinformatics
Stephen J Kiniry, Audrey M Michel, Pavel V Baranov
GWIPS-viz is a publicly available browser that provides Genome Wide Information on Protein Synthesis through the visualization of ribosome profiling data. Ribosome profiling (Ribo-seq) is a high-throughput technique which isolates fragments of messenger RNA that are protected by the ribosome. The alignment of the ribosome-protected fragments or footprint sequences to the corresponding reference genome and their visualization using GWIPS-viz allows for unique insights into the genome loci that are expressed as potentially translated RNA...
June 2018: Current Protocols in Bioinformatics
Ioanna Kalvari, Eric P Nawrocki, Joanna Argasinska, Natalia Quinones-Olvera, Robert D Finn, Alex Bateman, Anton I Petrov
Rfam is a database of non-coding RNA families in which each family is represented by a multiple sequence alignment, a consensus secondary structure, and a covariance model. Using a combination of manual and literature-based curation and a custom software pipeline, Rfam converts descriptions of RNA families found in the scientific literature into computational models that can be used to annotate RNAs belonging to those families in any DNA or RNA sequence. Valuable research outputs that are often locked up in figures and supplementary information files are encapsulated in Rfam entries and made accessible through the Rfam Web site...
June 2018: Current Protocols in Bioinformatics
Camir Ricketts, Victoria Popic, Hosein Toosi, Iman Hajirasouliha
Reconstructing cancer phylogeny trees and quantifying the evolution of the disease are challenging tasks. LICHeE and BAMSE are two computational tools recently designed and implemented for this purpose. They both utilize estimated variant allele fractions of somatic mutations across multiple samples to infer the most likely cancer phylogenies. This unit provides extensive guidelines for installing and running both LICHeE and BAMSE. © 2018 by John Wiley & Sons, Inc.
June 2018: Current Protocols in Bioinformatics
Mais Ammari, Fiona McCarthy, Bindu Nanduri
An increasing proportion of curated host-pathogen interaction (HPI) information is becoming available in interaction databases. These data represent detailed, experimentally verified molecular interaction data, which may be used to better understand infectious diseases. By their very nature, HPIs are context dependent: whether two proteins are found to interact depends on the precise biological conditions studied and the approaches used for identifying these interactions. The associated biology and the technical details of the experiments identifying interacting protein molecules are increasingly being curated using defined curation standards but are overlooked in current HPI network modeling...
March 2018: Current Protocols in Bioinformatics
Joanna L Sharman, Simon D Harding, Christopher Southan, Elena Faccenda, Adam J Pawson, Jamie A Davies
The IUPHAR/BPS Guide to PHARMACOLOGY is an expert-curated, open-access database of information on drug targets and the substances that act on them. This unit describes the procedures for searching and downloading ligand-target binding data and for finding detailed annotations and the most relevant literature. The database includes concise overviews of the properties of 1,700 data-supported human drug targets and related proteins, divided into families, and 9,000 small molecule and peptide experimental ligands and approved drugs that bind to those targets...
March 2018: Current Protocols in Bioinformatics
R Dustin Schaeffer, Yuxing Liao, Nick V Grishin
ECOD is a database of evolutionary domains from structures deposited in the PDB. Domains in ECOD are classified by a mixed manual/automatic method wherein the bulk of newly deposited structures are classified automatically by protein-protein BLAST. Those structures that cannot be classified automatically are referred to manual curators who use a combination of alignment results, functional analysis, and close reading of the literature to generate novel assignments. ECOD differs from other structural domain resources in that it is continually updated, classifying thousands of proteins per week...
March 2018: Current Protocols in Bioinformatics
Selim Kalayci, Zeynep H Gümüş
Biological networks are becoming increasingly large and complex, pushing the limits of existing 2D tools. iCAVE is an open-source software tool for interactive visual explorations of large and complex networks in 3D, stereoscopic 3D, or immersive 3D. It introduces new 3D network layout algorithms and 3D extensions of popular 2D network layout, clustering, and edge bundling algorithms to assist researchers in understanding the underlying patterns in large, multi-layered, clustered, or complex networks. This protocol aims to guide new users on the basic functions of iCAVE for loading data, laying out networks (single or multi-layered), bundling edges, clustering networks, visualizing clusters, visualizing data attributes, and saving output images or videos...
March 2018: Current Protocols in Bioinformatics
Heladia Salgado, Irma Martínez-Flores, Víctor H Bustamante, Kevin Alquicira-Hernández, Jair S García-Sotelo, Delfino García-Alonso, Julio Collado-Vides
In RegulonDB, for over 25 years, we have been gathering knowledge by manual curation from original scientific literature on the regulation of transcription initiation and genome organization in transcription units of the Escherichia coli K-12 genome. This unit describes six basic protocols that can serve as a guiding introduction to the main content of the current version (v9.4) of this electronic resource. These protocols include general navigation as well as searching for specific objects such as genes, gene products, transcription units, promoters, transcription factors, coexpression, and genetic sensory response units or GENSOR Units...
March 2018: Current Protocols in Bioinformatics
Tin Nguyen, Cristina Mitrea, Sorin Draghici
Identification of impacted pathways is an important problem because it allows us to gain insights into the underlying biology beyond the detection of differentially expressed genes. In the past decade, a plethora of methods have been developed for this purpose. The latest generation of pathway analysis methods is designed to take into account various aspects of pathway topology in order to increase the accuracy of the findings. Here, we cover 34 such topology-based pathway analysis methods published in the past 13 years...
March 2018: Current Protocols in Bioinformatics
Andrea Rodriguez-Martinez, Rafael Ayala, Joram M Posma, Marc-Emmanuel Dumas
MetaboSignal is an R/Bioconductor package designed to explore the relationships between genes and metabolites, using the Kyoto Encyclopedia of Genes and Genomes (KEGG) as its primary database. It is a network-based approach that allows overlaying metabolic and signaling pathways and exploring the topological relationship between genes (signaling or metabolic genes) and metabolites. MetaboSignal is ideally suited to identify candidate genes in metabolome genome-wide association studies (mGWAS), particularly in the case of trans-acting associations...
March 2018: Current Protocols in Bioinformatics
Sanja Abbott, Andrii Iudin, Paul K Korir, Sriram Somasundharam, Ardan Patwardhan
The Electron Microscopy Data Bank (EMDB) is a global, openly accessible archive of biomolecular and cellular 3D reconstructions derived from electron microscopy (EM) data. EMBL-EBI develops web-based resources to facilitate the reuse of EMDB data. Here we provide protocols for how these resources can be used for searching EMDB, visualising EMDB structures, statistically analysing EMDB content and checking the validity of EMDB structures. Protocols for searching include quick link categories from the main page, links to latest entries released during the weekly cycle, filtered browsing of the entire archive and a form-based search...
March 2018: Current Protocols in Bioinformatics
Raunaq Malhotra, Isheeta Seth, Erik Lehnert, Jing Zhao, Gaurav Kaushik, Elizabeth H Williams, Anurag Sethi, Brandi N Davis-Dusenbery
Next-generation sequencing has produced petabytes of data, but accessing and analyzing these data remain challenging. Traditionally, researchers investigating public datasets like The Cancer Genome Atlas (TCGA) would download the data to a high-performance cluster, which could take several weeks even with a highly optimized network connection. The National Cancer Institute (NCI) initiated the Cancer Genomics Cloud Pilots program to provide researchers with the resources to process data with cloud computational resources...
December 8, 2017: Current Protocols in Bioinformatics
