Database: the Journal of Biological Databases and Curation

Qingyu Chen, Justin Zobel, Karin Verspoor
Duplication of information in databases is a major data quality challenge. The presence of duplicates, implying either redundancy or inconsistency, can have a range of impacts on the quality of analyses that use the data. To provide a sound basis for research on this issue in databases of nucleotide sequences, we have developed new, large-scale validated collections of duplicates, which can be used to test the effectiveness of duplicate detection methods. Previous collections were either designed primarily to test efficiency, or contained only a limited number of duplicates of limited kinds...
January 8, 2017: Database: the Journal of Biological Databases and Curation
Steven T Hill, Ramcharan Sudarsanam, John Henning, David Hendrix
Hop (Humulus lupulus L. var lupulus) is a dioecious plant of worldwide significance, used primarily for bittering and flavoring in brewing beer. Studies on the medicinal properties of several unique compounds produced by hop have led to additional interest from pharmacy and healthcare industries as well as livestock production as a natural antibiotic. Genomic research in hop has resulted in a published draft genome and transcriptome assemblies. As research into the genomics of hop has gained interest, there is a critical need for centralized online genomic resources...
January 1, 2017: Database: the Journal of Biological Databases and Curation
Mathurin Dorel, Eric Viara, Emmanuel Barillot, Andrei Zinovyev, Inna Kuperstein
Human diseases such as cancer are routinely characterized by high-throughput molecular technologies, and multi-level omics data are accumulated in public databases at an increasing rate. Retrieval and visualization of these data in the context of molecular network maps can provide insights into the pattern of regulation of molecular functions reflected by an omics profile. In order to make this task easy, we developed NaviCom, a Python package and web platform for visualization of multi-level omics data on top of biological network maps...
January 1, 2017: Database: the Journal of Biological Databases and Curation
Jinghang Gu, Fuqing Sun, Longhua Qian, Guodong Zhou
This article describes our work on the BioCreative-V chemical-disease relation (CDR) extraction task, which employed a maximum entropy (ME) model and a convolutional neural network model for relation extraction at inter- and intra-sentence level, respectively. In our work, relation extraction between entity concepts in documents was simplified to relation extraction between entity mentions. We first constructed pairs of chemical and disease mentions as relation instances for training and testing stages, then we trained and applied the ME model and the convolutional neural network model for inter- and intra-sentence level, respectively...
January 1, 2017: Database: the Journal of Biological Databases and Curation
Honghan Wu, Anika Oellrich, Christine Girges, Bernard de Bono, Tim J P Hubbard, Richard J B Dobson
Neurodegenerative disorders such as Parkinson's and Alzheimer's disease are devastating and costly illnesses, a source of major global burden. In order to provide successful interventions for patients and reduce costs, both causes and pathological processes need to be understood. The ApiNATOMY project aims to contribute to our understanding of neurodegenerative disorders by manually curating and abstracting data from the vast body of literature amassed on these illnesses. As curation is labour-intensive, we aimed to speed up the process by automatically highlighting those parts of the PDF document of primary importance to the curator...
January 1, 2017: Database: the Journal of Biological Databases and Curation
Tim E Putman, Sebastien Lelong, Sebastian Burgstaller-Muehlbacher, Andra Waagmeester, Colin Diesh, Nathan Dunn, Monica Munoz-Torres, Gregory S Stupp, Chunlei Wu, Andrew I Su, Benjamin M Good
With the advancement of genome-sequencing technologies, new genomes are being sequenced daily. Although these sequences are deposited in publicly available data warehouses, their functional and genomic annotations (beyond genes which are predicted automatically) mostly reside in the text of primary publications. Professional curators are hard at work extracting those annotations from the literature for the most studied organisms and depositing them in structured databases. However, the resources don't exist to fund the comprehensive curation of the thousands of newly sequenced organisms in this manner...
January 1, 2017: Database: the Journal of Biological Databases and Curation
Guillaume Postic, Yassine Ghouzam, Catherine Etchebest, Jean-Christophe Gelly
Knowing the position of protein structures within the membrane is crucial for fundamental and applied research in the field of molecular biology. Only a few web resources propose coordinate files of oriented transmembrane proteins, and these exclude predicted structures, although they represent the largest part of the available models. In this article, we present TMPL, a database of transmembrane protein structures (α-helical and β-sheet) positioned in the lipid bilayer. It is the first database to include theoretical models of transmembrane protein structures, making it a large repository with more than 11 000 entries...
January 1, 2017: Database: the Journal of Biological Databases and Curation
Xiangying Jiang, Martin Ringwald, Judith Blake, Hagit Shatkay
The Gene Expression Database (GXD) is a comprehensive online database within the Mouse Genome Informatics resource, aiming to provide available information about endogenous gene expression during mouse development. The information stems primarily from many thousands of biomedical publications that database curators must go through and read. Given the very large number of biomedical papers published each year, automatic document classification plays an important role in biomedical research. Specifically, an effective and efficient document classifier is needed for supporting the GXD annotation workflow...
January 1, 2017: Database: the Journal of Biological Databases and Curation
Zhenzhen Xu, Jing Liu, Wanchao Ni, Zhen Peng, Yue Guo, Wuwei Ye, Fang Huang, Xianggui Zhang, Peng Xu, Qi Guo, Xinlian Shen, Jianchang Du
Although several diploid and tetraploid Gossypium species genomes have been sequenced, a well-annotated web-based transposable element (TE) database is lacking. To better understand the roles of TEs in structural, functional and evolutionary dynamics of the cotton genome, a comprehensive, specific, and user-friendly web-based database, Gossypium raimondii transposable elements database (GrTEdb), was constructed. A total of 14 332 TEs were structurally annotated and clearly categorized in the G. raimondii genome, and these elements have been classified into seven distinct superfamilies based on the order of protein-coding domains, structures and/or sequence similarity, including 2929 Copia-like elements, 10 368 Gypsy-like elements, 299 L1, 12 Mutators, 435 PIF-Harbingers, 275 CACTAs and 14 Helitrons...
January 1, 2017: Database: the Journal of Biological Databases and Curation
Hedvig Tordai, Kristóf Jakab, Gergely Gyimesi, Kinga András, Anna Brózik, Balázs Sarkadi, Tamás Hegedus
ABC (ATP-Binding Cassette) proteins with altered function are responsible for numerous human diseases. To aid the selection of positions and amino acids for ABC structure/function studies we have generated a database, ABCMdb (Gyimesi et al., ABCMdb: a database for the comparative analysis of protein mutations in ABC transporters, and a potential framework for a general application. Hum Mutat 2012; 33:1547-1556.), with interactive tools. The database has been populated with mentions of mutations extracted from full text papers, alignments and structural models...
January 1, 2017: Database: the Journal of Biological Databases and Curation
Mohamed Reda Bouadjenek, Karin Verspoor, Justin Zobel
Bioinformatics sequence databases such as Genbank or UniProt contain hundreds of millions of records of genomic data. These records are derived from direct submissions from individual laboratories, as well as from bulk submissions from large-scale sequencing centres; their diversity and scale mean that they suffer from a range of data quality issues including errors, discrepancies, redundancies, ambiguities, incompleteness and inconsistencies with the published literature. In this work, we seek to investigate and analyze the data quality of sequence databases from the perspective of a curator, who must detect anomalous and suspicious records...
January 1, 2017: Database: the Journal of Biological Databases and Curation
Magali Ruffier, Andreas Kähäri, Monika Komorowska, Stephen Keenan, Matthew Laird, Ian Longden, Glenn Proctor, Steve Searle, Daniel Staines, Kieron Taylor, Alessandro Vullo, Andrew Yates, Daniel Zerbino, Paul Flicek
The Ensembl software resources are a stable infrastructure to store, access and manipulate genome assemblies and their functional annotations. The Ensembl 'Core' database and Application Programming Interface (API) was our first major piece of software infrastructure and remains at the centre of all of our genome resources. Since its initial design more than fifteen years ago, the number of publicly available genomic, transcriptomic and proteomic datasets has grown enormously, accelerated by continuous advances in DNA-sequencing technology...
January 1, 2017: Database: the Journal of Biological Databases and Curation
L Suhrbier, W-H Kusber, O Tschöpe, A Güntsch, W G Berendsohn
Biological research collections holding billions of specimens world-wide provide the most important baseline information for systematic biodiversity research. Increasingly, specimen data records become available in virtual herbaria and data portals. The traditional (physical) annotation procedure fails here, so that an important pathway of research documentation and data quality control is broken. In order to create an online annotation system, we analysed, modeled and adapted traditional specimen annotation workflows...
January 1, 2017: Database: the Journal of Biological Databases and Curation
Simon Jon McIlroy, Rasmus Hansen Kirkegaard, Bianca McIlroy, Marta Nierychlo, Jannie Munk Kristensen, Søren Michael Karst, Mads Albertsen, Per Halkjær Nielsen
Wastewater is increasingly viewed as a resource, with anaerobic digester technology being routinely implemented for biogas production. Characterising the microbial communities involved in wastewater treatment facilities and their anaerobic digesters is considered key to their optimal design and operation. Amplicon sequencing of the 16S rRNA gene allows high-throughput monitoring of these systems. The MiDAS field guide is a public resource providing amplicon sequencing protocols and an ecosystem-specific taxonomic database optimized for use with wastewater treatment facility samples...
January 1, 2017: Database: the Journal of Biological Databases and Curation
Sankha Subhra Das, Mithun James, Sandip Paul, Nishant Chakravorty
The various pathophysiological processes occurring in living systems are known to be orchestrated by delicate interplays and cross-talks between different genes and their regulators. Among the various regulators of genes, there is a class of small non-coding RNA molecules known as microRNAs. Although the relative simplicity of miRNAs and their ability to modulate cellular processes make them attractive therapeutic candidates, their presence in large numbers makes it challenging for experimental researchers to interpret the intricacies of the molecular processes they regulate...
January 1, 2017: Database: the Journal of Biological Databases and Curation
Randi Vita, James A Overton, Alessandro Sette, Bjoern Peters
The Immune Epitope Database (IEDB) project incorporates independently developed ontologies and controlled vocabularies into its curation and search interface. This simplifies curation practices, improves the user query experience and facilitates interoperability between the IEDB and other resources. While the use of independently developed ontologies has long been recommended as a best practice, there continues to be a significant number of projects that develop their own vocabularies instead, or that do not fully utilize the power of ontologies that they are using...
January 1, 2017: Database: the Journal of Biological Databases and Curation
Fabio Rinaldi, Oscar Lithgow, Socorro Gama-Castro, Hilda Solano, Alejandra Lopez, Luis José Muñiz Rascado, Cecilia Ishida-Gutiérrez, Carlos-Francisco Méndez-Cruz, Julio Collado-Vides
Experimentally generated biological information needs to be organized and structured in order to become meaningful knowledge. However, the rate at which new information is being published makes manual curation increasingly unable to cope. Devising new curation strategies that leverage upon data mining and text analysis is, therefore, a promising avenue to help life science databases to cope with the deluge of novel information. In this article, we describe the integration of text mining technologies in the curation pipeline of the RegulonDB database, and discuss how the process can enhance the productivity of the curators...
January 1, 2017: Database: the Journal of Biological Databases and Curation
Gemma L Holliday, Shoshana D Brown, Eyal Akiva, David Mischel, Michael A Hicks, John H Morris, Conrad C Huang, Elaine C Meng, Scott C-H Pegg, Thomas E Ferrin, Patricia C Babbitt
With ever-increasing amounts of sequence data available in both the primary literature and sequence repositories, there is a bottleneck in annotating molecular function to a sequence. This article describes the biocuration process and methods used in the structure-function linkage database (SFLD) to help address some of the challenges. We discuss how the hierarchy within the SFLD allows us to infer detailed functional properties for functionally diverse enzyme superfamilies in which all members are homologous, conserve an aspect of their chemical function and have associated conserved structural features that enable the chemistry...
January 1, 2017: Database: the Journal of Biological Databases and Curation
João Carneiro, Adriana Resende, Filipe Pereira
The human immunodeficiency virus (HIV) is associated with one of the most widespread infectious diseases, the acquired immunodeficiency syndrome (AIDS). The development of antiretroviral drugs and methods for virus detection requires a comprehensive analysis of the HIV genomic diversity, particularly in the binding sites of oligonucleotides. Here, we describe a versatile online database (HIVoligoDB) with oligonucleotides selected for the diagnosis of HIV and treatment of AIDS. Currently, the database provides an interface for visualization, analysis and download of 380 HIV-1 and 65 HIV-2 oligonucleotides annotated according to curated reference genomes...
January 1, 2017: Database: the Journal of Biological Databases and Curation
Chris Armit, Bill Hill, S Venkataraman, Kenneth McLeod, Albert Burger, Richard Baldock
A primary objective of the eMouseAtlas Project is to enable 3D spatial mapping of whole embryo gene expression data to capture complex 3D patterns for indexing, visualization, cross-comparison and analysis. For this we have developed a spatio-temporal framework based on 3D models of embryos at different stages of development coupled with an anatomical ontology. Here we introduce a method of defining coordinate axes that correspond to the anatomical or biologically relevant anterior-posterior (A-P), dorsal-ventral (D-V) and left-right (L-R) directions...
January 1, 2017: Database: the Journal of Biological Databases and Curation