Read by QxMD icon Read

Database: the Journal of Biological Databases and Curation

Qingyu Chen, Justin Zobel, Karin Verspoor
Duplication of information in databases is a major data quality challenge. The presence of duplicates, implying either redundancy or inconsistency, can have a range of impacts on the quality of analyses that use the data. To provide a sound basis for research on this issue in databases of nucleotide sequences, we have developed new, large-scale validated collections of duplicates, which can be used to test the effectiveness of duplicate detection methods. Previous collections were either designed primarily to test efficiency, or contained only a limited number of duplicates of limited kinds...
January 8, 2017: Database: the Journal of Biological Databases and Curation
Michał W Szcześniak, Michal Kabza, Justyna A Karolak, Malgorzata Rydzanicz, Dorota M Nowak, Barbara Ginter-Matuszewska, Piotr Polakowski, Rafal Ploski, Jacek P Szaflik, Marzena Gajecka
Keratoconus (KTCN, OMIM 148300) is a degenerative eye disorder characterized by progressive stromal thinning that leads to a conical shape of the cornea, resulting in optical aberrations and even loss of visual function. The biochemical background of the disease is poorly understood, which motivated us to perform RNA-Seq experiment, aimed at better characterizing the KTCN transcriptome and identification of long non-coding RNAs (lncRNAs) that might be involved in KTCN etiology. The in silico functional studies based on predicted lncRNA:RNA base-pairings led us to recognition of a number of lncRNAs possibly regulating genes with known or plausible links to KTCN...
2017: Database: the Journal of Biological Databases and Curation
Alexander Junge, Jan C Refsgaard, Christian Garde, Xiaoyong Pan, Alberto Santos, Ferhat Alkan, Christian Anthon, Christian von Mering, Christopher T Workman, Lars Juhl Jensen, Jan Gorodkin
Protein association networks can be inferred from a range of resources including experimental data, literature mining and computational predictions. These types of evidence are emerging for non-coding RNAs (ncRNAs) as well. However, integration of ncRNAs into protein association networks is challenging due to data heterogeneity. Here, we present a database of ncRNA-RNA and ncRNA-protein interactions and its integration with the STRING database of protein-protein interactions. These ncRNA associations cover four organisms and have been established from curated examples, experimental data, interaction predictions and automatic literature mining...
2017: Database: the Journal of Biological Databases and Curation
Ferhat Aydın, Zehra Melce Hüsünbeyi, Arzucan Özgür
Information regarding the physical interactions among proteins is crucial, since protein-protein interactions (PPIs) are central for many biological processes. The experimental techniques used to verify PPIs are vital for characterizing and assessing the reliability of the identified PPIs. A lot of information about PPIs and the experimental methods are only available in the text of the scientific publications that report them. In this study, we approach the problem of identifying passages with experimental methods for physical interactions between proteins as an information retrieval search task...
2017: Database: the Journal of Biological Databases and Curation
James C Wallace, Jesse A Port, Marissa N Smith, Elaine M Faustman
Antibiotic resistance (AR) is a major global public health threat but few resources exist that catalog AR genes outside of a clinical context. Current AR sequence databases are assembled almost exclusively from genomic sequences derived from clinical bacterial isolates and thus do not include many microbial sequences derived from environmental samples that confer resistance in functional metagenomic studies. These environmental metagenomic sequences often show little or no similarity to AR sequences from clinical isolates using standard classification criteria...
2017: Database: the Journal of Biological Databases and Curation
Qingyu Chen, Justin Zobel, Karin Verspoor
GenBank, the EMBL European Nucleotide Archive and the DNA DataBank of Japan, known collectively as the International Nucleotide Sequence Database Collaboration or INSDC, are the three most significant nucleotide sequence databases. Their records are derived from laboratory work undertaken by different individuals, by different teams, with a range of technologies and assumptions and over a period of decades. As a consequence, they contain a great many duplicates, redundancies and inconsistencies, but neither the prevalence nor the characteristics of various types of duplicates have been rigorously assessed...
2017: Database: the Journal of Biological Databases and Curation
Aitor Blanco-Míguez, Alberto Gutiérrez-Jácome, Florentino Fdez-Riverola, Anália Lourenço, Borja Sánchez
The Mechanism of Action of the Human Microbiome (MAHMI) database is a unique resource that provides comprehensive information about the sequence of potential immunomodulatory and antiproliferative peptides encrypted in the proteins produced by the human gut microbiota. Currently, MAHMI database contains over 300 hundred million peptide entries, with detailed information about peptide sequence, sources and potential bioactivity. The reference peptide data section is curated manually by domain experts. The in silico peptide data section is populated automatically through the systematic processing of publicly available exoproteomes of the human microbiome...
2017: Database: the Journal of Biological Databases and Curation
Connor Wytko, Brian Soto, Stephen P Ficklin
Galaxy is a popular framework for execution of complex analytical pipelines typically for large data sets, and is a commonly used for (but not limited to) genomic, genetic and related biological analysis. It provides a web front-end and integrates with high performance computing resources. Here we report the development of the blend4php library that wraps Galaxy's RESTful API into a PHP-based library. PHP-based web applications can use blend4php to automate execution, monitoring and management of a remote Galaxy server, including its users, workflows, jobs and more...
2017: Database: the Journal of Biological Databases and Curation
Rezarta Islamaj Doğan, Sun Kim, Andrew Chatr-Aryamontri, Christie S Chang, Rose Oughtred, Jennifer Rust, W John Wilbur, Donald C Comeau, Kara Dolinski, Mike Tyers
A great deal of information on the molecular genetics and biochemistry of model organisms has been reported in the scientific literature. However, this data is typically described in free text form and is not readily amenable to computational analyses. To this end, the BioGRID database systematically curates the biomedical literature for genetic and protein interaction data. This data is provided in a standardized computationally tractable format and includes structured annotation of experimental evidence. BioGRID curation necessarily involves substantial human effort by expert curators who must read each publication to extract the relevant information...
2017: Database: the Journal of Biological Databases and Curation
Mario Deng, Johannes Brägelmann, Ivan Kryukov, Nuno Saraiva-Agostinho, Sven Perner
With its Firebrowse service ( the Broad Institute is making large-scale multi-platform omics data analysis results publicly available through a Representational State Transfer (REST) Application Programmable Interface (API). Querying this database through an API client from an arbitrary programming environment is an essential task, allowing other developers and researchers to focus on their analysis and avoid data wrangling. Hence, as a first result, we developed a workflow to automatically generate, test and deploy such clients for rapid response to API changes...
2017: Database: the Journal of Biological Databases and Curation
Yu Rang Park, Jae-Jung Kim, Young Jo Yoon, Young-Kwang Yoon, Ha Yeong Koo, Young Mi Hong, Gi Young Jang, Soo-Yong Shin, Jong-Keuk Lee
Kawasaki disease (KD) is a rare disease that occurs predominantly in infants and young children. To identify KD susceptibility genes and to develop a diagnostic test, a specific therapy, or prevention method, collecting KD patients' clinical and genomic data is one of the major issues. For this purpose, Kawasaki Disease Database (KDD) was developed based on the efforts of Korean Kawasaki Disease Genetics Consortium (KKDGC). KDD is a collection of 1292 clinical data and genomic samples of 1283 patients from 13 KKDGC-participating hospitals...
July 2016: Database: the Journal of Biological Databases and Curation
Hoang-Quynh Le, Mai-Vu Tran, Thanh Hai Dang, Quang-Thuy Ha, Nigel Collier
The BioCreative V chemical-disease relation (CDR) track was proposed to accelerate the progress of text mining in facilitating integrative understanding of chemicals, diseases and their relations. In this article, we describe an extension of our system (namely UET-CAM) that participated in the BioCreative V CDR. The original UET-CAM system's performance was ranked fourth among 18 participating systems by the BioCreative CDR track committee. In the Disease Named Entity Recognition and Normalization (DNER) phase, our system employed joint inference (decoding) with a perceptron-based named entity recognizer (NER) and a back-off model with Semantic Supervised Indexing and Skip-gram for named entity normalization...
July 2016: Database: the Journal of Biological Databases and Curation
Olukemi O Ifeonu, Raphael Simon, Sharon M Tennant, Abhineet S Sheoran, Maria C Daly, Victor Felix, Jessica C Kissinger, Giovanni Widmer, Myron M Levine, Saul Tzipori, Joana C Silva
Human cryptosporidiosis, caused primarily by Cryptosporidium hominis and a subset of Cryptosporidium parvum, is a major cause of moderate-to-severe diarrhea in children under 5 years of age in developing countries and can lead to nutritional stunting and death. Cryptosporidiosis is particularly severe and potentially lethal in immunocompromised hosts. Biological and technical challenges have impeded traditional vaccinology approaches to identify novel targets for the development of vaccines against C. hominis, the predominant species associated with human disease...
2016: Database: the Journal of Biological Databases and Curation
Matej Stano, Gabor Beke, Lubos Klucar
Viruses are the most abundant biological entities and the reservoir of most of the genetic diversity in the Earth's biosphere. Viral genomes are very diverse, generally short in length and compared to other organisms carry only few genes. viruSITE is a novel database which brings together high-value information compiled from various resources. viruSITE covers the whole universe of viruses and focuses on viral genomes, genes and proteins. The database contains information on virus taxonomy, host range, genome features, sequential relatedness as well as the properties and functions of viral genes and proteins...
2016: Database: the Journal of Biological Databases and Curation
Ayush Singhal, Robert Leaman, Natalie Catlett, Thomas Lemberger, Johanna McEntyre, Shawn Polson, Ioannis Xenarios, Cecilia Arighi, Zhiyong Lu
Text mining in the biomedical sciences is rapidly transitioning from small-scale evaluation to large-scale application. In this article, we argue that text-mining technologies have become essential tools in real-world biomedical research. We describe four large scale applications of text mining, as showcased during a recent panel discussion at the BioCreative V Challenge Workshop. We draw on these applications as case studies to characterize common requirements for successfully applying text-mining techniques to practical biocuration needs...
2016: Database: the Journal of Biological Databases and Curation
Yunfeng Qi, Dadong Wang, Daying Wang, Taicheng Jin, Liping Yang, Hui Wu, Yaoyao Li, Jing Zhao, Fengping Du, Mingxia Song, Renjun Wang
Epigenetic drugs are chemical compounds that target disordered post-translational modification of histone proteins and DNA through enzymes, and the recognition of these changes by adaptor proteins. Epigenetic drug-related experimental data such as gene expression probed by high-throughput sequencing, co-crystal structure probed by X-RAY diffraction and binding constants probed by bio-assay have become widely available. The mining and integration of multiple kinds of data can be beneficial to drug discovery and drug repurposing...
2016: Database: the Journal of Biological Databases and Curation
Robert P Guralnick, Paula F Zermoglio, John Wieczorek, Raphael LaFrance, David Bloom, Laura Russell
For vast areas of the globe and large parts of the tree of life, data needed to inform trait diversity is incomplete. Such trait data, when fully assembled, however, form the link between the evolutionary history of organisms, their assembly into communities, and the nature and functioning of ecosystems. Recent efforts to close data gaps have focused on collating trait-by-species databases, which only provide species-level, aggregated value ranges for traits of interest and often lack the direct observations on which those ranges are based...
2016: Database: the Journal of Biological Databases and Curation
Marc Feuermann, Pascale Gaudet, Huaiyu Mi, Suzanna E Lewis, Paul D Thomas
We previously reported a paradigm for large-scale phylogenomic analysis of gene families that takes advantage of the large corpus of experimentally supported Gene Ontology (GO) annotations. This 'GO Phylogenetic Annotation' approach integrates GO annotations from evolutionarily related genes across ∼100 different organisms in the context of a gene family tree, in which curators build an explicit model of the evolution of gene functions. GO Phylogenetic Annotation models the gain and loss of functions in a gene family tree, which is used to infer the functions of uncharacterized (or incompletely characterized) gene products, even for human proteins that are relatively well studied...
2016: Database: the Journal of Biological Databases and Curation
Allison Piovesan, Maria Caracausi, Francesca Antonaros, Maria Chiara Pelleri, Lorenza Vitale
We release GeneBase 1.1, a local tool with a graphical interface useful for parsing, structuring and indexing data from the National Center for Biotechnology Information (NCBI) Gene data bank. Compared to its predecessor GeneBase (1.0), GeneBase 1.1 now allows dynamic calculation and summarization in terms of median, mean, standard deviation and total for many quantitative parameters associated with genes, gene transcripts and gene features (exons, introns, coding sequences, untranslated regions). GeneBase 1...
2016: Database: the Journal of Biological Databases and Curation
Yi Jin Liew, Manuel Aranda, Christian R Voolstra
Over the last decade, technological advancements have substantially decreased the cost and time of obtaining large amounts of sequencing data. Paired with the exponentially increased computing power, individual labs are now able to sequence genomes or transcriptomes to investigate biological questions of interest. This has led to a significant increase in available sequence data. Although the bulk of data published in articles are stored in public sequence databases, very often, only raw sequencing data are available; miscellaneous data such as assembled transcriptomes, genome annotations etc...
2016: Database: the Journal of Biological Databases and Curation
Fetch more papers »
Fetching more papers... Fetching...
Read by QxMD. Sign in or create an account to discover new knowledge that matter to you.
Remove bar
Read by QxMD icon Read

Search Tips

Use Boolean operators: AND/OR

diabetic AND foot
diabetes OR diabetic

Exclude a word using the 'minus' sign

Virchow -triad

Use Parentheses

water AND (cup OR glass)

Add an asterisk (*) at end of a word to include word stems

Neuro* will search for Neurology, Neuroscientist, Neurological, and so on

Use quotes to search for an exact phrase

"primary prevention of cancer"
(heart or cardiac or cardio*) AND arrest -"American Heart Association"