Subodh Kumar Mishra, Neha Jain, Uma Shankar, Arpita Tawani, Amit Mishra, Amit Kumar
High-throughput screening and a better understanding of the structure-activity relationships (SAR) of small molecules, gained through computational biology techniques, have greatly expanded the drug discovery process and improved the discovery of therapeutics for various diseases. The Small Molecule Modulators Database (SMMDB) includes >1100 small molecules that have been approved by the US Food and Drug Administration, are under investigation or were rejected in clinical trials for any kind of neurological disease. The comprehensive information about small molecules includes the details about their molecular targets (such as protein or enzyme, DNA, RNA, antisense RNA etc...
ZiaurRehman Tanoli, Zaid Alam, Markus Vähä-Koskela, Balaguru Ravikumar, Alina Malyutina, Alok Jaiswal, Jing Tang, Krister Wennerberg, Tero Aittokallio
Drug Target Commons (DTC) is a web platform (database with user interface) for community-driven bioactivity data integration and standardization for comprehensive mapping, reuse and analysis of compound-target interaction profiles. End users can search, upload, edit, annotate and export expert-curated bioactivity data for further analysis, using an application programming interface, database dump or tab-delimited text download options. To guide chemical biology and drug-repurposing applications, DTC version 2...
Longfei Chen, Kun Lang, Shoudong Bi, Jiapeng Luo, Feiling Liu, Xinhai Ye, Jiadan Xu, Kang He, Fei Li, Gongyin Ye, Xuexin Chen
Insect pests reduce yield and cause economic losses, which are major problems in agriculture. Parasitic wasps are the natural enemies of many agricultural pests and thus have been widely used as biological control agents. Plants, phytophagous insects and parasitic wasps form a tritrophic food chain. Understanding the interactions in this tritrophic system should be helpful for developing parasitic wasps for pest control and deciphering the mechanisms of parasitism. However, the genomic resources for this tritrophic system are not well organized...
Xiao Wen, Lin Gao, Xingli Guo, Xing Li, Xiaotai Huang, Ying Wang, Haifu Xu, Ruijie He, Chenglong Jia, Feixiang Liang
While long non-coding RNAs (lncRNAs) may play important roles in cellular function and biological processes, we still know little about them. Growing evidence indicates that the subcellular localization of lncRNAs may provide clues to their functionality. To help researchers functionally characterize thousands of lncRNAs, we developed a database-driven application, lncSLdb, which stores and manages user-collected qualitative and quantitative subcellular localization information of lncRNAs from literature mining...
Kevin K Le, Matthew D Whiteside, James E Hopkins, Victor P J Gannon, Chad R Laing
Public health laboratories are currently moving to whole-genome sequence (WGS)-based analyses, and require the rapid prediction of standard reference laboratory methods based solely on genomic data. Currently, these predictive genomics tasks rely on workflows that chain together multiple programs for the requisite analyses. While useful, these systems do not store the analyses in a genome-centric way, meaning the same analyses are often re-computed for the same genomes. To solve this problem, we created Spfy, a platform that rapidly performs the common reference laboratory tests, uses a graph database to store and retrieve the results from the computational workflows and links data to individual genomes using standardized ontologies...
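The genome-centric storage idea in the abstract above can be illustrated with a minimal sketch. It is not Spfy's implementation: a plain in-memory dictionary stands in for the graph database, and the names `GenomeResultStore`, `get_or_compute` and `gc_content` are hypothetical. The only point shown is that results keyed by a genome identifier are computed once and then retrieved.

```python
import hashlib


class GenomeResultStore:
    """Toy result store keyed by genome content (a dict, not a graph database)."""

    def __init__(self):
        self._results = {}
        self.computations = 0  # counts how often an analysis actually ran

    @staticmethod
    def genome_id(sequence):
        # Assumption: a genome is identified by a hash of its sequence.
        return hashlib.sha256(sequence.encode("ascii")).hexdigest()

    def get_or_compute(self, sequence, analysis_name, analysis_fn):
        key = (self.genome_id(sequence), analysis_name)
        if key not in self._results:
            self._results[key] = analysis_fn(sequence)
            self.computations += 1
        return self._results[key]


def gc_content(sequence):
    """Placeholder analysis: fraction of G and C bases."""
    return sum(base in "GC" for base in sequence) / len(sequence)


store = GenomeResultStore()
first = store.get_or_compute("ATGCGC", "gc_content", gc_content)
second = store.get_or_compute("ATGCGC", "gc_content", gc_content)
print(first == second, store.computations)
```

The second call returns the stored value without re-running the analysis, which is the behaviour the abstract describes as missing from chained workflows.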
Peter C Marks, Marc Bigler, Eric B Alsop, Adrien Vigneron, Bart P Lomans, Renato De Paula, Brett Geissler, Nicolas Tsesmetzis
The ever-increasing metagenomic data necessitate appropriate cataloguing in a way that facilitates the comparison and better contextualization of the underlying investigations. To this end, information associated with the sequencing data, the original sample and the environment from which it was obtained is crucial. To date, no publicly available repository is able to capture environmental metadata pertaining to hydrocarbon-rich environments. As such, contextualization and comparative analysis among sequencing datasets derived from these environments are to a certain degree hindered or cannot be fully evaluated...
Aaron M Cohen, Zackary O Dunivin, Neil R Smalheiser
The Medical Subject Heading 'Humans' is manually curated and indicates human-related studies within MEDLINE. However, newly published MEDLINE articles may take months to be indexed and non-MEDLINE articles lack consistent, transparent indexing of this feature. Therefore, for up-to-date and broad literature searches, there is a need for an independent automated system to identify whether a given publication is human-related, particularly when it lacks Medical Subject Headings. One million MEDLINE records published in 1987-2014 were randomly selected...
Judith Mary Hariprakash, Shamsudheen Karuthedath Vellarikkal, Ankit Verma, Anop Singh Ranawat, Rijith Jayarajan, Rowmika Ravi, Anoop Kumar, Vishal Dixit, Ambily Sivadas, Atul Kumar Kashyap, Vigneshwar Senthivel, Paras Sehgal, Vijayalakshmi Mahadevan, Vinod Scaria, Sridhar Sivasubbu
South Asia is home to ~20% of the world population and is characterized by distinct ethnic, linguistic, cultural and genetic lineages. Only limited representative samples from the region have found their place in large population-scale international genome projects. The recent availability of genome-scale data from multiple populations and datasets from South Asian countries in the public domain motivated us to integrate the data into a comprehensive resource. In the present study, we have integrated a total of six datasets encompassing 1213 human exomes and genomes to create a compendium of 154 814 557 genetic variants, adding a total of 69 059 255 novel variants...
Aris Fergadis, Christos Baziotis, Dimitris Pappas, Haris Papageorgiou, Alexandros Potamianos
In this paper, we describe a hierarchical bi-directional attention-based Recurrent Neural Network (RNN) as a reusable sequence encoder architecture, which is used as a sentence and document encoder for document classification. The sequence encoder is composed of two bi-directional RNNs equipped with an attention mechanism that identifies and captures the most important elements, words or sentences, in a document, followed by a dense layer for the classification task. Our approach utilizes the hierarchical nature of documents, which are composed of sequences of sentences, which in turn are composed of sequences of words...
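The word-to-sentence-to-document hierarchy described above can be sketched as two rounds of attention pooling. This is a simplified illustration under stated assumptions, not the authors' model: the bi-directional RNNs and the dense classification layer are omitted, word vectors are taken as given, attention scores are plain dot products with a context vector, and the names `attention_pool` and `encode_document` are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(vectors, context):
    """Return the attention-weighted sum of vectors and the weights used.

    Each vector is scored by its dot product with the context vector.
    """
    scores = [sum(v * c for v, c in zip(vec, context)) for vec in vectors]
    weights = softmax(scores)
    pooled = [
        sum(w * vec[d] for w, vec in zip(weights, vectors))
        for d in range(len(context))
    ]
    return pooled, weights


def encode_document(document, word_context, sentence_context):
    """Pool words into sentence vectors, then sentences into a document vector."""
    sentence_vectors = [
        attention_pool(sentence, word_context)[0] for sentence in document
    ]
    return attention_pool(sentence_vectors, sentence_context)


# A toy document: two sentences of 2-dimensional word vectors.
document = [[[1.0, 0.0], [0.0, 1.0]], [[1.0, 1.0]]]
context = [1.0, 0.0]
doc_vector, sentence_weights = encode_document(document, context, context)
print(doc_vector, sentence_weights)
```

In a full model the pooled document vector would feed a classifier, and the context vectors would be learned rather than fixed.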
Daniel Longhi Fernandes Pedro, Alan Péricles Rodrigues Lorenzetti, Douglas Silva Domingues, Alexandre Rossi Paschoal
Transposable elements (TEs) play an essential role in the genetic variability of eukaryotic species. In plants, they may comprise up to 90% of the total genome. Non-coding RNAs (ncRNAs) are known to control gene expression and regulation. Although the relationship between ncRNAs and TEs is known, obtaining the organized data for sequenced genomes is not straightforward. In this study, we describe PlaNC-TE, a user-friendly portal harboring a knowledgebase created by integrating and analysing plant ncRNA-TE data...
Garima Singh, Basharat Bhat, M S K Jayadev, Ch Madhusudhan, Ashutosh Singh
Tropical calcific pancreatitis (TCP) is a juvenile, non-alcoholic form of chronic pancreatitis that occurs exclusively in tropical regions and is associated with low economic status. TCP begins in childhood and then progresses silently. mutTCPdb is a manually curated and comprehensive disease-specific single nucleotide variant (SNV) database. Extensive search strategies were employed to create the repository, and SNV information was collected from published articles. Several existing databases such as the dbSNP, Uniprot, miRTarBase2...
Wen-Jing Wang, Yu-Mei Wang, Yi Hu, Qin Lin, Rou Chen, Huan Liu, Wen-Ze Cao, Hui-Fang Zhu, Chang Tong, Li Li, Lu-Ying Peng
Heart diseases (HDs) represent a common group of diseases that involve the heart, a number of which are characterized by high morbidity and lethality. Recently, increasing evidence has demonstrated that diverse non-coding RNAs (ncRNAs) play critical roles in HDs. However, a systematic investigation of the association between HDs and ncRNAs is currently lacking. Here, we developed a Heart Disease-related Non-coding RNAs Database (HDncRNA), to curate the HDs-ncRNA associations from 3 different sources including 1904 published articles, 3 existing databases [the Human microRNA Disease Database (HMDD), miR2disease and lncRNAdisease] and 5 RNA-seq datasets...
Chun-Wei Tung, Shan-Shan Wang
Computational inference of affected functions, pathways and diseases for chemicals could greatly accelerate the evaluation of potential effects of chemical exposure on human beings. Previously, we developed the ChemDIS system, which utilizes information on interacting targets for chemical-disease inference. With the target information, testable hypotheses can be generated for experimental validation. In this work, we present an updated system, ChemDIS 2, featuring more up-to-date datasets and several new functions, including (i) custom enrichment analysis function for single omics data; (ii) multi-omics analysis function for joint analysis of multi-omics data; (iii) mixture analysis function for the identification of interaction and overall effects; (iv) web application programming interface (API) for programmed access to ChemDIS 2...
Yifan Peng, Anthony Rios, Ramakanth Kavuluru, Zhiyong Lu
Mining relations between chemicals and proteins from the biomedical literature is an increasingly important task. The CHEMPROT track at BioCreative VI aims to promote the development and evaluation of systems that can automatically detect the chemical-protein relations in running text (PubMed abstracts). This work describes our CHEMPROT track entry, which is an ensemble of three systems, including a support vector machine, a convolutional neural network, and a recurrent neural network. Their output is combined using majority voting or stacking for final predictions...
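The majority-voting combination mentioned above can be sketched in a few lines. This is an illustrative example only, not the authors' code: the per-system predictions are made-up placeholder labels, the function name `majority_vote` is hypothetical, and falling back to a "NONE" label on ties is an assumption, since the abstract does not say how ties are resolved.

```python
from collections import Counter


def majority_vote(predictions, tie_break="NONE"):
    """Return the label predicted by most systems, or tie_break on a tie."""
    if not predictions:
        raise ValueError("at least one prediction is required")
    counts = Counter(predictions).most_common()
    # A tie exists when the two most frequent labels have equal counts.
    if len(counts) > 1 and counts[0][1] == counts[1][1]:
        return tie_break
    return counts[0][0]


# Placeholder outputs of three systems for four candidate chemical-protein pairs.
svm = ["CPR:3", "NONE", "CPR:4", "CPR:9"]
cnn = ["CPR:3", "CPR:4", "CPR:4", "NONE"]
rnn = ["NONE", "CPR:4", "CPR:4", "CPR:5"]

ensemble = [majority_vote(votes) for votes in zip(svm, cnn, rnn)]
print(ensemble)  # ['CPR:3', 'CPR:4', 'CPR:4', 'NONE']
```

Stacking, the alternative named in the abstract, would instead train a second-level model on the three systems' outputs rather than counting votes.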
Fei Chen, Jiawei Zhang, Junhao Chen, Xiaojiang Li, Wei Dong, Jian Hu, Meigui Lin, Yanhui Liu, Guowei Li, Zhengjia Wang, Liangsheng Zhang
With over 6000 species in seven classes, red algae (Rhodophyta) have diverse economic, ecological, experimental and evolutionary values. However, red algae are usually absent or rare in comparative analyses because genomic information of this phylum is often under-represented in various comprehensive genome databases. To improve the accessibility to the ome data and omics tools for red algae, we provided 10 genomes and 27 transcriptomes representing all seven classes of Rhodophyta. Three genomes and 18 transcriptomes were de novo assembled and annotated in this project...
Lana Yeganova, Won Kim, Donald C Comeau, W John Wilbur, Zhiyong Lu
PubMed® is a search engine providing access to a collection of over 27 million biomedical bibliographic records as of 2017. PubMed processes millions of queries a day, and understanding these queries is one of the main building blocks for successful information retrieval. In this work, we present Field Sensor, a domain-specific tool for understanding the composition and predicting the user intent of PubMed queries. Given a query, the Field Sensor infers a field for each token or sequence of tokens in a query in a multi-step process that includes syntactic chunking, rule-based tagging and probabilistic field prediction...
P Corbett, J Boyle
In this paper, we explore the application of artificial neural network ('deep learning') methods to the problem of detecting chemical-protein interactions in PubMed abstracts. We present here a system using multiple Long Short Term Memory layers to analyse candidate interactions, to determine whether there is a relation and which type. A particular feature of our system is the use of unlabelled data, both to pre-train word embeddings and also pre-train LSTM layers in the neural network. On the BioCreative VI CHEMPROT test corpus, our system achieves an F score of 61...
Wei-Sheng Wu, Yu-Xuan Jiang, Jer-Wei Chang, Yu-Han Chu, Yi-Hao Chiu, Yi-Hong Tsao, Torbjörn E M Nordling, Yan-Yuan Tseng, Joseph T Tseng
Translational regulation plays an important role in protein synthesis. Dysregulation of translation causes abnormal cell physiology and leads to diseases such as inflammatory disorders and cancers. An emerging technique, called ribosome profiling (ribo-seq), was developed to capture a snapshot of translation. It is based on deep sequencing of ribosome-protected mRNA fragments. A lot of ribo-seq data have been generated in various studies, so databases are needed for depositing and visualizing the published ribo-seq data...
Chen Li, Zhiqiang Rao, Qinghua Zheng, Xiangrong Zhang
Current research in bio-text mining mainly focuses on event extraction. Biological networks present much richer and more meaningful information to biologists than events. Bio-entity coreference resolution (CR) is a very important method to complete a bio-event's attributes and interconnect events into bio-networks. Though general CR methods have been studied for a long time, they could not produce a practically useful result when applied to a special domain. Therefore, bio-entity CR needs attention to better assist biological network extraction...
Huiwei Zhou, Zhuang Liu, Shixian Ning, Yunlong Yang, Chengkun Lang, Yingyu Lin, Kun Ma
Automatically extracting protein-protein interactions (PPIs) from biomedical literature provides additional support for precision medicine efforts. This paper proposes a novel memory network-based model (MNM) for PPI extraction, which leverages prior knowledge about protein-protein pairs with memory networks. The proposed MNM captures important context clues related to knowledge representations learned from knowledge bases. Both entity embeddings and relation embeddings of prior knowledge are effective in improving the PPI extraction model, leading to a new state-of-the-art performance on the BioCreative VI PPI dataset...
