Journal of Cheminformatics

Jakub Velkoborsky, David Hoksza
BACKGROUND: Visualization of large molecular datasets is a challenging yet important topic utilised in diverse fields of chemistry ranging from material engineering to drug design. Especially in drug design, modern methods of high-throughput screening generate large amounts of molecular data that call for methods enabling their analysis. One such method is classification of compounds based on their molecular scaffolds, a concept widely used by medicinal chemists to group molecules of similar properties...
2016
Muthukumarasamy Karthikeyan, Renu Vyas
Digital access to chemical journals has resulted in a vast array of molecular information that is now available in supplementary material files in PDF format. However, extracting this molecular information, generally from a PDF document, is a daunting task. Here we present an approach to harvest 3D molecular data from the supporting information of scientific research articles that are normally available from publishers' resources. To demonstrate the feasibility of extracting truly computable molecules from PDF files in a fast and efficient manner, we have developed a Java-based application, namely ChemEngine...
2016
Saw Simeon, Watshara Shoombuatong, Nuttapat Anuwongcharoen, Likit Preeyanon, Virapong Prachayasittikul, Jarl E S Wikberg, Chanin Nantasenamat
BACKGROUND: Currently, monomeric fluorescent proteins (FPs) are ideal markers for protein tagging. The prediction of oligomeric states is helpful for enhancing live biomedical imaging. Computational prediction of FP oligomeric states can accelerate protein engineering efforts to create monomeric FPs. To the best of our knowledge, this study represents the first computational model for predicting and analyzing FP oligomerization directly from the amino acid sequence. RESULTS: After data curation, an exhaustive data set consisting of 397 non-redundant FP oligomeric states was compiled from the literature...
2016
Anastasia V Rudik, Alexander V Dmitriev, Alexey A Lagunin, Dmitry A Filimonov, Vladimir V Poroikov
BACKGROUND: The knowledge of drug metabolite structures is essential at the early stage of drug discovery to understand the potential liabilities and risks connected with biotransformation. The determination of the site of a molecule at which a particular metabolic reaction occurs could be used as a starting point for metabolite identification. The prediction of the site of metabolism does not always correspond to the particular atom that is modified by the enzyme but rather is often associated with a group of atoms...
2016
Samuel Lampa, Jonathan Alvarsson, Ola Spjuth
Predictive modelling in drug discovery is challenging to automate as it often contains multiple analysis steps and might involve cross-validation and parameter tuning that create complex dependencies between tasks. With large-scale data or when using computationally demanding modelling methods, e-infrastructures such as high-performance or cloud computing are required, adding to the existing challenges of fault-tolerant automation. Workflow management systems can aid in many of these challenges, but the currently available systems are lacking in the functionality needed to enable agile and flexible predictive modelling...
2016
Vincent F Scalfani, Antony J Williams, Valery Tkachenko, Karen Karapetyan, Alexey Pshenichnov, Robert M Hanson, Jahred M Liddie, Jason E Bara
BACKGROUND: Three-dimensional (3D) printed crystal structures are useful for chemistry teaching and research. Current manual methods of converting crystal structures into 3D printable files are time-consuming and tedious. To overcome this limitation, we developed a programmatic method that allows for facile conversion of thousands of crystal structures directly into 3D printable files. RESULTS: A collection of over 30,000 crystal structures in crystallographic information file (CIF) format from the Crystallography Open Database (COD) were programmatically converted into 3D printable files (VRML format) using Jmol scripting...
2016
Hugo López-Fernández, Gustavo de S Pessôa, Marco A Z Arruda, José L Capelo-Martínez, Florentino Fdez-Riverola, Daniel Glez-Peña, Miguel Reboiro-Jato
The spatial distribution of chemical elements in different types of samples is an important field in several research areas such as biology, paleontology or biomedicine, among others. Elemental distribution imaging by laser ablation inductively coupled plasma mass spectrometry (LA-ICP-MS) is an effective technique for qualitative and quantitative imaging due to its high spatial resolution and sensitivity. By applying this technique, vast amounts of raw data are generated to obtain high-quality images, essentially making the use of specific LA-ICP-MS imaging software that can process such data absolutely mandatory...
2016
Othman Soufan, Wail Ba-Alawi, Moataz Afeef, Magbubah Essack, Panos Kalnis, Vladimir B Bajic
BACKGROUND: Mining high-throughput screening (HTS) assays is key for enhancing decisions in the area of drug repositioning and drug discovery. However, many challenges are encountered in the process of developing suitable and accurate methods for extracting useful information from these assays. Virtual screening and the wide variety of databases, methods and solutions proposed to date have not completely overcome these challenges. This study is based on a multi-label classification (MLC) technique for modeling correlations between several HTS assays, meaning that a single prediction represents a subset of assigned correlated labels instead of one label...
2016
Mariana González-Medina, Fernando D Prieto-Martínez, John R Owen, José L Medina-Franco
BACKGROUND: Measuring the structural diversity of compound databases is relevant in drug discovery and many other areas of chemistry. Since molecular diversity depends on molecular representation, comprehensive chemoinformatic analysis of the diversity of libraries uses multiple criteria. For instance, the diversity of the molecular libraries is typically evaluated employing molecular scaffolds, structural fingerprints, and physicochemical properties. However, the assessment with each criterion is analyzed independently and it is not straightforward to provide an evaluation of the "global diversity"...
2016
Sunghwan Kim, Evan E Bolton, Stephen H Bryant
BACKGROUND: PubChem is a public repository for biological activities of small molecules. For the efficient use of its vast amount of chemical information, PubChem performs 2-dimensional (2-D) and 3-dimensional (3-D) neighborings, which precompute "neighbor" relationships between molecules in the PubChem Compound database, using the PubChem subgraph fingerprints-based 2-D similarity and the Gaussian-shape overlay-based 3-D similarity, respectively. These neighborings allow PubChem to provide the user with immediate access to the list of 2-D and 3-D neighbors (also called "Similar Compounds" and "Similar Conformers", respectively) for each compound in PubChem...
2016
Yannick Djoumbou Feunang, Roman Eisner, Craig Knox, Leonid Chepelev, Janna Hastings, Gareth Owen, Eoin Fahy, Christoph Steinbeck, Shankar Subramanian, Evan Bolton, Russell Greiner, David S Wishart
BACKGROUND: Scientists have long been driven by the desire to describe, organize, classify, and compare objects using taxonomies and/or ontologies. In contrast to biology, geology, and many other scientific disciplines, the world of chemistry still lacks a standardized chemical ontology or taxonomy. Several attempts at chemical classification have been made; but they have mostly been limited to either manual, or semi-automated proof-of-principle applications. This is regrettable as comprehensive chemical classification and description tools could not only improve our understanding of chemistry but also improve the linkage between chemistry and many other fields...
2016
Martin Gütlein, Stefan Kramer
BACKGROUND: Even though circular fingerprints were first introduced more than 50 years ago, they are still widely used for building highly predictive, state-of-the-art (Q)SAR models. Historically, these structural fragments were designed to search large molecular databases. Hence, to derive a compact representation, circular fingerprint fragments are often folded to comparatively short bit-strings. However, folding fingerprints introduces bit collisions, and therefore adds noise to the encoded structural information and removes its interpretability...
2016
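The folding step mentioned in the Gütlein and Kramer abstract above can be illustrated with a minimal sketch. It is not the authors' method: the fragment strings are hypothetical placeholders, and a truncated SHA-256 digest stands in for whatever fragment hash a real circular fingerprint implementation uses. The sketch only shows why mapping many fragments onto a short bit-string by modulo makes distinct fragments share a bit, so that a set bit can no longer be traced back to a single fragment.

```python
import hashlib


def fragment_hash(fragment: str) -> int:
    """Deterministic 32-bit identifier for a fragment string.

    Assumption: a truncated SHA-256 digest stands in for a real
    circular-fragment hash.
    """
    digest = hashlib.sha256(fragment.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big")


def fold(fragments, n_bits):
    """Fold fragment identifiers onto n_bits positions using modulo.

    Returns a dict mapping each set bit position to the fragments behind it.
    """
    bits = {}
    for frag in fragments:
        bits.setdefault(fragment_hash(frag) % n_bits, set()).add(frag)
    return bits


def collisions(bits):
    """Bit positions that are set by more than one distinct fragment."""
    return {pos: frags for pos, frags in bits.items() if len(frags) > 1}


# Hypothetical fragment labels, used only for illustration.
fragments = ["frag_a", "frag_b", "frag_c", "frag_d", "frag_e",
             "frag_f", "frag_g", "frag_h", "frag_i"]

# Nine fragments cannot fit into eight bits without sharing a position.
folded = fold(fragments, n_bits=8)
shared = collisions(folded)
for pos, frags in sorted(shared.items()):
    print(f"bit {pos} is shared by {sorted(frags)}")
```

With nine fragments and eight bit positions, at least one collision is guaranteed by the pigeonhole principle. With longer bit-strings collisions become rarer but remain possible.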
Maryam Habibi, David Luis Wiegandt, Florian Schmedding, Ulf Leser
Recently, methods for Chemical Named Entity Recognition (NER) have gained substantial interest, driven by the need to automatically analyze today's ever-growing collections of biomedical text. Chemical NER for patents is particularly essential due to the high economic importance of pharmaceutical findings. However, NER on patents has long been essentially neglected by the research community, mostly because of the lack of sufficient annotated corpora. A recent international competition specifically targeted this task, but evaluated tools only on gold standard patent abstracts instead of full patents; furthermore, results from such competitions are often difficult to extrapolate to real-life settings due to the relatively high homogeneity of training and test data...
2016
Junaid Arshad, Alexander Hoffmann, Sandra Gesing, Richard Grunzke, Jens Krüger, Tamas Kiss, Sonja Herres-Pawlis, Gabor Terstyanszky
BACKGROUND: In quantum chemistry, many tasks recur frequently, e.g. geometry optimizations and benchmarking series. Here, workflows can help to reduce the time spent on manual job definition and output extraction. These workflows are executed on computing infrastructures and may require large computing and data resources. Scientific workflows hide these infrastructures and the resources needed to run them. Designing, implementing and testing these workflows requires significant effort and specific expertise...
2016
Tomáš Raček, Jana Pazúriková, Radka Svobodová Vařeková, Stanislav Geidl, Aleš Křenek, Francesco Luca Falginella, Vladimír Horský, Václav Hejret, Jaroslav Koča
BACKGROUND: The concept of partial atomic charges was first applied in physical and organic chemistry and was later also adopted in computational chemistry, bioinformatics and chemoinformatics. The electronegativity equalization method (EEM) is the most frequently used approach for calculating partial atomic charges. EEM is fast and its accuracy is comparable to the quantum mechanical charge calculation method for which it was parameterized. Several EEM parameter sets for various types of molecules and QM charge calculation approaches have been published and new ones are still needed and produced...
2016
Ludovic Chaput, Juan Martinez-Sanz, Nicolas Saettel, Liliane Mouawad
BACKGROUND: In structure-based virtual screening, the choice of the docking program is essential for the success of hit identification. Benchmarks are meant to help in guiding this choice, especially when undertaken on a large variety of protein targets. Here, the performance of four popular virtual screening programs, Gold, Glide, Surflex and FlexX, is compared using the Directory of Useful Decoys-Enhanced database (DUD-E), which includes 102 targets with an average of 224 ligands per target and 50 decoys per ligand, generated to avoid biases in the benchmarking...
2016
Stuart J Chalk
BACKGROUND: A number of websites make spectral data available for download (typically as JCAMP-DX text files), and one (ChemSpider) also allows users to contribute spectral files. As a result, searching and retrieving such spectral data can be time consuming, and the data can be difficult to reuse if compressed in the JCAMP-DX file. What is needed is a single resource that allows submission of JCAMP-DX files, export of the raw data in multiple formats, searching based on multiple chemical identifiers, and is open in terms of license and access...
2016
Stuart J Chalk
With the move toward global, Internet-enabled science there is an inherent need to capture, store, aggregate and search scientific data across a large corpus of heterogeneous data silos. As a result, standards development is needed to create an infrastructure capable of representing the diverse nature of scientific data. This paper describes a fundamental data model for scientific data that can be applied to data currently stored in any format, and an associated ontology that affords semantic representation of the structure of scientific data (and its metadata), upon which discipline-specific semantics can be applied...
2016
Ola Spjuth, Patrik Rydberg, Egon L Willighagen, Chris T Evelo, Nina Jeliazkova
Xenobiotic metabolism is an active research topic, but the limited amount of openly available high-quality biotransformation data constrains predictive modeling. Current databases often default to commonly available information: they record which enzyme metabolizes a compound, but capture neither the experimental conditions nor the atoms that undergo metabolization. We present XMetDB, an open access database for drugs and other xenobiotics and their respective metabolites. The database contains chemical structures of xenobiotic biotransformations with substrate atoms annotated as reaction centra, the resulting product formed, and the catalyzing enzyme, type of experiment, and literature references...
2016
Athira Dilip, Samo Lešnik, Tanja Štular, Dušanka Janežič, Janez Konc
Ligand-based virtual screening of large small-molecule databases is an important step in the early stages of drug development. It is based on the similarity principle and is used to reduce the chemical space of large databases to a manageable size where chosen ligands can be experimentally tested. Ligand-based virtual screening can also be used to identify bioactive molecules with different basic scaffolds compared to already known bioactive molecules, thus having the potential to increase the structural variability of compounds...
2016
