BMC Bioinformatics

Raphael Couronné, Philipp Probst, Anne-Laure Boulesteix
BACKGROUND AND GOAL: The Random Forest (RF) algorithm for regression and classification has gained considerable popularity since its introduction in 2001. Meanwhile, it has grown into a standard classification approach competing with logistic regression (LR) in many innovation-friendly scientific fields. RESULTS: In this context, we present a large-scale benchmarking experiment based on 243 real datasets comparing the prediction performance of the original version of RF with default parameters and LR as binary classification tools...
July 17, 2018: BMC Bioinformatics
Saurabh Baheti, Xiaojia Tang, Daniel R O'Brien, Nicholas Chia, Lewis R Roberts, Heidi Nelson, Judy C Boughey, Liewei Wang, Matthew P Goetz, Jean-Pierre A Kocher, Krishna R Kalari
BACKGROUND: Transfer of genetic material from microbes or viruses into the host genome is known as horizontal gene transfer (HGT). The integration of viruses into the human genome is associated with multiple cancers, and these can now be detected using next-generation sequencing methods such as whole genome sequencing and RNA-sequencing. RESULTS: We designed a novel computational workflow, HGT-ID, to identify the integration of viruses into the human genome using the sequencing data...
July 17, 2018: BMC Bioinformatics
Syed Ahmad Chan Bukhari, Marcos Martínez-Romero, Martin J O'Connor, Attila L Egyedi, Debra Willrett, John Graybeal, Mark A Musen, Kei-Hoi Cheung, Steven H Kleinstein
BACKGROUND: Public biomedical data repositories often provide web-based interfaces to collect experimental metadata. However, these interfaces typically reflect the ad hoc metadata specification practices of the associated repositories, leading to a lack of standardization in the collected metadata. This lack of standardization limits the ability of the source datasets to be broadly discovered, reused, and integrated with other datasets. To increase reuse, discoverability, and reproducibility of the described experiments, datasets should be appropriately annotated by using agreed-upon terms, ideally from ontologies or other controlled term sources...
July 16, 2018: BMC Bioinformatics
Abhik Seal, David J Wild
BACKGROUND: Netpredictor is an R package for prediction of missing links in any given unipartite or bipartite network. The package provides utilities to compute missing links in bipartite as well as unipartite networks using the Random Walk with Restart and Network inference algorithms, or a combination of both. The package also allows computation of bipartite network properties, visualization of communities for two different sets of nodes, and calculation of significant interactions between two sets of nodes using permutation-based testing...
July 16, 2018: BMC Bioinformatics
Devika Ganesamoorthy, Minh Duc Cao, Tania Duarte, Wenhan Chen, Lachlan Coin
BACKGROUND: Tandem repeats comprise a significant proportion of the human genome, including coding and regulatory regions. They are highly prone to repeat number variation and nucleotide mutation due to their repetitive and unstable nature, making them a major source of genomic variation between individuals. Despite recent advances in high-throughput sequencing, analysis of tandem repeats in the context of complex diseases is still hindered by technical limitations. We report a novel targeted sequencing approach, which allows simultaneous analysis of hundreds of repeats...
July 16, 2018: BMC Bioinformatics
Yasunobu Okamura, Kengo Kinoshita
BACKGROUND: Data generated by RNA sequencing (RNA-Seq) is now accumulating in vast amounts in public repositories, especially for human and mouse genomes. Reanalyzing these data has emerged as a promising approach to identify gene modules or pathways. Although meta-analyses of gene expression data are frequently performed using microarray data, meta-analyses using RNA-Seq data are still rare. This lag is partly due to the limitations in reanalyzing RNA-Seq data, which requires extensive computational resources...
July 16, 2018: BMC Bioinformatics
Lana Yeganova, Sun Kim, Grigory Balasanov, W John Wilbur
BACKGROUND: The need to organize any large document collection in a manner that facilitates human comprehension has become crucial with the increasing volume of information available. Two common approaches to provide a broad overview of the information space are document clustering and topic modeling. Clustering aims to group documents or terms into meaningful clusters. Topic modeling, on the other hand, focuses on finding coherent keywords for describing topics appearing in a set of documents...
July 16, 2018: BMC Bioinformatics
Ryohei Eguchi, Mohammand Bozlul Karim, Pingzhao Hu, Tetsuo Sato, Naoaki Ono, Shigehiko Kanaya, Md Altaf-Ul-Amin
BACKGROUND: Genes and diseases are linked by diverse and complicated associations. Finding the causal associations between genes and specific diseases is still challenging. In this work we present a method to predict novel associations of genes and pathways with inflammatory bowel disease (IBD) by integrating information on differential gene expression, protein-protein interactions (PPI) and known disease genes related to IBD. RESULTS: We downloaded IBD gene expression data from NCBI's Gene Expression Omnibus, performed statistical analysis to determine differentially expressed genes, and collected known IBD genes from the DisGeNET database, which were used to construct an IBD-related PPI network with the HIPPIE database...
July 13, 2018: BMC Bioinformatics
Chi Xiao, Weifu Li, Hao Deng, Xi Chen, Yang Yang, Qiwei Xie, Hua Han
BACKGROUND: The locations and shapes of synapses are important in reconstructing connectomes and analyzing synaptic plasticity. However, current synapse detection and segmentation methods are still not adequate for accurately acquiring the synaptic connectivity, and they cannot effectively alleviate the burden of synapse validation. RESULTS: We propose a fully automated method that relies on deep learning to realize the 3D reconstruction of synapses in electron microscopy (EM) images...
July 13, 2018: BMC Bioinformatics
Haojing Shao, Devika Ganesamoorthy, Tania Duarte, Minh Duc Cao, Clive J Hoggart, Lachlan J M Coin
BACKGROUND: Detection of genomic inversions remains challenging. Many existing methods primarily target inversions with a non-repetitive breakpoint, leaving inverted repeat (IR) mediated non-allelic homologous recombination (NAHR) inversions largely unexplored. RESULT: We present npInv, a novel tool specifically for detecting and genotyping NAHR inversions using long read sub-alignment of long read sequencing data. We benchmark npInv against other tools on both simulated and real data...
July 13, 2018: BMC Bioinformatics
Yuqing Zhang, David F Jenkins, Solaiappan Manimaran, W Evan Johnson
BACKGROUND: Combining genomic data sets from multiple studies is advantageous to increase statistical power in studies where logistical considerations restrict sample size or require the sequential generation of data. However, significant technical heterogeneity is commonly observed across multiple batches of data that are generated from different processing or reagent batches, experimenters, protocols, or profiling platforms. These so-called batch effects often confound true biological relationships in the data, reducing the power benefits of combining multiple batches, and may even lead to spurious results in some combined studies...
July 13, 2018: BMC Bioinformatics
Teng Zhang, Shao-Wu Zhang, Lin Zhang, Jia Meng
BACKGROUND: Methylated RNA immunoprecipitation sequencing (MeRIP-seq or m6A-seq) has been extensively used for profiling the transcriptome-wide distribution of RNA N6-methyladenosine methylation. However, due to the intrinsic properties of RNA molecules and the intricate procedures of this technique, m6A-seq data often suffer from various flaws. A convenient and comprehensive tool is needed to assess the quality of m6A-seq data to ensure that they are suitable for subsequent analysis. RESULTS: From a technical perspective, m6A-seq can be considered a combination of ChIP-seq and RNA-seq; hence, by effectively combining the data quality assessment metrics of the two techniques, we developed the trumpet R package for evaluation of m6A-seq data quality...
July 13, 2018: BMC Bioinformatics
Geert Heyman, Ivan Vulić, Marie-Francine Moens
BACKGROUND: Bilingual lexicon induction (BLI) is an important task in the biomedical domain as translation resources are usually available for general language usage, but are often lacking in domain-specific settings. In this article we consider BLI as a classification problem and train a neural network composed of a combination of recurrent long short-term memory and deep feed-forward networks in order to obtain word-level and character-level representations. RESULTS: The results show that the word-level and character-level representations each improve state-of-the-art results for BLI and biomedical translation mining...
July 9, 2018: BMC Bioinformatics
Simone Dinarelli, Marco Girasole, Giovanni Longo
BACKGROUND: The collection and analysis of Atomic Force Microscopy force curves is a well-established procedure to obtain high-resolution information of non-topographic data from any kind of sample, including biological specimens. In particular, these analyses are commonly employed to study elasticity, stiffness or adhesion properties of the samples. Furthermore, the collection of several force curves over an extended area of the specimens allows reconstructing maps, called force volume maps, of the spatial distribution of the mechanical properties...
July 6, 2018: BMC Bioinformatics
Panu Somervuo, Patrik Koskinen, Peng Mei, Liisa Holm, Petri Auvinen, Lars Paulin
BACKGROUND: Current high-throughput sequencing platforms provide the capacity to sequence multiple samples in parallel. Different samples are labeled by attaching a short sample-specific nucleotide sequence, a barcode, to each DNA molecule prior to pooling them into a mix containing a number of libraries to be sequenced simultaneously. After sequencing, the samples are binned by identifying the barcode sequence within each sequence read. In order to tolerate sequencing errors, barcodes should be sufficiently far apart from each other in sequence space...
July 5, 2018: BMC Bioinformatics
Maxwell Shapiro, Stephen Meier, Thomas MacCarthy
Following publication of the original article [1], the authors reported that Figs. 1 and 3 were interchanged. The original article has been corrected.
July 4, 2018: BMC Bioinformatics
Carl G de Boer, Aviv Regev
BACKGROUND: Variation in chromatin organization across single cells can help shed important light on the mechanisms controlling gene expression, but scale, noise, and sparsity pose significant challenges for interpretation of single cell chromatin data. Here, we develop BROCKMAN (Brockman Representation Of Chromatin by K-mers in Mark-Associated Nucleotides), an approach to infer variation in transcription factor (TF) activity across samples through unsupervised analysis of the variation in DNA sequences associated with an epigenomic mark...
July 3, 2018: BMC Bioinformatics
Stefano Beretta, Murray D Patterson, Simone Zaccaria, Gianluca Della Vedova, Paola Bonizzoni
BACKGROUND: Haplotype assembly is the process of assigning the different alleles of the variants covered by mapped sequencing reads to the two haplotypes of the genome of a human individual. Long reads, which are nowadays cheaper to produce and more widely available than ever before, have been used to reduce the fragmentation of the assembled haplotypes owing to their ability to span several variants along the genome. These long reads are also characterized by a high error rate, an issue which may be mitigated, however, with larger sets of reads, when this error rate is uniform across genome positions...
July 3, 2018: BMC Bioinformatics
Henning Lenz, Anke Hein, Volker Knoop
BACKGROUND: Gene expression in plant chloroplasts and mitochondria is affected by RNA editing. Numerous C-to-U conversions, accompanied by reverse U-to-C exchanges in some plant clades, alter the genetic information encoded in the organelle genomes. Predicting and analyzing RNA editing, which ranges from only a few sites in some species to thousands in other taxa, is bioinformatically demanding. RESULTS: Here, we present major enhancements and extensions of PREPACT, a WWW-based service for analysing, predicting and cataloguing plant-type RNA editing...
July 3, 2018: BMC Bioinformatics
Tadi Venkata Sivakumar, Anirban Bhaduri, Rajasekhara Reddy Duvvuru Muni, Jin Hwan Park, Tae Yong Kim
BACKGROUND: Computation of reaction similarity is a prerequisite for several bioinformatics applications, including enzyme identification for specific biochemical reactions, enzyme classification and mining for specific inhibitors. Reaction similarity is often assessed at one of two levels: (i) reaction-level similarity, a comparison across all the constituent substrates and products of a reaction; (ii) transformation-level similarity, a comparison at the transformation center with various degrees of neighborhood...
July 3, 2018: BMC Bioinformatics