Hadoop

https://www.readbyqxmd.com/read/28093410/fastdoop-a-versatile-and-efficient-library-for-the-input-of-fasta-and-fastq-files-for-mapreduce-hadoop-bioinformatics-applications
#1
Umberto Ferraro Petrillo, Gianluca Roscigno, Giuseppe Cattaneo, Raffaele Giancarlo
MapReduce Hadoop bioinformatics applications require the availability of special-purpose routines to manage the input of sequence files. Unfortunately, the Hadoop framework does not provide any built-in support for the most popular sequence file formats like FASTA or BAM. Moreover, the development of these routines is not easy, both because of the diversity of these formats and the need to efficiently manage sequence datasets that may contain billions of characters. We present FASTdoop, a generic Hadoop library for the management of FASTA and FASTQ files...
January 16, 2017: Bioinformatics
https://www.readbyqxmd.com/read/28075343/a-fast-synthetic-aperture-radar-raw-data-simulation-using-cloud-computing
#2
Zhixin Li, Dandan Su, Haijiang Zhu, Wei Li, Fan Zhang, Ruirui Li
Synthetic Aperture Radar (SAR) raw data simulation is a fundamental problem in radar system design and imaging algorithm research. The growth of surveying swath and resolution results in a significant increase in data volume and simulation period, which can be considered to be a comprehensive data intensive and computing intensive issue. Although several high performance computing (HPC) methods have demonstrated their potential for accelerating simulation, the input/output (I/O) bottleneck of huge raw data has not been eased...
January 8, 2017: Sensors
https://www.readbyqxmd.com/read/28072850/chaos-based-simultaneous-compression-and-encryption-for-hadoop
#3
Muhammad Usama, Nordin Zakaria
Data compression and encryption are key components of commonly deployed platforms such as Hadoop. Numerous data compression and encryption tools are presently available on such platforms and the tools are characteristically applied in sequence, i.e., compression followed by encryption or encryption followed by compression. This paper focuses on the open-source Hadoop framework and proposes a data storage method that efficiently couples data compression with encryption. A simultaneous compression and encryption scheme is introduced that addresses an important implementation issue of source coding based on Tent Map and Piece-wise Linear Chaotic Map (PWLM), which is the infinite precision of real numbers that result from their long products...
2017: PloS One
https://www.readbyqxmd.com/read/28025200/falco-a-quick-and-flexible-single-cell-rna-seq-processing-framework-on-the-cloud
#4
Andrian Yang, Michael Troup, Peijie Lin, Joshua W K Ho
Single-cell RNA-seq (scRNA-seq) is increasingly used in a range of biomedical studies. Nonetheless, current RNA-seq analysis tools are not specifically designed to efficiently process scRNA-seq data due to their limited scalability. Here we introduce Falco, a cloud-based framework to enable parallelization of existing RNA-seq processing pipelines using the big data technologies Apache Hadoop and Apache Spark for performing massively parallel analysis of large-scale transcriptomic data...
December 24, 2016: Bioinformatics
https://www.readbyqxmd.com/read/27993788/mruninovo-an-efficient-tool-for-de-novo-peptide-sequencing-utilizing-the-hadoop-distributed-computing-framework
#5
Chuang Li, Tao Chen, Qiang He, Yunping Zhu, Kenli Li
Tandem mass spectrometry-based de novo peptide sequencing is a complex and time-consuming process. The current algorithms for de novo peptide sequencing cannot rapidly and thoroughly process large mass spectrometry datasets. In this paper, we propose MRUniNovo, a novel tool for parallel de novo peptide sequencing. MRUniNovo parallelizes UniNovo based on the Hadoop compute platform. Our experimental results demonstrate that MRUniNovo significantly reduces the computation time of de novo peptide sequencing without sacrificing the correctness and accuracy of the results, and thus can process very large datasets that UniNovo cannot...
December 19, 2016: Bioinformatics
https://www.readbyqxmd.com/read/27905520/a-parallel-adaboost-backpropagation-neural-network-for-massive-image-dataset-classification
#6
Jianfang Cao, Lichao Chen, Min Wang, Hao Shi, Yun Tian
Image classification uses computers to simulate human understanding and cognition of images by automatically categorizing images. This study proposes a faster image classification approach that parallelizes the traditional Adaboost-Backpropagation (BP) neural network using the MapReduce parallel programming model. First, we construct a strong classifier by assembling the outputs of 15 BP neural networks (which are individually regarded as weak classifiers) based on the Adaboost algorithm. Second, we design Map and Reduce tasks for both the parallel Adaboost-BP neural network and the feature extraction algorithm...
December 1, 2016: Scientific Reports
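The combination step described in the abstract above (several weak classifiers merged into one strong classifier, then applied to data chunks in a Map task) can be sketched as follows. This is a minimal illustration only, assuming the standard AdaBoost weighting rule and using simple threshold functions in place of the paper's BP neural networks; the names `classifier_weight`, `strong_classify`, and `map_classify` are hypothetical and not taken from the paper.

```python
import math


def classifier_weight(error):
    """Standard AdaBoost weight for a weak classifier with the given
    weighted training error (assumed to lie strictly between 0 and 1)."""
    return 0.5 * math.log((1.0 - error) / error)


def strong_classify(x, weak_classifiers, errors):
    """Weighted vote of weak classifiers; each returns +1 or -1."""
    score = sum(
        classifier_weight(err) * clf(x)
        for clf, err in zip(weak_classifiers, errors)
    )
    return 1 if score >= 0 else -1


def map_classify(chunk, weak_classifiers, errors):
    """Hypothetical Map task: label every sample in one data chunk."""
    return [(x, strong_classify(x, weak_classifiers, errors)) for x in chunk]


# Stand-ins for trained weak classifiers (assumption: real ones would be
# BP neural networks trained on image features).
weak = [
    lambda x: 1 if x > 0 else -1,  # accurate classifier
    lambda x: -1,                  # poor classifier
    lambda x: -1,                  # poor classifier
]
errors = [0.1, 0.4, 0.4]

labels = map_classify([1.0, -1.0], weak, errors)
```

In this toy case the single accurate classifier outweighs the two poor ones, which is the point of weighting votes by training error rather than taking a simple majority.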
https://www.readbyqxmd.com/read/27796839/real-time-medical-emergency-response-system-exploiting-iot-and-big-data-for-public-health
#7
M Mazhar Rathore, Awais Ahmad, Anand Paul, Jiafu Wan, Daqiang Zhang
Healthy people are important for any nation's development. Use of Internet of Things (IoT)-based body area networks (BANs) is increasing for continuous monitoring and medical healthcare in order to perform real-time actions in case of emergencies. However, when monitoring the health of all citizens or people in a country, the millions of sensors attached to human bodies generate a massive volume of heterogeneous data, called "Big Data." Processing Big Data and performing real-time actions in critical situations is a challenging task...
December 2016: Journal of Medical Systems
https://www.readbyqxmd.com/read/27663493/biospark-scalable-analysis-of-large-numerical-datasets-from-biological-simulations-and-experiments-using-hadoop-and-spark
#8
Max Klein, Rati Sharma, Chris H Bohrer, Cameron M Avelis, Elijah Roberts
Data-parallel programming techniques can dramatically decrease the time needed to analyze large datasets. While these methods have provided significant improvements for sequencing-based analyses, other areas of biological informatics have not yet adopted them. Here, we introduce Biospark, a new framework for performing data-parallel analysis on large numerical datasets. Biospark builds upon the open source Hadoop and Spark projects, bringing domain-specific features for biology. AVAILABILITY AND IMPLEMENTATION: Source code is licensed under the Apache 2...
September 22, 2016: Bioinformatics
https://www.readbyqxmd.com/read/27652177/towards-an-agent-based-traffic-regulation-and-recommendation-system-for-the-on-road-air-quality-control
#9
Abderrahmane Sadiq, Abdelaziz El Fazziki, Jamal Ouarzazi, Mohamed Sadgal
This paper presents an integrated and adaptive problem-solving approach to control the on-road air quality by modeling the road infrastructure, managing traffic based on pollution level and generating recommendations for road users. The aim is to reduce vehicle emissions in the most polluted road segments and optimize the pollution levels. For this we propose the use of historical and real-time pollution records and contextual data to calculate the air quality index on road networks and generate recommendations for reassigning traffic flow in order to improve the on-road air quality...
2016: SpringerPlus
https://www.readbyqxmd.com/read/27589753/estimation-accuracy-on-execution-time-of-run-time-tasks-in-a-heterogeneous-distributed-environment
#10
Qi Liu, Weidong Cai, Dandan Jin, Jian Shen, Zhangjie Fu, Xiaodong Liu, Nigel Linge
Distributed computing has developed tremendously since cloud computing was proposed in 2006, and has played a vital role in promoting the rapid growth of data collection and analysis models, e.g., Internet of Things, Cyber-Physical Systems, Big Data Analytics, etc. Hadoop has become a data convergence platform for sensor networks. As one of the core components, MapReduce facilitates allocating, processing and mining of collected large-scale data, where speculative execution strategies help solve straggler problems...
2016: Sensors
https://www.readbyqxmd.com/read/27429611/a-genetic-algorithm-based-job-scheduling-model-for-big-data-analytics
#11
Qinghua Lu, Shanshan Li, Weishan Zhang, Lei Zhang
Big data analytics (BDA) applications are a new category of software applications that process large amounts of data using scalable parallel processing infrastructure to obtain hidden value. Hadoop is the most mature open-source big data analytics framework, which implements the MapReduce programming model to process big data with MapReduce jobs. Big data analytics jobs are often continuous and not mutually separated. The existing work mainly focuses on executing jobs in sequence, which is often inefficient and consumes a lot of energy...
2016: EURASIP Journal on Wireless Communications and Networking
https://www.readbyqxmd.com/read/27375472/neuropigpen-a-scalable-toolkit-for-processing-electrophysiological-signal-data-in-neuroscience-applications-using-apache-pig
#12
Satya S Sahoo, Annan Wei, Joshua Valdez, Li Wang, Bilal Zonjy, Curtis Tatsuoka, Kenneth A Loparo, Samden D Lhatoo
Recent advances in neurological imaging and sensing technologies have led to a rapid increase in the volume, rate of generation, and variety of neuroscience data. This "neuroscience Big data" represents a significant opportunity for the biomedical research community to design experiments using data with greater timescale, a large number of attributes, and statistically significant data size. The results from these new data-driven research techniques can advance our understanding of complex neurological disorders, help model long-term effects of brain injuries, and provide new insights into the dynamics of brain networks...
2016: Frontiers in Neuroinformatics
https://www.readbyqxmd.com/read/27304987/big-data-a-parallel-particle-swarm-optimization-back-propagation-neural-network-algorithm-based-on-mapreduce
#13
Jianfang Cao, Hongyan Cui, Hao Shi, Lijuan Jiao
A back-propagation (BP) neural network can solve complicated random nonlinear mapping problems; therefore, it can be applied to a wide range of problems. However, as the sample size increases, the time required to train BP neural networks becomes lengthy. Moreover, the classification accuracy decreases as well. To improve the classification accuracy and runtime efficiency of the BP neural network algorithm, we proposed a parallel design and realization method for a particle swarm optimization (PSO)-optimized BP neural network based on MapReduce on the Hadoop platform using both the PSO algorithm and a parallel design...
2016: PloS One
https://www.readbyqxmd.com/read/27084948/mtdna-server-next-generation-sequencing-data-analysis-of-human-mitochondrial-dna-in-the-cloud
#14
Hansi Weissensteiner, Lukas Forer, Christian Fuchsberger, Bernd Schöpf, Anita Kloss-Brandstätter, Günther Specht, Florian Kronenberg, Sebastian Schönherr
Next generation sequencing (NGS) allows investigating mitochondrial DNA (mtDNA) characteristics such as heteroplasmy (i.e. intra-individual sequence variation) to a higher level of detail. While several pipelines for analyzing heteroplasmies exist, issues in usability, accuracy of results and interpreting final data limit their usage. Here we present mtDNA-Server, a scalable web server for the analysis of mtDNA studies of any size with a special focus on usability as well as reliable identification and quantification of heteroplasmic variants...
July 8, 2016: Nucleic Acids Research
https://www.readbyqxmd.com/read/26975600/analysis-of-microarray-leukemia-data-using-an-efficient-mapreduce-based-k-nearest-neighbor-classifier
#15
Mukesh Kumar, Nitish Kumar Rath, Santanu Kumar Rath
Microarray-based gene expression profiling has emerged as an efficient technique for classification, prognosis, diagnosis, and treatment of cancer. Frequent changes in the behavior of this disease generate an enormous volume of data. Microarray data satisfies both the veracity and velocity properties of big data, as it keeps changing with time. Therefore, the analysis of microarray datasets in a small amount of time is essential. They often contain a large amount of expression, but only a fraction of it comprises genes that are significantly expressed...
April 2016: Journal of Biomedical Informatics
https://www.readbyqxmd.com/read/26921234/hpg-pore-an-efficient-and-scalable-framework-for-nanopore-sequencing-data
#16
Joaquin Tarraga, Asunción Gallego, Vicente Arnau, Ignacio Medina, Joaquin Dopazo
BACKGROUND: The use of nanopore technologies is expected to spread in the future because they are portable and can sequence long fragments of DNA molecules without prior amplification. The first nanopore sequencer available, the MinION™ from Oxford Nanopore Technologies, is a USB-connected, portable device that allows real-time DNA analysis. In addition, other new instruments are expected to be released soon, which promise to outperform the current short-read technologies in terms of throughput...
February 27, 2016: BMC Bioinformatics
https://www.readbyqxmd.com/read/26897747/1001-ways-to-run-autodock-vina-for-virtual-screening
#17
Mohammad Mahdi Jaghoori, Boris Bleijlevens, Silvia D Olabarriaga
Large-scale computing technologies have enabled high-throughput virtual screening involving thousands to millions of drug candidates. It is not trivial, however, for biochemical scientists to evaluate the technical alternatives and their implications for running such large experiments. Besides experience with the molecular docking tool itself, the scientist needs to learn how to run it on high-performance computing (HPC) infrastructures, and understand the impact of the choices made. Here, we review such considerations for a specific tool, AutoDock Vina, and use experimental data to illustrate the following points: (1) an additional level of parallelization increases virtual screening throughput on a multi-core machine; (2) capturing of the random seed is not enough (though necessary) for reproducibility on heterogeneous distributed computing systems; (3) the overall time spent on the screening of a ligand library can be improved by analysis of factors affecting execution time per ligand, including number of active torsions, heavy atoms and exhaustiveness...
March 2016: Journal of Computer-aided Molecular Design
https://www.readbyqxmd.com/read/26887003/a-flexible-computational-framework-using-r-and-map-reduce-for-permutation-tests-of-massive-genetic-analysis-of-complex-traits
#18
Behrang Mahjani, Salman Toor, Carl Nettelblad, Sverker Holmgren
In quantitative trait locus (QTL) mapping, significance of putative QTL is often determined using permutation testing. The computational needs to calculate the significance level are immense: 10^4 up to 10^8 or even more permutations can be needed. We have previously introduced the PruneDIRECT algorithm for multiple QTL scan with epistatic interactions. This algorithm has specific strengths for permutation testing. Here, we present a flexible, parallel computing framework for identifying multiple interacting QTL using the PruneDIRECT algorithm which uses the map-reduce model as implemented in Hadoop...
February 11, 2016: IEEE/ACM Transactions on Computational Biology and Bioinformatics
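To make the idea in the abstract above concrete, here is a minimal sketch of a permutation test whose permutations are split across independent map tasks and combined by a reduce step. It is not the paper's PruneDIRECT framework: the test statistic (absolute difference of group means), the function names, and the task counts are all assumptions chosen for illustration, and the map tasks run sequentially here rather than on a Hadoop cluster.

```python
import random
from functools import reduce


def mean_difference(values, labels):
    """Difference between the mean of group 1 and the mean of group 0."""
    group1 = [v for v, g in zip(values, labels) if g == 1]
    group0 = [v for v, g in zip(values, labels) if g == 0]
    return sum(group1) / len(group1) - sum(group0) / len(group0)


def map_permutations(values, labels, observed, n_perm, seed):
    """Map task: shuffle the labels n_perm times and count how often the
    permuted statistic is at least as extreme as the observed one."""
    rng = random.Random(seed)  # a separate seed per task keeps tasks independent
    shuffled = list(labels)
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if abs(mean_difference(values, shuffled)) >= observed:
            exceed += 1
    return exceed, n_perm


def reduce_counts(a, b):
    """Reduce step: add up (exceed, total) pairs from the map tasks."""
    return a[0] + b[0], a[1] + b[1]


def permutation_p_value(values, labels, n_tasks=4, perms_per_task=250, seed=0):
    """Empirical p-value with the usual +1 correction."""
    observed = abs(mean_difference(values, labels))
    partials = [
        map_permutations(values, labels, observed, perms_per_task, seed + i)
        for i in range(n_tasks)
    ]
    exceed, total = reduce(reduce_counts, partials)
    return (exceed + 1) / (total + 1)


values = [1.0, 1.1, 0.9, 1.2, 5.0, 5.1, 4.9, 5.2]
labels = [0, 0, 0, 0, 1, 1, 1, 1]
p = permutation_p_value(values, labels)
```

Because each map task only returns two integers, the reduce step stays cheap no matter how many permutations are run, which is what makes this kind of workload a natural fit for the map-reduce model.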
https://www.readbyqxmd.com/read/26884678/htsfinder-powerful-pipeline-of-dna-signature-discovery-by-parallel-and-distributed-computing
#19
Ramin Karimi, Andras Hajdu
Comprehensive efforts toward low-cost sequencing in the past few years have led to the growth of complete genome databases. In parallel with this effort, and driven by strong need, fast and cost-effective methods and applications have been developed to accelerate sequence analysis. Identification is the very first step of this task. Due to the difficulties, high costs, and computational challenges of alignment-based approaches, an alternative universal identification method is highly required. As an alignment-free approach, DNA signatures have provided new opportunities for the rapid identification of species...
2016: Evolutionary Bioinformatics Online
https://www.readbyqxmd.com/read/26776220/workshop-on-topology-and-abstract-algebra-for-biomedicine
#20
Eric K Neumann, Svetlana Lockwood, Bala Krishnamoorthy, David Spivak
No abstract text is available yet for this article.
2016: Pacific Symposium on Biocomputing