Hadoop

https://www.readbyqxmd.com/read/28661707/using-hadoop-mapreduce-for-parallel-genetic-algorithms-a-comparison-of-the-global-grid-and-island-models
#1
Filomena Ferrucci, Pasquale Salza, Federica Sarro
The need to improve the scalability of Genetic Algorithms (GAs) has motivated research on Parallel Genetic Algorithms (PGAs), and different technologies and approaches have been used. Hadoop MapReduce represents one of the most mature technologies to develop parallel algorithms. Based on the fact that parallel algorithms introduce communication overhead, the aim of the present work is to understand if, and possibly when, parallel GA solutions using Hadoop MapReduce show better performance than sequential versions in terms of execution time...
June 29, 2017: Evolutionary Computation
https://www.readbyqxmd.com/read/28655296/sparkblast-scalable-blast-processing-using-in-memory-operations
#2
Marcelo Rodrigo de Castro, Catherine Dos Santos Tostes, Alberto M R Dávila, Hermes Senger, Fabricio A B da Silva
BACKGROUND: The demand for processing ever increasing amounts of genomic data has raised new challenges for the implementation of highly scalable and efficient computational systems. In this paper we propose SparkBLAST, a parallelization of a sequence alignment application (BLAST) that employs cloud computing for the provisioning of computational resources and Apache Spark as the coordination framework. As a proof of concept, some radionuclide-resistant bacterial genomes were selected for similarity analysis...
June 27, 2017: BMC Bioinformatics
https://www.readbyqxmd.com/read/28610458/large-scale-parallel-genome-assembler-over-cloud-computing-environment
#3
Arghya Kusum Das, Praveen Kumar Koppa, Sayan Goswami, Richard Platania, Seung-Jong Park
The size of high throughput DNA sequencing data has already reached the terabyte scale. To manage this huge volume of data, many downstream sequencing applications started using locality-based computing over different cloud infrastructures to take advantage of elastic (pay as you go) resources at a lower cost. However, the locality-based programming model (e.g. MapReduce) is relatively new. Consequently, developing scalable data-intensive bioinformatics applications using this model and understanding the hardware environment that these applications require for good performance, both require further research...
June 2017: Journal of Bioinformatics and Computational Biology
https://www.readbyqxmd.com/read/28475668/mardre-efficient-mapreduce-based-removal-of-duplicate-dna-reads-in-the-cloud
#4
Roberto R Expósito, Jorge Veiga, Jorge González-Domínguez, Juan Touriño
Summary: This paper presents MarDRe, a de novo cloud-ready duplicate and near-duplicate removal tool that can process single-end and paired-end reads from FASTQ/FASTA datasets. MarDRe takes advantage of the widely adopted MapReduce programming model to fully exploit Big Data technologies on cloud-based infrastructures. Written in Java to maximize cross-platform compatibility, MarDRe is built upon the open-source Apache Hadoop project, the most popular distributed computing framework for scalable Big Data processing...
May 5, 2017: Bioinformatics
https://www.readbyqxmd.com/read/28423824/querying-archetype-based-electronic-health-records-using-hadoop-and-dewey-encoding-of-openehr-models
#5
Erik Sundvall, Fang Wei-Kleiner, Sergio M Freire, Patrick Lambrix
Archetype-based Electronic Health Record (EHR) systems using generic reference models from e.g. openEHR, ISO 13606 or CIMI should be easy to update and reconfigure with new types (or versions) of data models or entries, ideally with very limited programming or manual database tweaking. Exploratory research (e.g. epidemiology) leading to ad-hoc querying on a population-wide scale can be a challenge in such environments. This publication describes the implementation and testing of an archetype-aware Dewey encoding optimization that can be used to produce such systems in environments supporting relational operations, e...
2017: Studies in Health Technology and Informatics
https://www.readbyqxmd.com/read/28358893/halvade-rna-parallel-variant-calling-from-transcriptomic-data-using-mapreduce
#6
Dries Decap, Joke Reumers, Charlotte Herzeel, Pascal Costanza, Jan Fostier
Given the current cost-effectiveness of next-generation sequencing, the amount of DNA-seq and RNA-seq data generated is ever increasing. One of the primary objectives of NGS experiments is calling genetic variants. While highly accurate, most variant calling pipelines are not optimized to run efficiently on large data sets. However, as variant calling in genomic data has become common practice, several methods have been proposed to reduce runtime for DNA-seq analysis through the use of parallel computing. Determining the effectively expressed variants from transcriptomics (RNA-seq) data has only recently become possible, and as such does not yet benefit from efficiently parallelized workflows...
2017: PloS One
https://www.readbyqxmd.com/read/28317049/programming-and-runtime-support-to-blaze-fpga-accelerator-deployment-at-datacenter-scale
#7
Muhuan Huang, Di Wu, Cody Hao Yu, Zhenman Fang, Matteo Interlandi, Tyson Condie, Jason Cong
With the end of CPU core scaling due to dark silicon limitations, customized accelerators on FPGAs have gained increased attention in modern datacenters due to their lower power, high performance and energy efficiency. Evidenced by Microsoft's FPGA deployment in its Bing search engine and Intel's $16.7 billion acquisition of Altera, integrating FPGAs into datacenters is considered one of the most promising approaches to sustain future datacenter growth. However, it is quite challenging for existing big data computing systems, like Apache Spark and Hadoop, to access the performance and energy benefits of FPGA accelerators...
October 2016: Proceedings of the ACM Symposium on Cloud Computing (SoCC)
https://www.readbyqxmd.com/read/28316653/large-scale-virtual-screening-on-public-cloud-resources-with-apache-spark
#8
Marco Capuccini, Laeeq Ahmed, Wesley Schaal, Erwin Laure, Ola Spjuth
BACKGROUND: Structure-based virtual screening is an in-silico method to screen a target receptor against a virtual molecular library. Applying docking-based screening to large molecular libraries can be computationally expensive; however, it constitutes a trivially parallelizable task. Most of the available parallel implementations are based on the Message Passing Interface (MPI), relying on low-failure-rate hardware and fast network connections. Google's MapReduce revolutionized large-scale analysis, enabling the processing of massive datasets on commodity hardware and cloud resources, providing transparent scalability and fault tolerance at the software level...
2017: Journal of Cheminformatics
https://www.readbyqxmd.com/read/28243601/mapreduce-algorithms-for-inferring-gene-regulatory-networks-from-time-series-microarray-data-using-an-information-theoretic-approach
#9
Yasser Abduallah, Turki Turki, Kevin Byron, Zongxuan Du, Miguel Cervantes-Cervantes, Jason T L Wang
Gene regulation is a series of processes that control gene expression and its extent. The connections among genes and their regulatory molecules, usually transcription factors, and a descriptive model of such connections are known as gene regulatory networks (GRNs). Elucidating GRNs is crucial to understand the inner workings of the cell and the complexity of gene interactions. To date, numerous algorithms have been developed to infer gene regulatory networks. However, as the number of identified genes increases and the complexity of their interactions is uncovered, networks and their regulatory mechanisms become cumbersome to test...
2017: BioMed Research International
https://www.readbyqxmd.com/read/28208684/a-real-time-high-performance-computation-architecture-for-multiple-moving-target-tracking-based-on-wide-area-motion-imagery-via-cloud-and-graphic-processing-units
#10
Kui Liu, Sixiao Wei, Zhijiang Chen, Bin Jia, Genshe Chen, Haibin Ling, Carolyn Sheaff, Erik Blasch
This paper presents the first attempt at combining Cloud with Graphic Processing Units (GPUs) in a complementary manner within the framework of a real-time high performance computation architecture for the application of detecting and tracking multiple moving targets based on Wide Area Motion Imagery (WAMI). More specifically, the GPU and Cloud Moving Target Tracking (GC-MTT) system applied a front-end web based server to perform the interaction with Hadoop and highly parallelized computation functions based on the Compute Unified Device Architecture (CUDA©)...
February 12, 2017: Sensors
https://www.readbyqxmd.com/read/28110735/optimizing-r-with-sparkr-on-a-commodity-cluster-for-biomedical-research
#11
Martin Sedlmayr, Tobias Würfl, Christian Maier, Lothar Häberle, Peter Fasching, Hans-Ulrich Prokosch, Jan Christoph
BACKGROUND AND OBJECTIVES: Medical researchers are challenged today by the enormous amount of data collected in healthcare. Analysis methods such as genome-wide association studies (GWAS) are often computationally intensive and thus require enormous resources to be performed in a reasonable amount of time. While dedicated clusters and public clouds may deliver the desired performance, their use requires upfront financial efforts or anonymous data, which is often not possible for preliminary or occasional tasks...
December 2016: Computer Methods and Programs in Biomedicine
https://www.readbyqxmd.com/read/28093410/fastdoop-a-versatile-and-efficient-library-for-the-input-of-fasta-and-fastq-files-for-mapreduce-hadoop-bioinformatics-applications
#12
Umberto Ferraro Petrillo, Gianluca Roscigno, Giuseppe Cattaneo, Raffaele Giancarlo
Summary: MapReduce Hadoop bioinformatics applications require the availability of special-purpose routines to manage the input of sequence files. Unfortunately, the Hadoop framework does not provide any built-in support for the most popular sequence file formats like FASTA or BAM. Moreover, the development of these routines is not easy, both because of the diversity of these formats and the need to efficiently manage sequence datasets that may count up to billions of characters. We present FASTdoop, a generic Hadoop library for the management of FASTA and FASTQ files...
May 15, 2017: Bioinformatics
https://www.readbyqxmd.com/read/28075343/a-fast-synthetic-aperture-radar-raw-data-simulation-using-cloud-computing
#13
Zhixin Li, Dandan Su, Haijiang Zhu, Wei Li, Fan Zhang, Ruirui Li
Synthetic Aperture Radar (SAR) raw data simulation is a fundamental problem in radar system design and imaging algorithm research. The growth of surveying swath and resolution results in a significant increase in data volume and simulation period, which can be considered to be a comprehensive data intensive and computing intensive issue. Although several high performance computing (HPC) methods have demonstrated their potential for accelerating simulation, the input/output (I/O) bottleneck of huge raw data has not been eased...
January 8, 2017: Sensors
https://www.readbyqxmd.com/read/28072850/chaos-based-simultaneous-compression-and-encryption-for-hadoop
#14
Muhammad Usama, Nordin Zakaria
Data compression and encryption are key components of commonly deployed platforms such as Hadoop. Numerous data compression and encryption tools are presently available on such platforms and the tools are characteristically applied in sequence, i.e., compression followed by encryption or encryption followed by compression. This paper focuses on the open-source Hadoop framework and proposes a data storage method that efficiently couples data compression with encryption. A simultaneous compression and encryption scheme is introduced that addresses an important implementation issue of source coding based on Tent Map and Piece-wise Linear Chaotic Map (PWLM), which is the infinite precision of real numbers that result from their long products...
2017: PloS One
https://www.readbyqxmd.com/read/28025200/falco-a-quick-and-flexible-single-cell-rna-seq-processing-framework-on-the-cloud
#15
Andrian Yang, Michael Troup, Peijie Lin, Joshua W K Ho
Summary: Single-cell RNA-seq (scRNA-seq) is increasingly used in a range of biomedical studies. Nonetheless, current RNA-seq analysis tools are not specifically designed to efficiently process scRNA-seq data due to their limited scalability. Here we introduce Falco, a cloud-based framework to enable parallelization of existing RNA-seq processing pipelines using the big data technologies Apache Hadoop and Apache Spark for performing massively parallel analysis of large-scale transcriptomic data...
March 1, 2017: Bioinformatics
https://www.readbyqxmd.com/read/27993788/mruninovo-an-efficient-tool-for-de-novo-peptide-sequencing-utilizing-the-hadoop-distributed-computing-framework
#16
Chuang Li, Tao Chen, Qiang He, Yunping Zhu, Kenli Li
Summary: Tandem mass spectrometry-based de novo peptide sequencing is a complex and time-consuming process. The current algorithms for de novo peptide sequencing cannot rapidly and thoroughly process large mass spectrometry datasets. In this paper, we propose MRUniNovo, a novel tool for parallel de novo peptide sequencing. MRUniNovo parallelizes UniNovo based on the Hadoop compute platform. Our experimental results demonstrate that MRUniNovo significantly reduces the computation time of de novo peptide sequencing without sacrificing the correctness and accuracy of the results, and thus can process very large datasets that UniNovo cannot...
March 15, 2017: Bioinformatics
https://www.readbyqxmd.com/read/27905520/a-parallel-adaboost-backpropagation-neural-network-for-massive-image-dataset-classification
#17
Jianfang Cao, Lichao Chen, Min Wang, Hao Shi, Yun Tian
Image classification uses computers to simulate human understanding and cognition of images by automatically categorizing images. This study proposes a faster image classification approach that parallelizes the traditional Adaboost-Backpropagation (BP) neural network using the MapReduce parallel programming model. First, we construct a strong classifier by assembling the outputs of 15 BP neural networks (which are individually regarded as weak classifiers) based on the Adaboost algorithm. Second, we design Map and Reduce tasks for both the parallel Adaboost-BP neural network and the feature extraction algorithm...
December 1, 2016: Scientific Reports
https://www.readbyqxmd.com/read/27796839/real-time-medical-emergency-response-system-exploiting-iot-and-big-data-for-public-health
#18
M Mazhar Rathore, Awais Ahmad, Anand Paul, Jiafu Wan, Daqiang Zhang
Healthy people are important for any nation's development. Use of Internet of Things (IoT)-based body area networks (BANs) is increasing for continuous monitoring and medical healthcare in order to perform real-time actions in case of emergencies. However, in the case of monitoring the health of all citizens or people in a country, the millions of sensors attached to human bodies generate a massive volume of heterogeneous data, called "Big Data." Processing Big Data and performing real-time actions in critical situations is a challenging task...
December 2016: Journal of Medical Systems
https://www.readbyqxmd.com/read/27663493/biospark-scalable-analysis-of-large-numerical-datasets-from-biological-simulations-and-experiments-using-hadoop-and-spark
#19
Max Klein, Rati Sharma, Chris H Bohrer, Cameron M Avelis, Elijah Roberts
Data-parallel programming techniques can dramatically decrease the time needed to analyze large datasets. While these methods have provided significant improvements for sequencing-based analyses, other areas of biological informatics have not yet adopted them. Here, we introduce Biospark, a new framework for performing data-parallel analysis on large numerical datasets. Biospark builds upon the open source Hadoop and Spark projects, bringing domain-specific features for biology. AVAILABILITY AND IMPLEMENTATION: Source code is licensed under the Apache 2...
September 22, 2016: Bioinformatics
https://www.readbyqxmd.com/read/27652177/towards-an-agent-based-traffic-regulation-and-recommendation-system-for-the-on-road-air-quality-control
#20
Abderrahmane Sadiq, Abdelaziz El Fazziki, Jamal Ouarzazi, Mohamed Sadgal
This paper presents an integrated and adaptive problem-solving approach to control the on-road air quality by modeling the road infrastructure, managing traffic based on pollution level and generating recommendations for road users. The aim is to reduce vehicle emissions in the most polluted road segments and optimize the pollution levels. For this we propose the use of historical and real-time pollution records and contextual data to calculate the air quality index on road networks and generate recommendations for reassigning traffic flow in order to improve the on-road air quality...
2016: SpringerPlus