Read by QxMD

Hadoop

https://www.readbyqxmd.com/read/28423824/querying-archetype-based-electronic-health-records-using-hadoop-and-dewey-encoding-of-openehr-models
#1
Erik Sundvall, Fang Wei-Kleiner, Sergio M Freire, Patrick Lambrix
Archetype-based Electronic Health Record (EHR) systems using generic reference models from e.g. openEHR, ISO 13606 or CIMI should be easy to update and reconfigure with new types (or versions) of data models or entries, ideally with very limited programming or manual database tweaking. Exploratory research (e.g. epidemiology) leading to ad-hoc querying on a population-wide scale can be a challenge in such environments. This publication describes implementation and test of an archetype-aware Dewey encoding optimization that can be used to produce such systems in environments supporting relational operations, e...
2017: Studies in Health Technology and Informatics
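Dewey encoding is a general tree-labeling scheme. Each node's label is its parent's label extended by the node's position among its siblings, so the test "is A an ancestor of D?" reduces to a prefix check that a relational engine can run as an ordinary key comparison. The minimal Python sketch below shows only that generic scheme, not the archetype-aware optimization the paper describes. The node names in `TREE` and all function names are hypothetical stand-ins for openEHR archetype paths.

```python
from typing import Dict, List, Tuple

Label = Tuple[int, ...]

# Hypothetical archetype-like tree: parent -> ordered list of children.
TREE: Dict[str, List[str]] = {
    "composition": ["context", "content"],
    "content": ["observation"],
    "observation": ["systolic", "diastolic"],
}


def dewey_encode(tree: Dict[str, List[str]], root: str) -> Dict[str, Label]:
    """Label each node with its parent's label plus its 1-based sibling position."""
    labels: Dict[str, Label] = {root: (1,)}
    stack = [root]
    while stack:
        node = stack.pop()
        for position, child in enumerate(tree.get(node, []), start=1):
            labels[child] = labels[node] + (position,)
            stack.append(child)
    return labels


def is_ancestor(a: Label, d: Label) -> bool:
    """a is a proper ancestor of d exactly when a is a strict prefix of d."""
    return len(a) < len(d) and d[: len(a)] == a


def as_text(label: Label) -> str:
    """Render a label in the usual dotted form, e.g. (1, 2, 1) -> '1.2.1'."""
    return ".".join(str(part) for part in label)


labels = dewey_encode(TREE, "composition")
for name, label in sorted(labels.items(), key=lambda kv: kv[1]):
    print(f"{as_text(label):<10} {name}")
```

Because labels sort in document order and ancestry is a prefix test, a query such as "all systolic values under any observation" can be answered with range or prefix comparisons on stored labels instead of recursive tree traversal.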
https://www.readbyqxmd.com/read/28358893/halvade-rna-parallel-variant-calling-from-transcriptomic-data-using-mapreduce
#2
Dries Decap, Joke Reumers, Charlotte Herzeel, Pascal Costanza, Jan Fostier
Given the current cost-effectiveness of next-generation sequencing, the amount of DNA-seq and RNA-seq data generated is ever increasing. One of the primary objectives of NGS experiments is calling genetic variants. While highly accurate, most variant calling pipelines are not optimized to run efficiently on large data sets. However, as variant calling in genomic data has become common practice, several methods have been proposed to reduce runtime for DNA-seq analysis through the use of parallel computing. Determining the effectively expressed variants from transcriptomics (RNA-seq) data has only recently become possible, and as such does not yet benefit from efficiently parallelized workflows...
2017: PloS One
https://www.readbyqxmd.com/read/28317049/programming-and-runtime-support-to-blaze-fpga-accelerator-deployment-at-datacenter-scale
#3
Muhuan Huang, Di Wu, Cody Hao Yu, Zhenman Fang, Matteo Interlandi, Tyson Condie, Jason Cong
With the end of CPU core scaling due to dark silicon limitations, customized accelerators on FPGAs have gained increased attention in modern datacenters due to their lower power, high performance and energy efficiency. Evidenced by Microsoft's FPGA deployment in its Bing search engine and Intel's $16.7 billion acquisition of Altera, integrating FPGAs into datacenters is considered one of the most promising approaches to sustain future datacenter growth. However, it is quite challenging for existing big data computing systems, like Apache Spark and Hadoop, to access the performance and energy benefits of FPGA accelerators...
October 2016: Proceedings of the ... ACM Symposium on Cloud Computing [electronic Resource]: SOCC ... ... SoCC (Conference)
https://www.readbyqxmd.com/read/28316653/large-scale-virtual-screening-on-public-cloud-resources-with-apache-spark
#4
Marco Capuccini, Laeeq Ahmed, Wesley Schaal, Erwin Laure, Ola Spjuth
BACKGROUND: Structure-based virtual screening is an in-silico method to screen a target receptor against a virtual molecular library. Applying docking-based screening to large molecular libraries can be computationally expensive; however, it constitutes a trivially parallelizable task. Most of the available parallel implementations are based on message passing interface, relying on low-failure-rate hardware and fast network connections. Google's MapReduce revolutionized large-scale analysis, enabling the processing of massive datasets on commodity hardware and cloud resources, providing transparent scalability and fault tolerance at the software level...
2017: Journal of Cheminformatics
https://www.readbyqxmd.com/read/28243601/mapreduce-algorithms-for-inferring-gene-regulatory-networks-from-time-series-microarray-data-using-an-information-theoretic-approach
#5
Yasser Abduallah, Turki Turki, Kevin Byron, Zongxuan Du, Miguel Cervantes-Cervantes, Jason T L Wang
Gene regulation is a series of processes that control gene expression and its extent. The connections among genes and their regulatory molecules, usually transcription factors, and a descriptive model of such connections are known as gene regulatory networks (GRNs). Elucidating GRNs is crucial to understand the inner workings of the cell and the complexity of gene interactions. To date, numerous algorithms have been developed to infer gene regulatory networks. However, as the number of identified genes increases and the complexity of their interactions is uncovered, networks and their regulatory mechanisms become cumbersome to test...
2017: BioMed Research International
https://www.readbyqxmd.com/read/28208684/a-real-time-high-performance-computation-architecture-for-multiple-moving-target-tracking-based-on-wide-area-motion-imagery-via-cloud-and-graphic-processing-units
#6
Kui Liu, Sixiao Wei, Zhijiang Chen, Bin Jia, Genshe Chen, Haibin Ling, Carolyn Sheaff, Erik Blasch
This paper presents the first attempt at combining Cloud with Graphic Processing Units (GPUs) in a complementary manner within the framework of a real-time high performance computation architecture for the application of detecting and tracking multiple moving targets based on Wide Area Motion Imagery (WAMI). More specifically, the GPU and Cloud Moving Target Tracking (GC-MTT) system applied a front-end web-based server to perform the interaction with Hadoop and highly parallelized computation functions based on the Compute Unified Device Architecture (CUDA©)...
February 12, 2017: Sensors
https://www.readbyqxmd.com/read/28110735/optimizing-r-with-sparkr-on-a-commodity-cluster-for-biomedical-research
#7
Martin Sedlmayr, Tobias Würfl, Christian Maier, Lothar Häberle, Peter Fasching, Hans-Ulrich Prokosch, Jan Christoph
BACKGROUND AND OBJECTIVES: Medical researchers are challenged today by the enormous amount of data collected in healthcare. Analysis methods such as genome-wide association studies (GWAS) are often computationally intensive and thus require enormous resources to be performed in a reasonable amount of time. While dedicated clusters and public clouds may deliver the desired performance, their use requires upfront financial investment or anonymized data, which is often not possible for preliminary or occasional tasks...
December 2016: Computer Methods and Programs in Biomedicine
https://www.readbyqxmd.com/read/28093410/fastdoop-a-versatile-and-efficient-library-for-the-input-of-fasta-and-fastq-files-for-mapreduce-hadoop-bioinformatics-applications
#8
Umberto Ferraro Petrillo, Gianluca Roscigno, Giuseppe Cattaneo, Raffaele Giancarlo
MapReduce Hadoop bioinformatics applications require the availability of special-purpose routines to manage the input of sequence files. Unfortunately, the Hadoop framework does not provide any built-in support for the most popular sequence file formats like FASTA or BAM. Moreover, the development of these routines is not easy, both because of the diversity of these formats and the need to efficiently manage sequence datasets that may count up to billions of characters. We present FASTdoop, a generic Hadoop library for the management of FASTA and FASTQ files...
January 16, 2017: Bioinformatics
https://www.readbyqxmd.com/read/28075343/a-fast-synthetic-aperture-radar-raw-data-simulation-using-cloud-computing
#9
Zhixin Li, Dandan Su, Haijiang Zhu, Wei Li, Fan Zhang, Ruirui Li
Synthetic Aperture Radar (SAR) raw data simulation is a fundamental problem in radar system design and imaging algorithm research. The growth of surveying swath and resolution results in a significant increase in data volume and simulation period, which makes simulation both a data-intensive and a compute-intensive problem. Although several high performance computing (HPC) methods have demonstrated their potential for accelerating simulation, the input/output (I/O) bottleneck of huge raw data has not been eased...
January 8, 2017: Sensors
https://www.readbyqxmd.com/read/28072850/chaos-based-simultaneous-compression-and-encryption-for-hadoop
#10
Muhammad Usama, Nordin Zakaria
Data compression and encryption are key components of commonly deployed platforms such as Hadoop. Numerous data compression and encryption tools are presently available on such platforms, and the tools are characteristically applied in sequence, i.e., compression followed by encryption or encryption followed by compression. This paper focuses on the open-source Hadoop framework and proposes a data storage method that efficiently couples data compression with encryption. A simultaneous compression and encryption scheme is introduced that addresses an important implementation issue of source coding based on the Tent Map and the Piecewise Linear Chaotic Map (PWLM): the infinite precision of the real numbers that result from their long products...
2017: PloS One
https://www.readbyqxmd.com/read/28025200/falco-a-quick-and-flexible-single-cell-rna-seq-processing-framework-on-the-cloud
#11
Andrian Yang, Michael Troup, Peijie Lin, Joshua W K Ho
Summary: Single-cell RNA-seq (scRNA-seq) is increasingly used in a range of biomedical studies. Nonetheless, current RNA-seq analysis tools are not specifically designed to efficiently process scRNA-seq data due to their limited scalability. Here we introduce Falco, a cloud-based framework to enable parallelization of existing RNA-seq processing pipelines using big data technologies of Apache Hadoop and Apache Spark for performing massively parallel analysis of large scale transcriptomic data...
March 1, 2017: Bioinformatics
https://www.readbyqxmd.com/read/27993788/mruninovo-an-efficient-tool-for-de-novo-peptide-sequencing-utilizing-the-hadoop-distributed-computing-framework
#12
Chuang Li, Tao Chen, Qiang He, Yunping Zhu, Kenli Li
Summary: Tandem mass spectrometry-based de novo peptide sequencing is a complex and time-consuming process. The current algorithms for de novo peptide sequencing cannot rapidly and thoroughly process large mass spectrometry datasets. In this paper, we propose MRUniNovo, a novel tool for parallel de novo peptide sequencing. MRUniNovo parallelizes UniNovo based on the Hadoop compute platform. Our experimental results demonstrate that MRUniNovo significantly reduces the computation time of de novo peptide sequencing without sacrificing the correctness and accuracy of the results, and thus can process very large datasets that UniNovo cannot...
March 15, 2017: Bioinformatics
https://www.readbyqxmd.com/read/27905520/a-parallel-adaboost-backpropagation-neural-network-for-massive-image-dataset-classification
#13
Jianfang Cao, Lichao Chen, Min Wang, Hao Shi, Yun Tian
Image classification uses computers to simulate human understanding and cognition of images by automatically categorizing images. This study proposes a faster image classification approach that parallelizes the traditional Adaboost-Backpropagation (BP) neural network using the MapReduce parallel programming model. First, we construct a strong classifier by assembling the outputs of 15 BP neural networks (which are individually regarded as weak classifiers) based on the Adaboost algorithm. Second, we design Map and Reduce tasks for both the parallel Adaboost-BP neural network and the feature extraction algorithm...
December 1, 2016: Scientific Reports
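The abstract above builds a strong classifier by combining 15 BP neural networks, treated as weak classifiers, through Adaboost. As a minimal sketch of the standard discrete AdaBoost combination step only, the Python code below swaps the BP networks for one-dimensional threshold stumps (an assumption made to keep the example short and self-contained) and leaves out the MapReduce parallelization entirely. All function names are illustrative, not taken from the paper.

```python
import math
from typing import List, Tuple

Stump = Tuple[float, int]  # (threshold, polarity): stand-in for one BP network


def stump_predict(stump: Stump, x: float) -> int:
    threshold, polarity = stump
    return polarity if x >= threshold else -polarity


def fit_stump(xs: List[float], ys: List[int], weights: List[float]) -> Tuple[Stump, float]:
    """Pick the stump with the lowest weighted error under the current sample weights."""
    best_stump, best_err = (xs[0], 1), float("inf")
    for threshold in sorted(set(xs)):
        for polarity in (1, -1):
            stump = (threshold, polarity)
            err = sum(w for x, y, w in zip(xs, ys, weights) if stump_predict(stump, x) != y)
            if err < best_err:
                best_stump, best_err = stump, err
    return best_stump, best_err


def adaboost(xs: List[float], ys: List[int], rounds: int) -> List[Tuple[float, Stump]]:
    weights = [1.0 / len(xs)] * len(xs)
    ensemble: List[Tuple[float, Stump]] = []
    for _ in range(rounds):
        stump, err = fit_stump(xs, ys, weights)
        err = min(max(err, 1e-10), 1 - 1e-10)  # guard against log(0) and division by zero
        alpha = 0.5 * math.log((1 - err) / err)  # vote weight of this weak classifier
        ensemble.append((alpha, stump))
        # Up-weight misclassified samples, down-weight correct ones, then renormalize.
        weights = [w * math.exp(-alpha * y * stump_predict(stump, x))
                   for x, y, w in zip(xs, ys, weights)]
        total = sum(weights)
        weights = [w / total for w in weights]
    return ensemble


def strong_predict(ensemble: List[Tuple[float, Stump]], x: float) -> int:
    """Weighted vote of all weak classifiers."""
    score = sum(alpha * stump_predict(stump, x) for alpha, stump in ensemble)
    return 1 if score >= 0 else -1


xs = [1.0, 2.0, 3.0]
ys = [1, -1, 1]  # no single threshold stump separates these labels
model = adaboost(xs, ys, rounds=3)
print([strong_predict(model, x) for x in xs])  # prints [1, -1, 1]
```

No single stump classifies all three points correctly, but the weighted vote of three stumps does, which is the effect the boosting step relies on when it assembles weak classifiers into a strong one.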
https://www.readbyqxmd.com/read/27796839/real-time-medical-emergency-response-system-exploiting-iot-and-big-data-for-public-health
#14
M Mazhar Rathore, Awais Ahmad, Anand Paul, Jiafu Wan, Daqiang Zhang
Healthy people are important for any nation's development. Use of the Internet of Things (IoT)-based body area networks (BANs) is increasing for continuous monitoring and medical healthcare in order to perform real-time actions in case of emergencies. However, when monitoring the health of all citizens or people in a country, the millions of sensors attached to human bodies generate a massive volume of heterogeneous data, called "Big Data." Processing Big Data and performing real-time actions in critical situations is a challenging task...
December 2016: Journal of Medical Systems
https://www.readbyqxmd.com/read/27663493/biospark-scalable-analysis-of-large-numerical-datasets-from-biological-simulations-and-experiments-using-hadoop-and-spark
#15
Max Klein, Rati Sharma, Chris H Bohrer, Cameron M Avelis, Elijah Roberts
Data-parallel programming techniques can dramatically decrease the time needed to analyze large datasets. While these methods have provided significant improvements for sequencing-based analyses, other areas of biological informatics have not yet adopted them. Here, we introduce Biospark, a new framework for performing data-parallel analysis on large numerical datasets. Biospark builds upon the open source Hadoop and Spark projects, bringing domain-specific features for biology. AVAILABILITY AND IMPLEMENTATION: Source code is licensed under the Apache 2...
September 22, 2016: Bioinformatics
https://www.readbyqxmd.com/read/27652177/towards-an-agent-based-traffic-regulation-and-recommendation-system-for-the-on-road-air-quality-control
#16
Abderrahmane Sadiq, Abdelaziz El Fazziki, Jamal Ouarzazi, Mohamed Sadgal
This paper presents an integrated and adaptive problem-solving approach to control the on-road air quality by modeling the road infrastructure, managing traffic based on pollution level and generating recommendations for road users. The aim is to reduce vehicle emissions in the most polluted road segments and to optimize pollution levels. To this end, we propose the use of historical and real-time pollution records and contextual data to calculate the air quality index on road networks and generate recommendations for reassigning traffic flow in order to improve the on-road air quality...
2016: SpringerPlus
https://www.readbyqxmd.com/read/27589753/estimation-accuracy-on-execution-time-of-run-time-tasks-in-a-heterogeneous-distributed-environment
#17
Qi Liu, Weidong Cai, Dandan Jin, Jian Shen, Zhangjie Fu, Xiaodong Liu, Nigel Linge
Distributed computing has developed tremendously since cloud computing was proposed in 2006 and has played a vital role in promoting the rapid growth of data collection and analysis models, e.g., Internet of Things, Cyber-Physical Systems, Big Data Analytics, etc. Hadoop has become a data convergence platform for sensor networks. As one of its core components, MapReduce facilitates allocating, processing and mining of collected large-scale data, where speculative execution strategies help solve straggler problems...
August 30, 2016: Sensors
https://www.readbyqxmd.com/read/27429611/a-genetic-algorithm-based-job-scheduling-model-for-big-data-analytics
#18
Qinghua Lu, Shanshan Li, Weishan Zhang, Lei Zhang
Big data analytics (BDA) applications are a new category of software applications that process large amounts of data using scalable parallel processing infrastructure to obtain hidden value. Hadoop is the most mature open-source big data analytics framework, which implements the MapReduce programming model to process big data with MapReduce jobs. Big data analytics jobs are often continuous and not mutually separated. Existing work mainly focuses on executing jobs in sequence, which is often inefficient and energy-intensive...
2016: EURASIP Journal on Wireless Communications and Networking
https://www.readbyqxmd.com/read/27375472/neuropigpen-a-scalable-toolkit-for-processing-electrophysiological-signal-data-in-neuroscience-applications-using-apache-pig
#19
Satya S Sahoo, Annan Wei, Joshua Valdez, Li Wang, Bilal Zonjy, Curtis Tatsuoka, Kenneth A Loparo, Samden D Lhatoo
The recent advances in neurological imaging and sensing technologies have led to a rapid increase in the volume, rate of data generation, and variety of neuroscience data. This "neuroscience Big data" represents a significant opportunity for the biomedical research community to design experiments using data with longer timescales, larger numbers of attributes, and statistically significant data sizes. The results from these new data-driven research techniques can advance our understanding of complex neurological disorders, help model long-term effects of brain injuries, and provide new insights into dynamics of brain networks...
2016: Frontiers in Neuroinformatics
https://www.readbyqxmd.com/read/27304987/big-data-a-parallel-particle-swarm-optimization-back-propagation-neural-network-algorithm-based-on-mapreduce
#20
Jianfang Cao, Hongyan Cui, Hao Shi, Lijuan Jiao
A back-propagation (BP) neural network can solve complicated random nonlinear mapping problems; therefore, it can be applied to a wide range of problems. However, as the sample size increases, the time required to train BP neural networks becomes lengthy. Moreover, the classification accuracy decreases as well. To improve the classification accuracy and runtime efficiency of the BP neural network algorithm, we proposed a parallel design and realization method for a particle swarm optimization (PSO)-optimized BP neural network based on MapReduce on the Hadoop platform using both the PSO algorithm and a parallel design...
2016: PloS One