Hadoop

https://www.readbyqxmd.com/read/27905520/a-parallel-adaboost-backpropagation-neural-network-for-massive-image-dataset-classification
#1
Jianfang Cao, Lichao Chen, Min Wang, Hao Shi, Yun Tian
Image classification uses computers to simulate human understanding and cognition of images by automatically categorizing images. This study proposes a faster image classification approach that parallelizes the traditional Adaboost-Backpropagation (BP) neural network using the MapReduce parallel programming model. First, we construct a strong classifier by assembling the outputs of 15 BP neural networks (which are individually regarded as weak classifiers) based on the Adaboost algorithm. Second, we design Map and Reduce tasks for both the parallel Adaboost-BP neural network and the feature extraction algorithm...
December 1, 2016: Scientific Reports
https://www.readbyqxmd.com/read/27796839/real-time-medical-emergency-response-system-exploiting-iot-and-big-data-for-public-health
#2
M Mazhar Rathore, Awais Ahmad, Anand Paul, Jiafu Wan, Daqiang Zhang
Healthy people are important for any nation's development. Use of the Internet of Things (IoT)-based body area networks (BANs) is increasing for continuous monitoring and medical healthcare in order to perform real-time actions in case of emergencies. However, in the case of monitoring the health of all citizens or people in a country, the millions of sensors attached to human bodies generate a massive volume of heterogeneous data, called "Big Data." Processing Big Data and performing real-time actions in critical situations is a challenging task...
December 2016: Journal of Medical Systems
https://www.readbyqxmd.com/read/27663493/biospark-scalable-analysis-of-large-numerical-datasets-from-biological-simulations-and-experiments-using-hadoop-and-spark
#3
Max Klein, Rati Sharma, Chris H Bohrer, Cameron M Avelis, Elijah Roberts
Data-parallel programming techniques can dramatically decrease the time needed to analyze large datasets. While these methods have provided significant improvements for sequencing-based analyses, other areas of biological informatics have not yet adopted them. Here, we introduce Biospark, a new framework for performing data-parallel analysis on large numerical datasets. Biospark builds upon the open source Hadoop and Spark projects, bringing domain-specific features for biology. AVAILABILITY AND IMPLEMENTATION: Source code is licensed under the Apache 2...
September 22, 2016: Bioinformatics
https://www.readbyqxmd.com/read/27652177/towards-an-agent-based-traffic-regulation-and-recommendation-system-for-the-on-road-air-quality-control
#4
Abderrahmane Sadiq, Abdelaziz El Fazziki, Jamal Ouarzazi, Mohamed Sadgal
This paper presents an integrated and adaptive problem-solving approach to control the on-road air quality by modeling the road infrastructure, managing traffic based on pollution level and generating recommendations for road users. The aim is to reduce vehicle emissions in the most polluted road segments and optimize the pollution levels. For this, we propose the use of historical and real-time pollution records and contextual data to calculate the air quality index on road networks and generate recommendations for reassigning traffic flow in order to improve the on-road air quality...
2016: SpringerPlus
https://www.readbyqxmd.com/read/27589753/estimation-accuracy-on-execution-time-of-run-time-tasks-in-a-heterogeneous-distributed-environment
#5
Qi Liu, Weidong Cai, Dandan Jin, Jian Shen, Zhangjie Fu, Xiaodong Liu, Nigel Linge
Distributed computing has developed tremendously since cloud computing was proposed in 2006, and has played a vital role in promoting the rapid growth of data collection and analysis models, e.g., Internet of Things, Cyber-Physical Systems, Big Data Analytics, etc. Hadoop has become a data convergence platform for sensor networks. As one of the core components, MapReduce facilitates allocating, processing and mining of collected large-scale data, where speculative execution strategies help solve straggler problems...
2016: Sensors
https://www.readbyqxmd.com/read/27429611/a-genetic-algorithm-based-job-scheduling-model-for-big-data-analytics
#6
Qinghua Lu, Shanshan Li, Weishan Zhang, Lei Zhang
Big data analytics (BDA) applications are a new category of software applications that process large amounts of data using scalable parallel processing infrastructure to obtain hidden value. Hadoop is the most mature open-source big data analytics framework, which implements the MapReduce programming model to process big data with MapReduce jobs. Big data analytics jobs are often continuous and not mutually separated. The existing work mainly focuses on executing jobs in sequence, which are often inefficient and consume high energy...
2016: EURASIP Journal on Wireless Communications and Networking
https://www.readbyqxmd.com/read/27375472/neuropigpen-a-scalable-toolkit-for-processing-electrophysiological-signal-data-in-neuroscience-applications-using-apache-pig
#7
Satya S Sahoo, Annan Wei, Joshua Valdez, Li Wang, Bilal Zonjy, Curtis Tatsuoka, Kenneth A Loparo, Samden D Lhatoo
Recent advances in neurological imaging and sensing technologies have led to a rapid increase in the volume, rate of data generation, and variety of neuroscience data. This "neuroscience Big data" represents a significant opportunity for the biomedical research community to design experiments using data with greater timescale, a large number of attributes, and statistically significant data size. The results from these new data-driven research techniques can advance our understanding of complex neurological disorders, help model long-term effects of brain injuries, and provide new insights into dynamics of brain networks...
2016: Frontiers in Neuroinformatics
https://www.readbyqxmd.com/read/27304987/big-data-a-parallel-particle-swarm-optimization-back-propagation-neural-network-algorithm-based-on-mapreduce
#8
Jianfang Cao, Hongyan Cui, Hao Shi, Lijuan Jiao
A back-propagation (BP) neural network can solve complicated random nonlinear mapping problems; therefore, it can be applied to a wide range of problems. However, as the sample size increases, the time required to train BP neural networks becomes lengthy. Moreover, the classification accuracy decreases as well. To improve the classification accuracy and runtime efficiency of the BP neural network algorithm, we proposed a parallel design and realization method for a particle swarm optimization (PSO)-optimized BP neural network based on MapReduce on the Hadoop platform using both the PSO algorithm and a parallel design...
2016: PloS One
https://www.readbyqxmd.com/read/27084948/mtdna-server-next-generation-sequencing-data-analysis-of-human-mitochondrial-dna-in-the-cloud
#9
Hansi Weissensteiner, Lukas Forer, Christian Fuchsberger, Bernd Schöpf, Anita Kloss-Brandstätter, Günther Specht, Florian Kronenberg, Sebastian Schönherr
Next generation sequencing (NGS) allows investigating mitochondrial DNA (mtDNA) characteristics such as heteroplasmy (i.e. intra-individual sequence variation) to a higher level of detail. While several pipelines for analyzing heteroplasmies exist, issues in usability, accuracy of results and interpreting final data limit their usage. Here we present mtDNA-Server, a scalable web server for the analysis of mtDNA studies of any size with a special focus on usability as well as reliable identification and quantification of heteroplasmic variants...
July 8, 2016: Nucleic Acids Research
https://www.readbyqxmd.com/read/26975600/analysis-of-microarray-leukemia-data-using-an-efficient-mapreduce-based-k-nearest-neighbor-classifier
#10
Mukesh Kumar, Nitish Kumar Rath, Santanu Kumar Rath
Microarray-based gene expression profiling has emerged as an efficient technique for classification, prognosis, diagnosis, and treatment of cancer. Frequent changes in the behavior of this disease generate an enormous volume of data. Microarray data satisfies both the veracity and velocity properties of big data, as it keeps changing with time. Therefore, the analysis of microarray datasets in a small amount of time is essential. They often contain a large amount of expression, but only a fraction of it comprises genes that are significantly expressed...
April 2016: Journal of Biomedical Informatics
https://www.readbyqxmd.com/read/26921234/hpg-pore-an-efficient-and-scalable-framework-for-nanopore-sequencing-data
#11
Joaquin Tarraga, Asunción Gallego, Vicente Arnau, Ignacio Medina, Joaquin Dopazo
BACKGROUND: The use of nanopore technologies is expected to spread in the future because they are portable and can sequence long fragments of DNA molecules without prior amplification. The first nanopore sequencer available, the MinION™ from Oxford Nanopore Technologies, is a USB-connected, portable device that allows real-time DNA analysis. In addition, other new instruments are expected to be released soon, which promise to outperform the current short-read technologies in terms of throughput...
2016: BMC Bioinformatics
https://www.readbyqxmd.com/read/26897747/1001-ways-to-run-autodock-vina-for-virtual-screening
#12
Mohammad Mahdi Jaghoori, Boris Bleijlevens, Silvia D Olabarriaga
Large-scale computing technologies have enabled high-throughput virtual screening involving thousands to millions of drug candidates. It is not trivial, however, for biochemical scientists to evaluate the technical alternatives and their implications for running such large experiments. Besides experience with the molecular docking tool itself, the scientist needs to learn how to run it on high-performance computing (HPC) infrastructures, and understand the impact of the choices made. Here, we review such considerations for a specific tool, AutoDock Vina, and use experimental data to illustrate the following points: (1) an additional level of parallelization increases virtual screening throughput on a multi-core machine; (2) capturing of the random seed is not enough (though necessary) for reproducibility on heterogeneous distributed computing systems; (3) the overall time spent on the screening of a ligand library can be improved by analysis of factors affecting execution time per ligand, including number of active torsions, heavy atoms and exhaustiveness...
March 2016: Journal of Computer-aided Molecular Design
https://www.readbyqxmd.com/read/26887003/a-flexible-computational-framework-using-r-and-map-reduce-for-permutation-tests-of-massive-genetic-analysis-of-complex-traits
#13
Behrang Mahjani, Salman Toor, Carl Nettelblad, Sverker Holmgren
In quantitative trait locus (QTL) mapping, significance of putative QTL is often determined using permutation testing. The computational needs to calculate the significance level are immense: 10^4 up to 10^8 or even more permutations can be needed. We have previously introduced the PruneDIRECT algorithm for multiple QTL scan with epistatic interactions. This algorithm has specific strengths for permutation testing. Here, we present a flexible, parallel computing framework for identifying multiple interacting QTL using the PruneDIRECT algorithm which uses the map-reduce model as implemented in Hadoop...
February 11, 2016: IEEE/ACM Transactions on Computational Biology and Bioinformatics
https://www.readbyqxmd.com/read/26884678/htsfinder-powerful-pipeline-of-dna-signature-discovery-by-parallel-and-distributed-computing
#14
Ramin Karimi, Andras Hajdu
Comprehensive effort for low-cost sequencing in the past few years has led to the growth of complete genome databases. In parallel with this effort, and in response to a strong need, fast and cost-effective methods and applications have been developed to accelerate sequence analysis. Identification is the very first step of this task. Due to the difficulties, high costs, and computational challenges of alignment-based approaches, an alternative universal identification method is highly required. As an alignment-free approach, DNA signatures have provided new opportunities for the rapid identification of species...
2016: Evolutionary Bioinformatics Online
https://www.readbyqxmd.com/read/26776220/workshop-on-topology-and-abstract-algebra-for-biomedicine
#15
Eric K Neumann, Svetlana Lockwood, Bala Krishnamoorthy, David Spivak
No abstract text is available yet for this article.
2016: Pacific Symposium on Biocomputing
https://www.readbyqxmd.com/read/26731286/scalable-predictive-analysis-in-critically-ill-patients-using-a-visual-open-data-analysis-platform
#16
Sven Van Poucke, Zhongheng Zhang, Martin Schmitz, Milan Vukicevic, Margot Vander Laenen, Leo Anthony Celi, Cathy De Deyne
With the accumulation of large amounts of health-related data, predictive analytics could stimulate the transformation of reactive medicine towards Predictive, Preventive and Personalized (PPPM) Medicine, ultimately affecting both cost and quality of care. However, the high dimensionality and high complexity of the data involved prevent data-driven methods from easy translation into clinically relevant models. Additionally, the application of cutting-edge predictive methods and data manipulation requires substantial programming skills, limiting its direct exploitation by medical domain experts...
2016: PloS One
https://www.readbyqxmd.com/read/26707450/unstructured-medical-image-query-using-big-data-an-epilepsy-case-study
#17
Sarmad Istephan, Mohammad-Reza Siadat
Big data technologies are critical to the medical field, which requires new frameworks to leverage them. Such frameworks would help medical experts test hypotheses by querying huge volumes of unstructured medical data to provide better patient care. The objective of this work is to implement and examine the feasibility of having such a framework to provide efficient querying of unstructured data in unlimited ways. The feasibility study was conducted specifically in the epilepsy field. The proposed framework evaluates a query in two phases...
February 2016: Journal of Biomedical Informatics
https://www.readbyqxmd.com/read/26664721/erratum-to-a-quantitative-assessment-of-the-hadoop-framework-for-analyzing-massively-parallel-dna-sequencing-data
#18
Alexey Siretskiy, Tore Sundqvist, Mikhail Voznesenskiy, Ola Spjuth
[This corrects the article DOI: 10.1186/s13742-015-0058-5.].
2015: GigaScience
https://www.readbyqxmd.com/read/26651996/variantspark-population-scale-clustering-of-genotype-information
#19
COMPARATIVE STUDY
Aidan R O'Brien, Neil F W Saunders, Yi Guo, Fabian A Buske, Rodney J Scott, Denis C Bauer
BACKGROUND: Genomic information is increasingly used in medical practice giving rise to the need for efficient analysis methodology able to cope with thousands of individuals and millions of variants. The widely used Hadoop MapReduce architecture and associated machine learning library, Mahout, provide the means for tackling computationally challenging tasks. However, many genomic analyses do not fit the Map-Reduce paradigm. We therefore utilise the recently developed SPARK engine, along with its associated machine learning library, MLlib, which offers more flexibility in the parallelisation of population-scale bioinformatics tasks...
December 10, 2015: BMC Genomics
https://www.readbyqxmd.com/read/26625429/scalable-linear-visual-feature-learning-via-online-parallel-nonnegative-matrix-factorization
#20
Xueyi Zhao, Xi Li, Zhongfei Zhang, Chunhua Shen, Yueting Zhuang, Lixin Gao, Xuelong Li
Visual feature learning, which aims to construct an effective feature representation for visual data, has a wide range of applications in computer vision. It is often posed as a problem of nonnegative matrix factorization (NMF), which constructs a linear representation for the data. Although NMF is typically parallelized for efficiency, traditional parallelization methods suffer from either an expensive computation or a high runtime memory usage. To alleviate this problem, we propose a parallel NMF method called alternating least square block decomposition (ALSD), which efficiently solves a set of conditionally independent optimization subproblems based on a highly parallelized fine-grained grid-based blockwise matrix decomposition...
November 26, 2015: IEEE Transactions on Neural Networks and Learning Systems