Read by QxMD
Keyword: Hadoop
https://www.readbyqxmd.com/read/29649172/hadoop-oriented-smart-cities-architecture
#1
Vlad Diaconita, Ana-Ramona Bologa, Razvan Bologa
A smart city implies a consistent use of technology for the benefit of the community. As the city develops over time, components and subsystems such as smart grids, smart water management, smart traffic and transportation systems, smart waste management systems, smart security systems, or e-governance are added. These components ingest and generate a multitude of structured, semi-structured or unstructured data that may be processed using a variety of algorithms in batches, micro batches or in real-time. The ICT architecture must be able to handle the increased storage and processing needs...
April 12, 2018: Sensors
https://www.readbyqxmd.com/read/29607412/experiences-with-the-twitter-health-surveillance-ths-system
#2
Manuel Rodríguez-Martínez
Social media has become an important platform to gauge public opinion on topics related to our daily lives. In practice, processing these posts requires big data analytics tools, since the volume of data and the speed of production overwhelm single-server solutions. Building an application to capture and analyze posts from social media can be a challenge simply because it requires combining a set of complex software tools that are often tricky to configure, tune, and maintain. In many instances, the application ends up being an assorted collection of Java/Scala programs or Python scripts that developers cobble together to generate the data products they need...
June 2017: Proceedings. IEEE International Congress on Big Data
https://www.readbyqxmd.com/read/29600663/-traditional-chinese-medicine-data-management-policy-in-big-data-environment
#3
Yang Liang, Chang-Song Ding, Xin-di Huang, Le Deng
As the traditional data management model cannot effectively manage the massive data in traditional Chinese medicine (TCM), due to the uncertainty of data object attributes as well as the diversity and abstraction of data representation, a management strategy for TCM data based on big data technology is proposed. Based on the true characteristics of TCM data, this strategy could solve the problems of the uncertainty of data object attributes in TCM information and the non-uniformity of the data representation by using the modeless properties of stored objects in big data technology...
February 2018: Zhongguo Zhongyao Zazhi (China Journal of Chinese Materia Medica)
https://www.readbyqxmd.com/read/29596506/correction-chaos-based-simultaneous-compression-and-encryption-for-hadoop
#4
Muhammad Usama, Nordin Zakaria
[This corrects the article DOI: 10.1371/journal.pone.0168207.].
2018: PloS One
https://www.readbyqxmd.com/read/29460090/medical-big-data-warehouse-architecture-and-system-design-a-case-study-improving-healthcare-resources-distribution
#5
Abderrazak Sebaa, Fatima Chikh, Amina Nouicer, AbdelKamel Tari
The huge increases in medical devices and clinical applications which generate enormous data have raised a big issue in managing, processing, and mining this massive amount of data. Indeed, traditional data warehousing frameworks cannot be effective when managing the volume, variety, and velocity of current medical applications. As a result, several data warehouses face many issues over medical data and many challenges need to be addressed. New solutions have emerged, and Hadoop is one of the best examples: it can be used to process these streams of medical data...
February 19, 2018: Journal of Medical Systems
https://www.readbyqxmd.com/read/29346410/an-evaluation-of-multi-probe-locality-sensitive-hashing-for-computing-similarities-over-web-scale-query-logs
#6
Graham Cormode, Anirban Dasgupta, Amit Goyal, Chi Hoon Lee
Many modern applications of AI such as web search, mobile browsing, image processing, and natural language processing rely on finding similar items from a large database of complex objects. Due to the very large scale of data involved (e.g., users' queries from commercial search engines), computing such near or nearest neighbors is a non-trivial task, as the computational cost grows significantly with the number of items. To address this challenge, we adopt Locality Sensitive Hashing (LSH) methods and evaluate four variants in a distributed computing environment (specifically, Hadoop)...
2018: PloS One
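As a rough illustration of the idea evaluated in #6, here is a minimal random-hyperplane (sign-of-projection) LSH sketch in plain Python. The function names and parameters are illustrative, not taken from the paper; a production system would use many hash tables and multi-probe lookups over a Hadoop cluster.

```python
import random

def make_hyperplanes(dim, n_bits, seed=0):
    """Random hyperplanes; each contributes one bit of the signature."""
    rng = random.Random(seed)
    return [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(n_bits)]

def signature(vec, planes):
    """The sign of the dot product with each hyperplane gives one bit."""
    return tuple(int(sum(p * v for p, v in zip(plane, vec)) >= 0)
                 for plane in planes)

def build_index(vectors, planes):
    """Bucket vectors by signature; similar vectors tend to collide."""
    index = {}
    for key, vec in vectors.items():
        index.setdefault(signature(vec, planes), []).append(key)
    return index
```

Vectors that point in the same direction (e.g., a vector and a positive multiple of it) are guaranteed to share a signature; nearby vectors collide with high probability.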
https://www.readbyqxmd.com/read/29342232/informational-and-linguistic-analysis-of-large-genomic-sequence-collections-via-efficient-hadoop-cluster-algorithms
#7
Umberto Ferraro Petrillo, Gianluca Roscigno, Giuseppe Cattaneo, Raffaele Giancarlo
Motivation: Information-theoretic and compositional/linguistic analyses of genomes have a central role in bioinformatics, even more so since the associated methodologies are becoming very valuable also for epigenomic and meta-genomic studies. The kernel of those methods is based on the collection of k-mer statistics, i.e., how many times each k-mer in {A, C, G, T}^k occurs in a DNA sequence. Although this problem is computationally very simple and efficiently solvable on a conventional computer, the sheer amount of data now available in applications demands resorting to parallel and distributed computing...
January 12, 2018: Bioinformatics
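The kernel described in #7, collecting k-mer statistics, reduces to counting length-k substrings. A minimal single-machine sketch (illustrative names; the paper's Hadoop algorithms distribute this work across a cluster):

```python
from collections import Counter

def kmers(seq, k):
    """Yield every k-mer (substring of length k) in a DNA sequence."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]

def count_kmers(sequences, k):
    """Map step emits k-mers; the reduce step is a summed count."""
    counts = Counter()
    for seq in sequences:
        counts.update(kmers(seq, k))
    return counts
```

In a MapReduce setting the same shape survives: mappers emit (k-mer, 1) pairs and reducers sum per key.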
https://www.readbyqxmd.com/read/29320579/emotional-modelling-and-classification-of-a-large-scale-collection-of-scene-images-in-a-cluster-environment
#8
Jianfang Cao, Yanfei Li, Yun Tian
The development of network technology and the popularization of image capturing devices have led to a rapid increase in the number of digital images available, and it is becoming increasingly difficult to identify a desired image from among the massive number of possible images. Images usually contain rich semantic information, and people usually understand images at a high semantic level. Therefore, achieving the ability to use advanced technology to identify the emotional semantics contained in images to enable emotional semantic image classification remains an urgent issue in various industries...
2018: PloS One
https://www.readbyqxmd.com/read/29297337/reconstructing-evolutionary-trees-in-parallel-for-massive-sequences
#9
Quan Zou, Shixiang Wan, Xiangxiang Zeng, Zhanshan Sam Ma
BACKGROUND: Building evolutionary trees for massive unaligned DNA sequences is challenging and crucial. However, reconstructing evolutionary trees for ultra-large sequence sets is hard. Massive multiple sequence alignment is also challenging and time- and space-consuming. Hadoop and Spark, developed recently, shed new light on classical computational biology problems. In this paper, we tried to solve the multiple sequence alignment and evolutionary reconstruction in parallel...
December 14, 2017: BMC Systems Biology
https://www.readbyqxmd.com/read/29295690/hadoop-mcc-efficient-multiple-compound-comparison-algorithm-using-hadoop
#10
Guan-Jie Hua, Che-Lun Hung, Chuan Yi Tang
In this paper, we propose a novel heterogeneous high performance computing method, named Hadoop-MCC, integrating Hadoop and GPU to compare huge amounts of chemical structures efficiently. The proposed method gains high availability and fault tolerance from Hadoop, as Hadoop is used to scatter input data to GPU devices and gather the results from them. LINGO comparisons are performed on each GPU device in parallel. According to the experimental results, the proposed method on multiple GPU devices can achieve better computational performance than CUDA-MCC on a single GPU device...
January 2, 2018: Combinatorial Chemistry & High Throughput Screening
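LINGO similarity, which #10 computes per GPU device, compares the multisets of short substrings of two SMILES strings. A hedged, CPU-only sketch of the idea (the actual Hadoop-MCC implementation and its parameters may differ):

```python
from collections import Counter

def lingos(smiles, q=4):
    """Multiset of the length-q substrings (LINGOs) of a SMILES string."""
    return Counter(smiles[i:i + q] for i in range(len(smiles) - q + 1))

def lingo_similarity(a, b):
    """Tanimoto-style similarity over the two LINGO multisets:
    |intersection| / |union|, with multiset min/max semantics."""
    la, lb = lingos(a), lingos(b)
    inter = sum((la & lb).values())
    union = sum((la | lb).values())
    return inter / union if union else 0.0
```

Identical strings score 1.0; strings sharing no length-q substring score 0.0.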
https://www.readbyqxmd.com/read/29194413/an-interface-for-biomedical-big-data-processing-on-the-tianhe-2-supercomputer
#11
Xi Yang, Chengkun Wu, Kai Lu, Lin Fang, Yong Zhang, Shengkang Li, Guixin Guo, YunFei Du
Big data, cloud computing, and high-performance computing (HPC) are on the verge of convergence. Cloud computing is already playing an active part in big data processing with the help of big data frameworks like Hadoop and Spark. The recent upsurge of high-performance computing in China provides extra possibilities and capacity to address the challenges associated with big data. In this paper, we propose Orion, a big data interface on the Tianhe-2 supercomputer, to enable big data applications to run on Tianhe-2 via a single command or a shell script...
December 1, 2017: Molecules: a Journal of Synthetic Chemistry and Natural Product Chemistry
https://www.readbyqxmd.com/read/29185792/metres-an-efficient-database-for-genomic-applications
#12
Jordi Vilaplana, Rui Alves, Francesc Solsona, Jordi Mateo, Ivan Teixidó, Marc Pifarré
MetReS (Metabolic Reconstruction Server) is a genomic database that is shared between two software applications that address important biological problems. Biblio-MetReS is a data-mining tool that enables the reconstruction of molecular networks based on automated text-mining analysis of published scientific literature. Homol-MetReS allows functional (re)annotation of proteomes, to properly identify both the individual proteins involved in the processes of interest and their function. The main goal of this work was to identify the areas where the performance of the MetReS database could be improved and to test whether this improvement would scale to larger datasets and more complex types of analysis...
February 2018: Journal of Computational Biology: a Journal of Computational Molecular Cell Biology
https://www.readbyqxmd.com/read/29178837/vispa2-a-scalable-pipeline-for-high-throughput-identification-and-annotation-of-vector-integration-sites
#13
Giulio Spinozzi, Andrea Calabria, Stefano Brasca, Stefano Beretta, Ivan Merelli, Luciano Milanesi, Eugenio Montini
BACKGROUND: Bioinformatics tools designed to identify lentiviral or retroviral vector insertion sites in the genome of host cells are used to address the safety and long-term efficacy of hematopoietic stem cell gene therapy applications and to study the clonal dynamics of hematopoietic reconstitution. The increasing number of gene therapy clinical trials combined with the increasing amount of Next Generation Sequencing data, aimed at identifying integration sites, require both highly accurate and efficient computational software able to correctly process "big data" in a reasonable computational time...
November 25, 2017: BMC Bioinformatics
https://www.readbyqxmd.com/read/29068640/handling-data-skew-in-mapreduce-cluster-by-using-partition-tuning
#14
Yufei Gao, Yanjie Zhou, Bing Zhou, Lei Shi, Jiacai Zhang
The healthcare industry has generated large amounts of data, and analyzing these has emerged as an important problem in recent years. The MapReduce programming model has been successfully used for big data analytics. However, data skew invariably occurs in big data analytics and seriously affects efficiency. To overcome the data skew problem in MapReduce, we have in the past proposed a data processing algorithm called Partition Tuning-based Skew Handling (PTSH). In comparison with the one-stage partitioning strategy used in the traditional MapReduce model, PTSH uses a two-stage strategy and the partition tuning method to disperse key-value pairs in virtual partitions and recombines each partition in case of data skew...
2017: Journal of Healthcare Engineering
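The two-stage idea in #14, first hash into virtual partitions, then disperse oversized ones, can be caricatured in a few lines. The function names and the simple chunked split below are assumptions for illustration; PTSH's actual partition tuning method is more sophisticated.

```python
from collections import defaultdict

def partition(pairs, n_parts, max_load):
    """Stage one: hash key-value pairs into virtual partitions.
    Stage two: split any partition exceeding max_load so that a
    single hot key cannot overload one reducer."""
    parts = defaultdict(list)
    for key, value in pairs:
        parts[hash(key) % n_parts].append((key, value))
    balanced = []
    for bucket in parts.values():
        if len(bucket) <= max_load:
            balanced.append(bucket)
        else:  # disperse the skewed bucket into bounded chunks
            for i in range(0, len(bucket), max_load):
                balanced.append(bucket[i:i + max_load])
    return balanced
```

With a skewed key distribution, every resulting bucket stays under the load bound while no pair is lost.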
https://www.readbyqxmd.com/read/29060620/deepdeath-learning-to-predict-the-underlying-cause-of-death-with-big-data
#16
Hamid Reza Hassanzadeh, Ying Sha, May D Wang
Multiple cause-of-death data provides a valuable source of information that can be used to enhance health standards by predicting health-related trajectories in societies with large populations. These data are often available in large quantities across U.S. states and require Big Data techniques to uncover complex hidden patterns. We design two different classes of models suitable for large-scale analysis of mortality data: a Hadoop-based ensemble of random forests trained over N-grams, and DeepDeath, a deep classifier based on the recurrent neural network (RNN)...
July 2017: Conference Proceedings: Annual International Conference of the IEEE Engineering in Medicine and Biology Society
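Training over N-grams, as #16 does for its random-forest ensemble, just means using windows of n consecutive codes as features. A tiny sketch (the code values below are made up for illustration, not from the paper's data):

```python
def ngrams(tokens, n):
    """Sliding windows of n consecutive tokens, usable as features."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
```

A sequence shorter than n yields no N-grams, which a feature pipeline must tolerate.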
https://www.readbyqxmd.com/read/28961134/apriori-versions-based-on-mapreduce-for-mining-frequent-patterns-on-big-data
#17
Jose Maria Luna, Francisco Padillo, Mykola Pechenizkiy, Sebastian Ventura
Pattern mining is one of the most important tasks to extract meaningful and useful information from raw data. This task aims to extract item-sets that represent any type of homogeneity and regularity in data. Although many efficient algorithms have been developed in this regard, the growing interest in data has caused the performance of existing pattern mining techniques to drop. The goal of this paper is to propose new efficient pattern mining algorithms to work on big data. To this aim, a series of algorithms based on the MapReduce framework and the Hadoop open-source implementation have been proposed...
September 27, 2017: IEEE Transactions on Cybernetics
https://www.readbyqxmd.com/read/28945604/a-distributed-fuzzy-associative-classifier-for-big-data
#18
Armando Segatori, Alessio Bechini, Pietro Ducange, Francesco Marcelloni
Fuzzy associative classification has not been widely analyzed in the literature, although associative classifiers (ACs) have proved to be very effective in different real domain applications. The main reason is that learning fuzzy ACs is a very heavy task, especially when dealing with large datasets. To overcome this drawback, in this paper, we propose an efficient distributed fuzzy associative classification approach based on the MapReduce paradigm. The approach exploits a novel distributed discretizer based on fuzzy entropy for efficiently generating fuzzy partitions of the attributes...
September 19, 2017: IEEE Transactions on Cybernetics
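The fuzzy partitions produced by the distributed discretizer in #18 are typically built from simple membership functions. A triangular membership function, the usual building block, looks like this (a generic sketch, not the paper's exact formulation):

```python
def triangular(x, a, b, c):
    """Membership degree of x in a triangular fuzzy set that rises
    from zero at a to a peak of one at b and falls back to zero at c."""
    if x <= a or x >= c:
        return 0.0
    if x <= b:
        return (x - a) / (b - a)
    return (c - x) / (c - b)
```

A fuzzy partition of an attribute is then a family of such overlapping sets whose memberships sum to one at every point of the domain.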
https://www.readbyqxmd.com/read/28884169/cloud-engineering-principles-and-technology-enablers-for-medical-image-processing-as-a-service
#19
Shunxing Bao, Andrew J Plassard, Bennett A Landman, Aniruddha Gokhale
Traditional in-house, laboratory-based medical imaging studies use hierarchical data structures (e.g., NFS file stores) or databases (e.g., COINS, XNAT) for storage and retrieval. The resulting performance from these approaches is, however, impeded by standard network switches since they can saturate network bandwidth during transfer from storage to processing nodes for even moderate-sized studies. To that end, a cloud-based "medical image processing-as-a-service" offers promise in utilizing the ecosystem of Apache Hadoop, which is a flexible framework providing distributed, scalable, fault tolerant storage and parallel computational modules, and HBase, which is a NoSQL database built atop Hadoop's distributed file system...
April 2017: Proceedings of the IEEE International Conference on Cloud Engineering
https://www.readbyqxmd.com/read/28873323/survey-of-gene-splicing-algorithms-based-on-reads
#20
Xiuhua Si, Qian Wang, Lei Zhang, Ruo Wu, Jiquan Ma
Gene splicing is the process of assembling a large number of unordered short sequence fragments into the original genome sequence as accurately as possible. Several popular splicing algorithms based on reads are reviewed in this article, including reference genome algorithms and de novo splicing algorithms (Greedy-extension, Overlap-Layout-Consensus graph, De Bruijn graph). We also discuss a new splicing method based on the MapReduce strategy and Hadoop. By comparing these algorithms, some conclusions are drawn and some suggestions on gene splicing research are made...
November 2, 2017: Bioengineered
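Of the de novo approaches surveyed in #20, the De Bruijn graph is the most common: nodes are (k-1)-mers, and each k-mer in a read contributes an edge from its prefix to its suffix. A minimal construction sketch (illustrative only; real assemblers add error correction, reverse complements, and graph simplification):

```python
from collections import defaultdict

def de_bruijn(reads, k):
    """Build a De Bruijn graph: for every k-mer in every read, add an
    edge from its (k-1)-mer prefix to its (k-1)-mer suffix."""
    graph = defaultdict(list)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].append(kmer[1:])
    return dict(graph)
```

Assembly then reduces to finding a path (ideally Eulerian) through this graph.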
