Hadoop

https://www.readbyqxmd.com/read/29887668/a-data-colocation-grid-framework-for-big-data-medical-image-processing-backend-design
#1
Shunxing Bao, Yuankai Huo, Prasanna Parvathaneni, Andrew J Plassard, Camilo Bermudez, Yuang Yao, Ilwoo Lyu, Aniruddha Gokhale, Bennett A Landman
When processing large medical imaging studies, adopting high performance grid computing resources rapidly becomes important. We recently presented a "medical image processing-as-a-service" grid framework that offers promise in utilizing the Apache Hadoop ecosystem and HBase for data colocation by moving computation close to medical image storage. However, the framework has not yet proven to be easy to use in a heterogeneous hardware environment. Furthermore, the system has not yet been validated when considering the variety of multi-level analyses in medical imaging...
March 2018: Proceedings of SPIE
https://www.readbyqxmd.com/read/29877450/demonstration-of-application-driven-network-slicing-and-orchestration-in-optical-packet-domains-on-demand-vdc-expansion-for-hadoop-mapreduce-optimization
#2
Bingxin Kong, Siqi Liu, Jie Yin, Shengru Li, Zuqing Zhu
Nowadays, it is common for service providers (SPs) to leverage hybrid clouds to improve the quality-of-service (QoS) of their Big Data applications. However, for achieving guaranteed latency and/or bandwidth in its hybrid cloud, an SP might desire to have a virtual datacenter (vDC) network, in which it can manage and manipulate the network connections freely. To address this requirement, we design and implement a network slicing and orchestration (NSO) system that can create and expand vDCs across optical/packet domains on-demand...
May 28, 2018: Optics Express
https://www.readbyqxmd.com/read/29861711/implementing-a-parallel-image-edge-detection-algorithm-based-on-the-otsu-canny-operator-on-the-hadoop-platform
#3
Jianfang Cao, Lichao Chen, Min Wang, Yun Tian
The Canny operator is widely used to detect edges in images. However, as the size of the image dataset increases, the edge detection performance of the Canny operator decreases and its runtime becomes excessive. To improve the runtime and edge detection performance of the Canny operator, in this paper, we propose a parallel design and implementation for an Otsu-optimized Canny operator using a MapReduce parallel programming model that runs on the Hadoop platform. The Otsu algorithm is used to optimize the Canny operator's dual threshold and improve the edge detection performance, while the MapReduce parallel programming model facilitates parallel processing for the Canny operator to solve the processing speed and communication cost problems that occur when the Canny edge detection algorithm is applied to big data...
2018: Computational Intelligence and Neuroscience
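The abstract above says the Otsu algorithm is used to pick the Canny operator's thresholds. As a rough, single-machine illustration of the Otsu step only (not the authors' MapReduce implementation), the sketch below finds the gray level that maximizes between-class variance. The function name `otsu_threshold` and the flat list-of-pixels input are assumptions made for this example; how the paper derives the dual Canny thresholds from the Otsu value is not stated in the excerpt and is not modeled here.

```python
def otsu_threshold(pixels, levels=256):
    """Return the gray level t that best splits pixels into <= t and > t.

    `pixels` is a flat iterable of integer gray values in [0, levels).
    This is a hypothetical helper for illustration, not code from the paper.
    """
    pixels = list(pixels)
    hist = [0] * levels
    for p in pixels:
        hist[p] += 1

    total = len(pixels)
    sum_all = sum(level * count for level, count in enumerate(hist))

    best_t, best_var = 0, -1.0
    w0 = 0      # number of pixels in the "dark" class so far
    sum0 = 0    # sum of gray values in the "dark" class so far
    for t in range(levels):
        w0 += hist[t]
        sum0 += t * hist[t]
        if w0 == 0:
            continue            # dark class still empty
        w1 = total - w0
        if w1 == 0:
            break               # bright class empty, nothing left to split
        mu0 = sum0 / w0
        mu1 = (sum_all - sum0) / w1
        # Between-class variance, up to a constant factor of 1 / total**2.
        var_between = w0 * w1 * (mu0 - mu1) ** 2
        if var_between > best_var:
            best_var, best_t = var_between, t
    return best_t


if __name__ == "__main__":
    sample = [10] * 50 + [200] * 50
    print(otsu_threshold(sample))
```

In a MapReduce setting, one plausible arrangement would be for mappers to process image tiles or whole images independently, but that division of work is an assumption rather than something the excerpt specifies.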
https://www.readbyqxmd.com/read/29796018/framework-for-parallel-preprocessing-of-microarray-data-using-hadoop
#4
Amirhossein Sahlabadi, Ravie Chandren Muniyandi, Mahdi Sahlabadi, Hossein Golshanbafghy
Nowadays, microarray technology has become one of the popular ways to study gene expression and diagnose disease. The National Center for Biotechnology Information (NCBI) hosts public databases containing large volumes of biological data that must be preprocessed, since they carry high levels of noise and bias. Robust Multiarray Average (RMA) is one of the standard and popular methods used to preprocess the data and remove the noise. Most preprocessing algorithms are time-consuming and cannot handle a large number of datasets with thousands of experiments...
2018: Advances in Bioinformatics
https://www.readbyqxmd.com/read/29762754/optimized-distributed-systems-achieve-significant-performance-improvement-on-sorted-merging-of-massive-vcf-files
#5
Xiaobo Sun, Jingjing Gao, Peng Jin, Celeste Eng, Esteban G Burchard, Terri H Beaty, Ingo Ruczinski, Rasika A Mathias, Kathleen C Barnes, Fusheng Wang, Zhaohui Qin
Background: Sorted merging of genomic data is a common data operation necessary in many sequencing-based studies. It involves sorting and merging genomic data from different subjects by their genomic locations. In particular, merging a large number of Variant Call Format (VCF) files is frequently required in large scale whole genome sequencing or whole exome sequencing projects. Traditional single machine based methods become increasingly inefficient when processing large numbers of VCF files due to the excessive computation time and I/O bottleneck...
May 11, 2018: GigaScience
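The sorted merging described above (combining records from different subjects by genomic location) can be sketched on one machine as a k-way merge. This is a minimal illustration, not the distributed method from the paper: the function name `merge_sorted_records` and the simplified `(chrom_index, position, label)` tuples are assumptions standing in for parsed VCF rows, and each input stream is assumed to be sorted already.

```python
import heapq
from itertools import groupby


def merge_sorted_records(*streams):
    """K-way merge of per-subject record streams sorted by location.

    Each record is a hypothetical (chrom_index, position, label) tuple.
    Yields ((chrom_index, position), [labels...]) in genomic order, so that
    records from different subjects at the same site end up together.
    """
    def location(record):
        return (record[0], record[1])

    # heapq.merge reads lazily, holding only one pending record per stream.
    merged = heapq.merge(*streams, key=location)
    for site, group in groupby(merged, key=location):
        yield site, [record[2] for record in group]


if __name__ == "__main__":
    subject_1 = [(1, 100, "s1:0/1"), (1, 200, "s1:1/1")]
    subject_2 = [(1, 100, "s2:0/0"), (2, 50, "s2:0/1")]
    for site, labels in merge_sorted_records(subject_1, subject_2):
        print(site, labels)
```

Because the merge is lazy, memory use depends on the number of input streams rather than their length, which is the property that makes this pattern attractive when many files are involved.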
https://www.readbyqxmd.com/read/29751580/distributed-fast-self-organized-maps-for-massive-spectrophotometric-data-analysis-%C3%A2
#6
Carlos Dafonte, Daniel Garabato, Marco A Álvarez, Minia Manteiga
Analyzing huge amounts of data becomes essential in the era of Big Data, where databases are populated with hundreds of Gigabytes that must be processed to extract knowledge. Hence, classical algorithms must be adapted towards distributed computing methodologies that leverage the underlying computational power of these platforms. Here, a parallel, scalable, and optimized design for self-organized maps (SOM) is proposed in order to analyze massive data gathered by the spectrophotometric sensor of the European Space Agency (ESA) Gaia spacecraft, although it could be extrapolated to other domains...
May 3, 2018: Sensors
https://www.readbyqxmd.com/read/29673604/concurrence-of-big-data-analytics-and-healthcare-a-systematic-review
#7
REVIEW
Nishita Mehta, Anil Pandit
BACKGROUND: The application of Big Data analytics in healthcare has immense potential for improving the quality of care, reducing waste and error, and reducing the cost of care. PURPOSE: This systematic review of literature aims to determine the scope of Big Data analytics in healthcare including its applications and challenges in its adoption in healthcare. It also intends to identify the strategies to overcome the challenges. DATA SOURCES: A systematic search of the articles was carried out on five major scientific databases: ScienceDirect, PubMed, Emerald, IEEE Xplore and Taylor & Francis...
June 2018: International Journal of Medical Informatics
https://www.readbyqxmd.com/read/29649172/hadoop-oriented-smart-cities-architecture
#8
Vlad Diaconita, Ana-Ramona Bologa, Razvan Bologa
A smart city implies a consistent use of technology for the benefit of the community. As the city develops over time, components and subsystems such as smart grids, smart water management, smart traffic and transportation systems, smart waste management systems, smart security systems, or e-governance are added. These components ingest and generate a multitude of structured, semi-structured or unstructured data that may be processed using a variety of algorithms in batches, micro batches or in real-time. The ICT architecture must be able to handle the increased storage and processing needs...
April 12, 2018: Sensors
https://www.readbyqxmd.com/read/29607412/experiences-with-the-twitter-health-surveillance-ths-system
#9
Manuel Rodríguez-Martínez
Social media has become an important platform to gauge public opinion on topics related to our daily lives. In practice, processing these posts requires big data analytics tools since the volume of data and the speed of production overwhelm single-server solutions. Building an application to capture and analyze posts from social media can be a challenge simply because it requires combining a set of complex software tools that oftentimes are tricky to configure, tune, and maintain. In many instances, the application ends up being an assorted collection of Java/Scala programs or Python scripts that developers cobble together to generate the data products they need...
June 2017: Proceedings. IEEE International Congress on Big Data
https://www.readbyqxmd.com/read/29600663/-traditional-chinese-medicine-data-management-policy-in-big-data-environment
#10
Yang Liang, Chang-Song Ding, Xin-di Huang, Le Deng
As the traditional data management model cannot effectively manage the massive data in traditional Chinese medicine (TCM) due to the uncertainty of data object attributes as well as the diversity and abstraction of data representation, a management strategy for TCM data based on big data technology is proposed. Based on the true characteristics of TCM data, this strategy could solve the problems of the uncertainty of data object attributes in TCM information and the non-uniformity of the data representation by using the modeless properties of stored objects in big data technology...
February 2018: Zhongguo Zhong Yao za Zhi, Zhongguo Zhongyao Zazhi, China Journal of Chinese Materia Medica
https://www.readbyqxmd.com/read/29596506/correction-chaos-based-simultaneous-compression-and-encryption-for-hadoop
#11
Muhammad Usama, Nordin Zakaria
[This corrects the article DOI: 10.1371/journal.pone.0168207.].
2018: PloS One
https://www.readbyqxmd.com/read/29460090/medical-big-data-warehouse-architecture-and-system-design-a-case-study-improving-healthcare-resources-distribution
#12
Abderrazak Sebaa, Fatima Chikh, Amina Nouicer, AbdelKamel Tari
The huge increases in medical devices and clinical applications which generate enormous data have raised a big issue in managing, processing, and mining this massive amount of data. Indeed, traditional data warehousing frameworks cannot be effective when managing the volume, variety, and velocity of current medical applications. As a result, several data warehouses face many issues over medical data and many challenges need to be addressed. New solutions have emerged and Hadoop is one of the best examples; it can be used to process these streams of medical data...
February 19, 2018: Journal of Medical Systems
https://www.readbyqxmd.com/read/29346410/an-evaluation-of-multi-probe-locality-sensitive-hashing-for-computing-similarities-over-web-scale-query-logs
#13
Graham Cormode, Anirban Dasgupta, Amit Goyal, Chi Hoon Lee
Many modern applications of AI such as web search, mobile browsing, image processing, and natural language processing rely on finding similar items from a large database of complex objects. Due to the very large scale of data involved (e.g., users' queries from commercial search engines), computing such near or nearest neighbors is a non-trivial task, as the computational cost grows significantly with the number of items. To address this challenge, we adopt Locality Sensitive Hashing (a.k.a. LSH) methods and evaluate four variants in a distributed computing environment (specifically, Hadoop)...
2018: PloS One
https://www.readbyqxmd.com/read/29342232/informational-and-linguistic-analysis-of-large-genomic-sequence-collections-via-efficient-hadoop-cluster-algorithms
#14
Umberto Ferraro Petrillo, Gianluca Roscigno, Giuseppe Cattaneo, Raffaele Giancarlo
Motivation: Information theoretic and compositional/linguistic analyses of genomes have a central role in bioinformatics, even more so since the associated methodologies are becoming very valuable also for epigenomic and meta-genomic studies. The kernel of those methods is based on the collection of k-mer statistics, i.e., how many times each k-mer in {A, C, G, T}^k occurs in a DNA sequence. Although this problem is computationally very simple and efficiently solvable on a conventional computer, the sheer amount of data available now in applications demands resorting to parallel and distributed computing...
January 12, 2018: Bioinformatics
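The k-mer statistics mentioned above (how many times each length-k word over A, C, G, T occurs in a DNA sequence) can be illustrated with a short single-machine sketch. This is not the authors' Hadoop algorithm; the function name `count_kmers` and the choice to skip windows containing other symbols (such as N) are assumptions made for this example.

```python
from collections import Counter


def count_kmers(sequence, k):
    """Count occurrences of each k-mer over the alphabet {A, C, G, T}.

    Windows containing any other symbol (for example 'N') are skipped;
    that policy is an assumption of this sketch.
    """
    if k <= 0:
        raise ValueError("k must be a positive integer")
    alphabet = set("ACGT")
    seq = sequence.upper()
    counts = Counter()
    # Slide a window of width k across the sequence, one base at a time.
    for start in range(len(seq) - k + 1):
        kmer = seq[start:start + k]
        if set(kmer) <= alphabet:
            counts[kmer] += 1
    return counts


if __name__ == "__main__":
    for kmer, n in sorted(count_kmers("ACGTACGT", 2).items()):
        print(kmer, n)
```

Per-sequence `Counter` objects can be added together with `+`, which loosely mirrors how partial counts might be combined in a reduce step; splitting a single long sequence across workers would additionally need overlapping chunk boundaries, which this sketch does not handle.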
https://www.readbyqxmd.com/read/29320579/emotional-modelling-and-classification-of-a-large-scale-collection-of-scene-images-in-a-cluster-environment
#15
Jianfang Cao, Yanfei Li, Yun Tian
The development of network technology and the popularization of image capturing devices have led to a rapid increase in the number of digital images available, and it is becoming increasingly difficult to identify a desired image from among the massive number of possible images. Images usually contain rich semantic information, and people usually understand images at a high semantic level. Therefore, achieving the ability to use advanced technology to identify the emotional semantics contained in images to enable emotional semantic image classification remains an urgent issue in various industries...
2018: PloS One
https://www.readbyqxmd.com/read/29297337/reconstructing-evolutionary-trees-in-parallel-for-massive-sequences
#16
Quan Zou, Shixiang Wan, Xiangxiang Zeng, Zhanshan Sam Ma
BACKGROUND: Building evolutionary trees for massive unaligned DNA sequences is challenging and crucial. However, reconstructing an evolutionary tree for ultra-large sequences is hard. Massive multiple sequence alignment is also challenging and time- and space-consuming. Hadoop and Spark were developed recently and offer new hope for classical computational biology problems. In this paper, we tried to solve the multiple sequence alignment and evolutionary reconstruction in parallel...
December 14, 2017: BMC Systems Biology
https://www.readbyqxmd.com/read/29295690/hadoop-mcc-efficient-multiple-compound-comparison-algorithm-using-hadoop
#17
Guan-Jie Hua, Che-Lun Hung, Chuan Yi Tang
In this paper, we propose a novel heterogeneous high performance computing method, named Hadoop-MCC, integrating Hadoop and GPU, to compare a huge number of chemical structures efficiently. The proposed method gains high availability and fault tolerance from Hadoop, as Hadoop is used to scatter input data to GPU devices and gather the results from GPU devices. A comparison of LINGO is performed on each GPU device in parallel. According to the experimental results, the proposed method on multiple GPU devices can achieve better computational performance than the CUDA-MCC on a single GPU device...
January 2, 2018: Combinatorial Chemistry & High Throughput Screening
https://www.readbyqxmd.com/read/29194413/an-interface-for-biomedical-big-data-processing-on-the-tianhe-2-supercomputer
#18
Xi Yang, Chengkun Wu, Kai Lu, Lin Fang, Yong Zhang, Shengkang Li, Guixin Guo, YunFei Du
Big data, cloud computing, and high-performance computing (HPC) are on the verge of convergence. Cloud computing is already playing an active part in big data processing with the help of big data frameworks like Hadoop and Spark. The recent upsurge of high-performance computing in China provides extra possibilities and capacity to address the challenges associated with big data. In this paper, we propose Orion, a big data interface on the Tianhe-2 supercomputer, to enable big data applications to run on Tianhe-2 via a single command or a shell script...
December 1, 2017: Molecules: a Journal of Synthetic Chemistry and Natural Product Chemistry
https://www.readbyqxmd.com/read/29185792/metres-an-efficient-database-for-genomic-applications
#19
Jordi Vilaplana, Rui Alves, Francesc Solsona, Jordi Mateo, Ivan Teixidó, Marc Pifarré
MetReS (Metabolic Reconstruction Server) is a genomic database that is shared between two software applications that address important biological problems. Biblio-MetReS is a data-mining tool that enables the reconstruction of molecular networks based on automated text-mining analysis of published scientific literature. Homol-MetReS allows functional (re)annotation of proteomes, to properly identify both the individual proteins involved in the processes of interest and their function. The main goal of this work was to identify the areas where the performance of the MetReS database could be improved and to test whether this improvement would scale to larger datasets and more complex types of analysis...
February 2018: Journal of Computational Biology: a Journal of Computational Molecular Cell Biology
https://www.readbyqxmd.com/read/29178837/vispa2-a-scalable-pipeline-for-high-throughput-identification-and-annotation-of-vector-integration-sites
#20
Giulio Spinozzi, Andrea Calabria, Stefano Brasca, Stefano Beretta, Ivan Merelli, Luciano Milanesi, Eugenio Montini
BACKGROUND: Bioinformatics tools designed to identify lentiviral or retroviral vector insertion sites in the genome of host cells are used to address the safety and long-term efficacy of hematopoietic stem cell gene therapy applications and to study the clonal dynamics of hematopoietic reconstitution. The increasing number of gene therapy clinical trials, combined with the increasing amount of Next Generation Sequencing data aimed at identifying integration sites, requires highly accurate and efficient computational software able to correctly process "big data" in a reasonable computational time...
November 25, 2017: BMC Bioinformatics