Papers with the keyword Hadoop (Page 2)

#21

JOURNAL ARTICLE

A distributed computing model for big data anonymization in the networks.

Farough Ashkouti, Keyhan Khamforoosh

Recently, big data and its applications have grown sharply in various fields such as IoT, bioinformatics, eCommerce, and social media. The huge volume of data has posed enormous challenges to the architecture, infrastructure, and computing capacity of IT systems. The scientific and industrial communities therefore have a compelling need for large-scale, robust computing systems. Since one of the characteristics of big data is value, data should be published so that analysts can extract useful patterns from it. However, data publishing may lead to the disclosure of individuals' private information...

37115783

2023: PloS One

#22

REVIEW

Big Data Analytics Using Cloud Computing Based Frameworks for Power Management Systems: Status, Constraints, and Future Recommendations.

Ahmed Hadi Ali Al-Jumaili, Ravie Chandren Muniyandi, Mohammad Kamrul Hasan, Johnny Koh Siaw Paw, Mandeep Jit Singh

Traditional parallel computing for power management systems faces major challenges in execution time, computational complexity, and efficiency (for example, processing time and delays) in power system condition monitoring, particularly for consumer power consumption, weather data, and power generation, when detecting and predicting through data mining in centralized parallel processing and diagnosis. Because of these constraints, data management has become a critical research consideration and bottleneck. To cope with them, cloud computing-based methodologies have been introduced to manage data efficiently in power management systems...

36991663

March 8, 2023: Sensors

#23

JOURNAL ARTICLE

Design and Development of a Big Data Platform for Disease Burden Based on the Spark Engine.

Chengcheng Li, Jing Gao, Qingwei Pan, Zhihua Zhou, Yue Yang, Shangcheng Zhou

OBJECTIVE: This study attempts to build a big data platform for disease burden that realizes a deep coupling of artificial intelligence and public health. The platform is highly open, shared, and intelligent, covering big data collection, analysis, and result visualization. METHODS: Based on data mining theory and technology, the current state of multisource data on disease burden was analyzed. A disease burden big data management model, functional modules, and a technical framework were put forward, and Kafka technology is used to optimize the transmission efficiency of the underlying data...

36793705

2023: Computational Intelligence and Neuroscience

#24

JOURNAL ARTICLE

Data science technology course: The design, assessment and computing environment perspectives.

Azlan Ismail, Sofianita Mutalib, Haryani Haron

This article discusses the key elements of the Data Science Technology course offered to postgraduate students enrolled in the Master of Data Science program. The course complements the existing curriculum by providing skills for handling Big Data platforms and tools in addition to data science activities. We structure the discussion of this course around three main requirements, which relate to the need to exploit key skills from two dimensions, namely Data Science and Big Data, and the need for a cluster-based computing platform and its accessibility...

36714440

January 24, 2023: Education and Information Technologies

#25

JOURNAL ARTICLE

Prediction and Big Data Impact Analysis of Telecom Churn by Backpropagation Neural Network Algorithm from the Perspective of Business Model.

Jiabing Xu, Jiarui Liu, Tianen Yao, Yang Li

This study aims to help existing telecom operators transform from traditional Internet operators into digitally driven services and to improve the overall competitiveness of telecom enterprises. Data mining is applied to telecom user classification, processing existing telecom user data through data integration, cleaning, standardization, and transformation. Although existing algorithms ensure accuracy on a telecom user analysis platform under big data, they do not overcome the limitations of single-machine computing and cannot effectively improve the model's training efficiency...

36656558

January 19, 2023: Big Data

#26

JOURNAL ARTICLE

A Distributed Big Data Analytics Architecture for Vehicle Sensor Data.

Theodoros Alexakis, Nikolaos Peppes, Konstantinos Demestichas, Evgenia Adamopoulou

The ever-increasing needs for data acquisition, storage, and analysis in transportation systems have led to the adoption of new technologies and methods to provide efficient and reliable solutions. Nowadays, both highways and vehicles host a wide variety of sensors collecting different types of highly fluctuating data such as speed, acceleration, and direction. The volume and variety of these data create the need for big data techniques and analytics in state-of-the-art intelligent transportation systems (ITS)...

36616956

December 29, 2022: Sensors

#27

JOURNAL ARTICLE

Disease-specific data processing: An intelligent digital platform for diabetes based on model prediction and data analysis utilizing big data technology.

Xiangyong Kong, Ruiyang Peng, Huajie Dai, Yichi Li, Yanzhuan Lu, Xiaohan Sun, Bozhong Zheng, Yuze Wang, Zhiyun Zhao, Shaolin Liang, Min Xu

BACKGROUND: Artificial intelligence technology has become a mainstream trend in the development of medical informatization. Because the medical data generated in the current medical informatization process are complex in structure and large in volume, big data technology that helps doctors conduct scientific research and analysis and obtain high-value information has become indispensable for medical and scientific research. METHODS: This study aims to discuss the architecture of an intelligent digital platform for diabetes by analyzing existing data mining methods and platform-building experience in the medical field, using big data platform construction technology based on the Hadoop system, model prediction, and data processing and analysis methods grounded in the principles of statistics and machine learning...

36579056

2022: Frontiers in Public Health

#28

JOURNAL ARTICLE

Cloud-native distributed genomic pileup operations.

Marek Wiewiórka, Agnieszka Szmurło, Paweł Stankiewicz, Tomasz Gambin

MOTIVATION: Pileup analysis is a building block of many bioinformatics pipelines, including variant calling and genotyping. This step tends to become a bottleneck of the entire assay since the straightforward pileup implementations involve processing of all base calls from all alignments sequentially. On the other hand, a distributed version of the algorithm faces the intrinsic challenge of splitting reads-oriented file formats into self-contained partitions to avoid costly data exchange between computational nodes...

36515465

December 14, 2022: Bioinformatics
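
To make the pileup step in the abstract above concrete, here is a minimal sequential sketch in Python. It assumes each alignment is a hypothetical (contig, start, bases) tuple with a 0-based start and no insertions or deletions; real pileup tools must walk each read's CIGAR string and usually track base qualities as well. This is not the authors' distributed implementation. It only illustrates why the step is costly: every read adds a count to every position it covers, so a read that starts near the end of one partition also contributes to columns owned by the next partition.

```python
from collections import Counter, defaultdict


def naive_pileup(alignments):
    """Count base calls at each reference position, processing reads one by one.

    Assumes each alignment is a (contig, start, bases) tuple with a 0-based
    start and an ungapped match (no insertions or deletions).
    """
    columns = defaultdict(Counter)
    for contig, start, bases in alignments:
        # Each base of an ungapped read lands on consecutive reference positions.
        for offset, base in enumerate(bases):
            columns[(contig, start + offset)][base] += 1
    return dict(columns)


# Hypothetical reads; positions 102 to 105 are covered by more than one read.
reads = [
    ("chr1", 100, "ACGT"),
    ("chr1", 102, "GTTA"),
    ("chr1", 103, "TAC"),
]
pile = naive_pileup(reads)
for position in sorted(pile):
    print(position, dict(pile[position]))
```

Position 103 is covered by all three reads, each calling T there, which is the kind of per-column aggregate a variant caller consumes.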

#29

JOURNAL ARTICLE

Large-scale digital forensic investigation for Windows registry on Apache Spark.

Jun-Ha Lee, Hyuk-Yoon Kwon

In this study, we investigate large-scale digital forensic analysis of the Windows registry on Apache Spark. Because the Windows registry depends on the system on which it operates, existing forensic methods for the Windows registry have targeted a single system. However, analyzing large-scale registry data collected from several Windows systems is critical, because comparing the registries of multiple systems allows us to detect suspiciously changed data...

36477435

2022: PloS One

#30

JOURNAL ARTICLE

A survey of data element perspective: Application of artificial intelligence in health big data.

Honglin Xiong, Hongmin Chen, Li Xu, Hong Liu, Lumin Fan, Qifeng Tang, Hsunfang Cho

Artificial intelligence (AI) based on the perspective of data elements is widely used in the healthcare informatics domain. Large amounts of clinical data from electronic medical records (EMRs), electronic health records (EHRs), and electroencephalography records (EEGs) have been generated and collected at unprecedented speed and scale. For instance, the new generation of wearable technologies makes it easy to collect people's daily health data such as blood pressure, blood glucose, and physiological data, while EHRs document large amounts of patient data...

36389224

2022: Frontiers in Neuroscience

#31

JOURNAL ARTICLE

Towards Developing a Robust Intrusion Detection Model Using Hadoop-Spark and Data Augmentation for IoT Networks.

Ricardo Alejandro Manzano Sanchez, Marzia Zaman, Nishith Goel, Kshirasagar Naik, Rohit Joshi

In recent years, anomaly detection and machine learning for intrusion detection systems have been used to detect anomalies on Internet of Things networks. These systems rely on machine and deep learning to improve detection accuracy. However, the robustness of the model depends on the number of data samples available, the quality of the data, and the distribution of the data classes. In the present paper, we focus specifically on the amount of data and class imbalance, since both are key factors in IoT, where network traffic is increasing exponentially...

36298077

October 12, 2022: Sensors

#32

JOURNAL ARTICLE

FDup: a framework for general-purpose and efficient entity deduplication of record collections.

Michele De Bonis, Paolo Manghi, Claudio Atzori

Deduplication is a technique for identifying and resolving duplicate metadata records in a collection. This article describes FDup (Flat Collections Deduper), a general-purpose software framework that supports a complete deduplication workflow for managing big data record collections: metadata record data model definition, identification of candidate duplicates, and identification of duplicates. FDup brings two main innovations. First, it delivers a full deduplication framework in a single easy-to-use software package based on the Apache Spark Hadoop framework, where developers can customize the optimal and parallel workflow steps of blocking, sliding windows, and similarity matching functions via an intuitive configuration file. Second, it introduces a novel approach to improve performance beyond the known techniques of "blocking" and "sliding window" by introducing a smart similarity matching function, T-match...

36262137

2022: PeerJ. Computer Science
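
The blocking and sliding-window steps named in the FDup abstract can be sketched on a single machine as follows. The blocking key (first three characters of the normalized title), the window size of 3, the 0.9 threshold, and the use of the standard library's `difflib.SequenceMatcher` ratio as the similarity function are all illustrative assumptions; this is neither FDup's configuration format nor its T-match function.

```python
from difflib import SequenceMatcher
from itertools import groupby


def normalize(text):
    # Lowercase and collapse whitespace so trivial differences do not matter.
    return " ".join(text.lower().split())


def blocking_key(record):
    # Hypothetical blocking key: records are only compared within one block.
    return normalize(record["title"])[:3]


def similarity(a, b):
    # Stand-in similarity function (not FDup's T-match).
    return SequenceMatcher(None, normalize(a["title"]), normalize(b["title"])).ratio()


def find_duplicates(records, window=3, threshold=0.9):
    """Return (id, id) pairs judged to be duplicates.

    Records are sorted by (blocking key, normalized title). Inside each block a
    window of `window` records slides over the sorted order, and each record is
    compared only with the records that follow it inside the window.
    """
    ordered = sorted(records, key=lambda r: (blocking_key(r), normalize(r["title"])))
    pairs = []
    for _, block in groupby(ordered, key=blocking_key):
        block = list(block)
        for i, left in enumerate(block):
            for right in block[i + 1 : i + window]:
                if similarity(left, right) >= threshold:
                    pairs.append((left["id"], right["id"]))
    return pairs


# Hypothetical records; ids 1 and 2 differ only in punctuation and spacing.
records = [
    {"id": 1, "title": "Cloud-native distributed genomic pileup operations"},
    {"id": 2, "title": "Cloud native distributed  genomic pileup operations"},
    {"id": 3, "title": "Load balancing algorithms for Hadoop cluster"},
    {"id": 4, "title": "Cloud-based English multimedia test questions"},
]
print(find_duplicates(records))
```

Blocking keeps record 3 out of every comparison, and the window bounds how many comparisons each record triggers, which is why both techniques matter once collections reach big data scale.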

#33

JOURNAL ARTICLE

Load Balancing Algorithms for Hadoop Cluster in Unbalanced Environment.

Weiyu Fu, Lixia Wang

Because the cluster load should be balanced in advance during job scheduling, rather than remedied once it has become seriously unbalanced, this paper analyzes the task scheduling flow of the Hadoop cluster in depth. On the Hadoop platform, a self-dividing algorithm is proposed for load balancing, and an intelligent optimization algorithm is used to solve the load balancing problem. A dynamic feedback load balancing scheduling method is proposed from the perspective of task scheduling. To address the shortcomings of the fair scheduling algorithm, this paper proposes two ways to improve the resource utilization and overall performance of Hadoop...

36248928

2022: Computational Intelligence and Neuroscience

#34

JOURNAL ARTICLE

Design of Cross-Platform Information Retrieval System of Library Based on Digital Twins.

Shanshan Shang, Zikai Yu, Kun Jiao, Yingshi Huang, Hua Guo, Guozhong Wang

To improve the library's capability for cross-platform information retrieval and for data scheduling and distribution, a library cross-platform information retrieval system based on digital twin technology is designed. Using data warehouse decision support and structured queries over data sources, the spectral characteristics of the library's cross-platform information resources are extracted. Using Hadoop parallel data loading, the library's cross-platform operation data is divided into decision-making data, computing resource pool data, and Hadoop parallel loading data...

36203727

2022: Computational Intelligence and Neuroscience

#35

JOURNAL ARTICLE

Analysis of the Correlation between Football Education Environment and Students' Psychology Health Based on Gauss Characteristics.

Shu Qiao, Gaosong Huang

Campus football has become a core part of school physical education. Football education can cultivate students' sound personality and promote their all-round physical and mental development. At the same time, psychological skills training methods can enrich the ways football skills are taught and provide a theoretical reference for educational reform. Building on Gaussian features, this paper uses a mixed Gaussian feature model to further describe the relationship between football education and students' psychology...

36200085

2022: Journal of Environmental and Public Health

#36

JOURNAL ARTICLE

An Analysis of the Effects of the English Language and Literature on Students' Language Ability from a Multidimensional Environment.

Weifang Chen

Basic language proficiency is one of the most crucial components of a student's language proficiency, and its foundation. ELL (English Language and Literature) greatly aids the development of students' language skills: it not only fosters the growth of students' language thinking but also widens their perspectives and enhances their capacity for language comprehension. In this essay, the rules of English are examined from the multifaceted ELL viewpoint. This study extracts personality characteristic data from practical texts and incorporates it into a model of changes in students' knowledge, based on data mining (DM) related technology and multidisciplinary expertise...

36148405

2022: Journal of Environmental and Public Health

#37

JOURNAL ARTICLE

Cloud-Based English Multimedia for Universities Test Questions Modeling and Applications.

Yanping Wu, Changlong Zheng, Lele Xie, Meihui Hao

Through an in-depth study of cloud computing and college English multimedia test questions, this study builds a cloud computing-based model and application for college English multimedia test questions. The emergence of cloud computing technology undoubtedly provides a new and ideal method for solving test data and paper management problems. This study analyzes the advantages of the Hadoop computing platform and the MapReduce computing model, and builds a Hadoop-based distributed computing platform using universities' existing hardware and software resources...

36124116

2022: Computational Intelligence and Neuroscience
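
The MapReduce computing model mentioned above splits a job into a map phase, a shuffle that groups intermediate key/value pairs by key, and a reduce phase. The following minimal in-process Python simulation of that flow uses the classic word-count job on hypothetical question texts; it is an illustration of the model only, not the platform or workload built in the study, and it runs without Hadoop.

```python
from collections import defaultdict
from itertools import chain


def map_phase(doc_id, text):
    # Mapper: emit a (word, 1) pair for every token in one input record.
    for word in text.lower().split():
        yield word, 1


def shuffle_phase(pairs):
    # Shuffle: group all intermediate values that share a key.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    # Reducer: combine the grouped values for a single key.
    return key, sum(values)


def run_job(documents):
    intermediate = chain.from_iterable(
        map_phase(doc_id, text) for doc_id, text in documents.items()
    )
    grouped = shuffle_phase(intermediate)
    # Hadoop sorts keys before a reducer sees them; sorting here mimics that.
    return dict(reduce_phase(key, values) for key, values in sorted(grouped.items()))


# Hypothetical input records keyed by question id.
questions = {
    "q1": "Choose the correct tense",
    "q2": "Choose the correct word",
}
print(run_job(questions))
```

On a real cluster the map and reduce calls run on different nodes and the shuffle moves data over the network; keeping the three phases as separate functions mirrors that division of work.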

#38

JOURNAL ARTICLE

Individual Online Learning Behavior Analysis Based on Hadoop.

Ning Xiang

Online individual behavior analysis is an important means of mining user interests, and predicting user retweeting behavior is a typical problem in this area. To make online learning behavior prediction more suitable for large-scale datasets, this paper proposes the improved condensed K nearest neighbor (ICKNN) method. Inspired by the idea of compressing samples in the condensed nearest neighbor (CNN) algorithm, the proposed method adopts the Hadoop platform to parallelize the traditional CNN algorithm...

36120695

2022: Computational Intelligence and Neuroscience
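
The condensed nearest neighbor (CNN) algorithm that ICKNN builds on is usually attributed to Hart: keep only the training samples a 1-nearest-neighbor classifier needs in order to label the rest of the training set correctly. The sketch below is the sequential baseline with Euclidean distance, not the improved or Hadoop-parallelized method from the paper; the sample points are hypothetical.

```python
import math


def nearest_label(store, point):
    # 1-nearest-neighbor prediction using Euclidean distance.
    _, label = min(store, key=lambda item: math.dist(item[0], point))
    return label


def condense(samples):
    """Hart's condensed nearest neighbor rule (sequential version).

    Starting from one sample, repeatedly scan the training set and add every
    sample that the current store misclassifies, until a full pass adds nothing.
    """
    store = [samples[0]]
    changed = True
    while changed:
        changed = False
        for point, label in samples:
            if (point, label) in store:
                continue  # already kept; also stops duplicates being re-added forever
            if nearest_label(store, point) != label:
                store.append((point, label))
                changed = True
    return store


# Hypothetical 2-D training samples with two well-separated classes.
samples = [
    ((0.0, 0.0), "a"),
    ((0.0, 1.0), "a"),
    ((5.0, 5.0), "b"),
    ((5.0, 6.0), "b"),
    ((1.0, 0.0), "a"),
]
store = condense(samples)
print(store)
print(nearest_label(store, (4.0, 4.0)))
```

Here two stored samples replace five, and later predictions only compare against the condensed store. The repeated full passes over the training set are the cost a parallel version would aim to spread across nodes.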

#39

JOURNAL ARTICLE

Comparative Analysis of Chinese Culture and Hong Kong, Macao, and Taiwan Culture in the Field of Public Health Based on the CNN Model.

Hui Xiong

To address the large volume of cultural resource information and the poor recommendation performance of a standalone platform, a cultural recommendation system based on the Hadoop platform and combined with a convolutional neural network (CNN) was proposed. It aims to improve the adaptability of Chinese culture and Hong Kong, Macao, and Taiwan culture. First, the CNN is used to deeply encode the collected information and map it to a deep feature space. Second, an attention mechanism is used to focus on the encoded features in the deep feature space to improve their classification ability...

36111065

2022: Journal of Environmental and Public Health

#40

JOURNAL ARTICLE

TCM Constitution Analysis Method Based on Parallel FP-Growth Algorithm in Hadoop Framework.

Mingzheng Li, Xiaojuan Lv, Ye Liu, Lin Wang, Jianqiang Song

This work is devoted to establishing a comparatively accurate classification model linking symptoms, constitutions, and regimens for traditional Chinese medicine (TCM) constitution analysis, to provide preliminary screening and decision support for clinical diagnosis. However, when massive distributed medical data are analyzed on a cloud platform, traditional data mining methods suffer from low mining efficiency, large memory consumption, and long tuning times. To address this, an association rules method for TCM constitution analysis (ARA-TCM) is proposed, based on the FP-growth algorithm and the open-source Hadoop Distributed File System (HDFS), to make full use of its powerful parallel processing capability...

36081755

2022: Journal of Healthcare Engineering
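
The ARA-TCM abstract above mines association rules with FP-growth. A full FP-growth implementation (building and recursively mining an FP-tree) is too long to show here, so the sketch below swaps in plain exhaustive enumeration to illustrate the two quantities any such miner reports: itemset support and rule confidence. With the same minimum support it finds the same frequent itemsets FP-growth would, only far less efficiently. The transactions and item names are hypothetical placeholders, not data from the study.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Exhaustively find itemsets whose support is at least min_support.

    Brute-force stand-in for FP-growth: it scans every candidate itemset,
    so it only suits tiny examples.
    """
    items = sorted(set().union(*transactions))
    n = len(transactions)
    result = {}
    for size in range(1, len(items) + 1):
        found = False
        for candidate in combinations(items, size):
            candidate = frozenset(candidate)
            support = sum(1 for t in transactions if candidate <= t) / n
            if support >= min_support:
                result[candidate] = support
                found = True
        if not found:
            # Downward closure: no frequent k-itemset means no frequent (k+1)-itemset.
            break
    return result


def confidence(itemsets, antecedent, consequent):
    # conf(X -> Y) = support(X union Y) / support(X); both must be frequent itemsets.
    antecedent = frozenset(antecedent)
    return itemsets[antecedent | frozenset(consequent)] / itemsets[antecedent]


# Hypothetical transactions: each set holds items observed together in one record.
transactions = [
    {"symptom_A", "symptom_B", "constitution_X"},
    {"symptom_A", "constitution_X"},
    {"symptom_B", "constitution_Y"},
    {"symptom_A", "symptom_B", "constitution_X"},
]
itemsets = frequent_itemsets(transactions, min_support=0.5)
print(confidence(itemsets, {"symptom_A"}, {"constitution_X"}))
print(confidence(itemsets, {"symptom_B"}, {"constitution_X"}))
```

FP-growth avoids the candidate scan entirely by compressing the transactions into a prefix tree, which is what makes it attractive for the large, distributed data sets the abstract targets.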
