Read by QxMD

Big Data

Emilio Carrizosa, Vanesa Guerrero, Daniel Hardt, Dolores Romero Morales
In this article we develop a novel online framework to visualize news data over a time horizon. First, we perform a Natural Language Processing analysis, wherein the words are extracted, and their attributes, namely the importance and the relatedness, are calculated. Second, we present a Mathematical Optimization model for the visualization problem and a numerical optimization approach. The model represents the words using circles, the time-varying area of which displays the importance of the words in each time period...
June 2018: Big Data
Pablo Basanta-Val, Luis Sánchez-Fernández
The proliferation of new data sources, stemming from the adoption of open-data schemes, in combination with increasing computing capacity has given rise to a new type of analytics that processes Internet of Things data with low-cost engines to speed up data processing using parallel computing. In this context, the article presents an initiative, called BIG-Boletín Oficial del Estado (BOE), designed to process the Spanish official government gazette (BOE) with state-of-the-art processing engines, to reduce computation time and to offer additional speedup for big data analysts...
June 2018: Big Data
Wilfried Lemahieu, Seppe Vanden Broucke, Bart Baesens
No abstract text is available yet for this article.
June 2018: Big Data
Yadigar Imamverdiyev, Fargana Abdullayeva
In this article, the application of a deep learning method based on a Gaussian-Bernoulli restricted Boltzmann machine (RBM) to the detection of denial of service (DoS) attacks is considered. To increase the DoS attack detection accuracy, seven additional layers are added between the visible and the hidden layers of the RBM. Accurate results in DoS attack detection are obtained by optimization of the hyperparameters of the proposed deep RBM model. The form of the RBM that can be applied to continuous data is used...
June 2018: Big Data
Andrej Duh, Marjan Slak Rupnik, Dean Korošak
Computational propaganda deploys social or political bots to try to shape, steer, and manipulate online public discussions and influence decisions. The collective behavior of populations of social bots has not yet been widely studied, although an understanding of the collective patterns arising from interactions between bots would aid social bot detection. In this study, we show that there are significant differences in collective behavior between populations of bots and populations of humans as detected from their Twitter activity...
June 2018: Big Data
Vaibhav Pandey, Poonam Saini
The MapReduce (MR) computing paradigm and its open source implementation, Hadoop, have become a de facto standard for processing big data in a distributed environment. Initially, the Hadoop system was homogeneous in three significant aspects, namely, user, workload, and cluster (hardware). However, with the growing variety of MR jobs and the inclusion of different configurations of nodes in the existing cluster, heterogeneity has become an essential part of Hadoop systems. The heterogeneity factors adversely affect the performance of a Hadoop scheduler and limit the overall throughput of the system...
June 2018: Big Data
Zoran Obradovic
No abstract text is available yet for this article.
June 2018: Big Data
Varol Onur Kayhan, Alison Watkins
This article proposes a novel approach, called data snapshots, to generate real-time probabilities of winning for National Basketball Association (NBA) teams while games are being played. The approach takes a snapshot from a live game, identifies historical games that have the same snapshot, and uses the outcomes of these games to calculate the winning probabilities of the teams in this game as the game is underway. Using data obtained from 20 seasons' worth of NBA games, we build three models and compare their accuracies to a baseline accuracy...
June 2018: Big Data
Saurabh Nagrecha, Reid A Johnson, Nitesh V Chawla
Nonstandard insurers suffer from a peculiar variant of fraud wherein an overwhelming majority of claims have the semblance of fraud. We show that state-of-the-art fraud detection performs poorly when deployed at underwriting. Our proposed framework "FraudBuster" represents a new paradigm in predicting segments of fraud at underwriting in an interpretable and regulation compliant manner. We show that the most actionable and generalizable profile of fraud is represented by market segments with high confidence of fraud and high loss ratio...
March 2018: Big Data
Floris Devriendt, Darie Moldovan, Wouter Verbeke
Prescriptive analytics extends predictive analytics by estimating an outcome as a function of control variables, which makes it possible to establish the level of control variables required to realize a desired outcome. Uplift modeling is at the heart of prescriptive analytics and aims at estimating the net difference in an outcome resulting from a specific action or treatment that is applied. In this article, a structured and detailed literature survey on uplift modeling is provided by identifying and contrasting various groups of approaches...
March 2018: Big Data
Choo-Yee Ting, Chiung Ching Ho, Hui Jia Yee, Wan Razali Matsah
Studies have shown that certain features from geography, demography, trade area, and environment can play a vital role in retail site selection, largely due to the impact they exert on retail performance. Although the relevant features could be elicited by domain experts, determining the optimal feature set can be an intractable and labor-intensive exercise. The challenges center around (1) how to determine features that are important to a particular retail business and (2) how to estimate retail sales performance given a new location. The challenges become apparent when the features vary across time...
March 2018: Big Data
Bart Baesens, Wouter Verbeke, Cristián Bravo
No abstract text is available yet for this article.
March 2018: Big Data
María Óskarsdóttir, Bart Baesens, Jan Vanthienen
The goal of customer retention campaigns, by design, is to add value and enhance the operational efficiency of businesses. For organizations that strive to retain their customers in saturated, and sometimes fast moving, markets such as the telecommunication and banking industries, implementing customer churn prediction models that perform well and in accordance with the business goals is vital. The expected maximum profit (EMP) measure is tailored toward this problem by taking into account the costs and benefits of a retention campaign and estimating its worth for the organization...
March 2018: Big Data
Steve Huckle, Martin White
In this article, we introduce a prototype of an innovative technology for proving the origins of captured digital media. In an era of fake news, when someone shows us a video or picture of some event, how can we trust its authenticity? It seems that the public no longer believe that traditional media is a reliable reference of fact, perhaps due, in part, to the onset of many diverse sources of conflicting information, via social media. Indeed, the issue of "fake" reached a crescendo during the 2016 U...
December 2017: Big Data
Denis Stukal, Sergey Sanovich, Richard Bonneau, Joshua A Tucker
Automated and semiautomated Twitter accounts, bots, have recently gained significant public attention due to their potential interference in the political realm. In this study, we develop a methodology for detecting bots on Twitter using an ensemble of classifiers and apply it to study bot activity within political discussions in the Russian Twittersphere. We focus on the interval from February 2014 to December 2015, an especially consequential period in Russian politics. Among accounts actively Tweeting about Russian politics, we find that on the majority of days, the proportion of Tweets produced by bots exceeds 50%...
December 2017: Big Data
Gillian Bolsover, Philip Howard
No abstract text is available yet for this article.
December 2017: Big Data
Aastha Nigam, Henry K Dambanemuya, Madhav Joshi, Nitesh V Chawla
Peace processes are complex, protracted, and contentious, involving significant bargaining and compromising among various societal and political stakeholders. In civil war terminations, it is pertinent to measure the pulse of the nation to ensure that the peace process is responsive to citizens' concerns. Social media wields tremendous power as a tool for dialogue, debate, organization, and mobilization, thereby adding more complexity to the peace process. Using Colombia's final peace agreement and national referendum as a case study, we investigate the influence of two important indicators: intergroup polarization and public sentiment toward the peace process...
December 2017: Big Data
Christian Grimme, Mike Preuss, Lena Adam, Heike Trautmann
Social bots are currently regarded as an influential but also somewhat mysterious factor in public discourse and opinion making. They are considered to be capable of massively distributing propaganda in social and online media, and their application is even suspected to be partly responsible for recent election results. Astonishingly, the term social bot is not well defined and different scientific disciplines use divergent definitions. This work starts with a balanced definition attempt, before providing an overview of how social bots actually work (taking the example of Twitter) and what their current technical limitations are...
December 2017: Big Data
David Sathiaraj, William M Cassidy, Eric Rohli
The problem of accurately predicting vote counts in elections is considered in this article. Typically, small-sample polls are used to estimate or predict election outcomes. In this study, a machine-learning hybrid approach is proposed. This approach utilizes multiple sets of static data sources, such as voter registration data, and dynamic data sources, such as polls and donor data, to develop individualized voter scores for each member of the population. These voter scores are used to estimate expected vote counts under different turnout scenarios...
December 2017: Big Data
Vasant Dhar
No abstract text is available yet for this article.
December 2017: Big Data