Read by QxMD icon Read

KDD: Proceedings

Himabindu Lakkaraju, Stephen H Bach, Leskovec Jure
One of the most important obstacles to deploying predictive models is the fact that humans do not understand and trust them. Knowing which variables are important in a model's prediction and how they are combined can be very powerful in helping people understand and trust automatic decision making systems. Here we propose interpretable decision sets, a framework for building predictive models that are highly accurate, yet also highly interpretable. Decision sets are sets of independent if-then rules. Because each rule can be applied independently, decision sets are simple, concise, and easily interpretable...
August 2016: KDD: Proceedings
Aditya Grover, Jure Leskovec
Prediction tasks over nodes and edges in networks require careful effort in engineering features used by learning algorithms. Recent research in the broader field of representation learning has led to significant progress in automating prediction by learning the features themselves. However, present feature learning approaches are not expressive enough to capture the diversity of connectivity patterns observed in networks. Here we propose node2vec, an algorithmic framework for learning continuous feature representations for nodes in networks...
August 2016: KDD: Proceedings
Qingqi Yue, Ao Yuan, Xuan Che, Minh Huynh, Chunxiao Zhou
The Office of Disability Adjudication and Review (ODAR) is responsible for holding hearings, issuing decisions, and reviewing appeals as part of the Social Security Administration's disability determining process. In order to control and process cases, the ODAR has established a Case Processing and Management System (CPMS) to record management information since December 2003. The CPMS provides a detailed case status history for each case. Due to the large number of appeal requests and limited resources, the number of pending claims at ODAR was over one million cases by March 31, 2015...
August 2016: KDD: Proceedings
Erich Kummerfeld, Joseph Ramsey
Many scientific research programs aim to learn the causal structure of real world phenomena. This learning problem is made more difficult when the target of study cannot be directly observed. One strategy commonly used by social scientists is to create measurable "indicator" variables that covary with the latent variables of interest. Before leveraging the indicator variables to learn about the latent variables, however, one needs a measurement model of the causal relations between the indicators and their corresponding latents...
2016: KDD: Proceedings
David Hallac, Jure Leskovec, Stephen Boyd
Convex optimization is an essential tool for modern data analysis, as it provides a framework to formulate and solve many problems in machine learning and data mining. However, general convex optimization solvers do not scale well, and scalable solvers are often specialized to only work on a narrow class of problems. Therefore, there is a need for simple, scalable algorithms that can solve many common optimization problems. In this paper, we introduce the network lasso, a generalization of the group lasso to a network setting that allows for simultaneous clustering and optimization on graphs...
August 2015: KDD: Proceedings
Honglei Zhuang, Aditya Parameswaran, Dan Roth, Jiawei Han
Crowdsourcing is the de-facto standard for gathering annotated data. While, in theory, data annotation tasks are assumed to be attempted by workers independently, in practice, data annotation tasks are often grouped into batches to be presented and annotated by workers together, in order to save on the time or cost overhead of providing instructions or necessary background. Thus, even though independence is usually assumed between annotations on data items within the same batch, in most cases, a worker's judgment on a data item can still be affected by other data items within the batch, leading to additional errors in collected labels...
August 2015: KDD: Proceedings
Younghoon Kim, Jiawei Han, Cangzhou Yuan
With the increasing use of GPS-enabled mobile phones, geo-tagging, which refers to adding GPS information to media such as micro-blogging messages or photos, has seen a surge in popularity recently. This enables us to not only browse information based on locations, but also discover patterns in the location-based behaviors of users. Many techniques have been developed to find the patterns of people's movements using GPS data, but latent topics in text messages posted with local contexts have not been utilized effectively...
August 2015: KDD: Proceedings
Xiang Ren, Ahmed El-Kishky, Chi Wang, Jiawei Han
In today's computerized and information-based society, we are soaked with vast amounts of text data, ranging from news articles, scientific publications, product reviews, to a wide range of textual information from social media. To unlock the value of these unstructured text data from various domains, it is of great importance to gain an understanding of entities and their relationships. In this tutorial, we introduce data-driven methods to recognize typed entities of interest in massive, domain-specific text corpora...
August 2015: KDD: Proceedings
Shi Zhi, Bo Zhao, Wenzhu Tong, Jing Gao, Dian Yu, Heng Ji, Jiawei Han
When integrating information from multiple sources, it is common to encounter conflicting answers to the same question. Truth discovery is to infer the most accurate and complete integrated answers from conflicting sources. In some cases, there exist questions for which the true answers are excluded from the candidate answers provided by all sources. Without any prior knowledge, these questions, named no-truth questions, are difficult to be distinguished from the questions that have true answers, named has-truth questions...
August 2015: KDD: Proceedings
Chao Zhang, Yu Zheng, Xiuli Ma, Jiawei Han
Recent years have witnessed the wide proliferation of geo-sensory applications wherein a bundle of sensors are deployed at different locations to cooperatively monitor the target condition. Given massive geo-sensory data, we study the problem of mining spatial co-evolving patterns (SCPs), i.e., groups of sensors that are spatially correlated and co-evolve frequently in their readings. SCP mining is of great importance to various real-world applications, yet it is challenging because (1) the truly interesting evolutions are often flooded by numerous trivial fluctuations in the geo-sensory time series; and (2) the pattern search space is extremely large due to the spatiotemporal combinatorial nature of SCP...
August 2015: KDD: Proceedings
Chi Wang, Xueqing Liu, Yanglei Song, Jiawei Han
Automatic construction of user-desired topical hierarchies over large volumes of text data is a highly desirable but challenging task. This study proposes to give users freedom to construct topical hierarchies via interactive operations such as expanding a branch and merging several branches. Existing hierarchical topic modeling techniques are inadequate for this purpose because (1) they cannot consistently preserve the topics when the hierarchy structure is modified; and (2) the slow inference prevents swift response to user requests...
August 2015: KDD: Proceedings
Chenguang Wang, Yangqiu Song, Ahmed El-Kishky, Dan Roth, Ming Zhang, Jiawei Han
One of the key obstacles in making learning protocols realistic in applications is the need to supervise them, a costly process that often requires hiring domain experts. We consider the framework to use the world knowledge as indirect supervision. World knowledge is general-purpose knowledge, which is not designed for any specific domain. Then the key challenges are how to adapt the world knowledge to domains and how to represent it for learning. In this paper, we provide an example of using world knowledge for domain dependent document clustering...
August 2015: KDD: Proceedings
Xiang Ren, Ahmed El-Kishky, Chi Wang, Fangbo Tao, Clare R Voss, Heng Ji, Jiawei Han
Entity recognition is an important but challenging research problem. In reality, many text collections are from specific, dynamic, or emerging domains, which poses significant new challenges for entity recognition with increase in name ambiguity and context sparsity, requiring entity detection without domain restriction. In this paper, we investigate entity recognition (ER) with distant-supervision and propose a novel relation phrase-based ER framework, called ClusType, that runs data-driven phrase mining to generate entity mention candidates and relation phrases, and enforces the principle that relation phrases should be softly clustered when propagating type information between their argument entities...
August 2015: KDD: Proceedings
Yaliang Li, Qi Li, Jing Gao, Lu Su, Bo Zhao, Wei Fan, Jiawei Han
In the era of big data, information regarding the same objects can be collected from increasingly more sources. Unfortunately, there usually exist conflicts among the information coming from different sources. To tackle this challenge, truth discovery, i.e., to integrate multi-source noisy information by estimating the reliability of each source, has emerged as a hot topic. In many real world applications, however, the information may come sequentially, and as a consequence, the truth of objects as well as the reliability of sources may be dynamically evolving...
August 2015: KDD: Proceedings
Marzyeh Ghassemi, Tristan Naumann, Finale Doshi-Velez, Nicole Brimmer, Rohit Joshi, Anna Rumshisky, Peter Szolovits
Accurate knowledge of a patient's disease state and trajectory is critical in a clinical setting. Modern electronic healthcare records contain an increasingly large amount of data, and the ability to automatically identify the factors that influence patient outcomes stand to greatly improve the efficiency and quality of care. We examined the use of latent variable models (viz. Latent Dirichlet Allocation) to decompose free-text hospital notes into meaningful features, and the predictive power of these features for patient mortality...
August 24, 2014: KDD: Proceedings
Aaron Johnson, Vitaly Shmatikov
Genome-wide association studies (GWAS) have become a popular method for analyzing sets of DNA sequences in order to discover the genetic basis of disease. Unfortunately, statistics published as the result of GWAS can be used to identify individuals participating in the study. To prevent privacy breaches, even previously published results have been removed from public databases, impeding researchers' access to the data and hindering collaborative research. Existing techniques for privacy-preserving GWAS focus on answering specific questions, such as correlations between a given pair of SNPs (DNA sequence variations)...
August 2013: KDD: Proceedings
Pinghua Gong, Jieping Ye, Changshui Zhang
Multi-task learning (MTL) aims to improve the performance of multiple related tasks by exploiting the intrinsic relationships among them. Recently, multi-task feature learning algorithms have received increasing attention and they have been successfully applied to many applications involving high-dimensional data. However, they assume that all tasks share a common set of features, which is too restrictive and may not hold in real-world applications, since outlier tasks often exist. In this paper, we propose a Robust MultiTask Feature Learning algorithm (rMTFL) which simultaneously captures a common set of features among relevant tasks and identifies outlier tasks...
August 12, 2012: KDD: Proceedings
Iyad Batal, Dmitriy Fradkin, James Harrison, Fabian Moerchen, Milos Hauskrecht
Improving the performance of classifiers using pattern mining techniques has been an active topic of data mining research. In this work we introduce the recent temporal pattern mining framework for finding predictive patterns for monitoring and event detection problems in complex multivariate time series data. This framework first converts time series into time-interval sequences of temporal abstractions. It then constructs more complex temporal patterns backwards in time using temporal operators. We apply our framework to health care data of 13,558 diabetic patients and show its benefits by efficiently finding useful patterns for detecting and diagnosing adverse medical conditions that are associated with diabetes...
2012: KDD: Proceedings
Jiayu Zhou, Jun Liu, Vaibhav A Narayan, Jieping Ye
Alzheimer's Disease (AD) is the most common neurodegenerative disorder associated with aging. Understanding how the disease progresses and identifying related pathological biomarkers for the progression is of primary importance in the clinical diagnosis and prognosis of Alzheimer's disease. In this paper, we develop novel multi-task learning techniques to predict the disease progression measured by cognitive scores and select biomarkers predictive of the progression. In multi-task learning, the prediction of cognitive scores at each time point is considered as a task, and multiple prediction tasks at different time points are performed simultaneously to capture the temporal smoothness of the prediction models across different time points...
2012: KDD: Proceedings
Rita Chattopadhyay, Zheng Wang, Wei Fan, Ian Davidson, Sethuraman Panchanathan, Jieping Ye
Active Learning is a machine learning and data mining technique that selects the most informative samples for labeling and uses them as training data; it is especially useful when there are large amount of unlabeled data and labeling them is expensive. Recently, batch-mode active learning, where a set of samples are selected concurrently for labeling, based on their collective merit, has attracted a lot of attention. The objective of batch-mode active learning is to select a set of informative samples so that a classifier learned on these samples has good generalization performance on the unlabeled data...
2012: KDD: Proceedings
Fetch more papers »
Fetching more papers... Fetching...
Read by QxMD. Sign in or create an account to discover new knowledge that matter to you.
Remove bar
Read by QxMD icon Read

Search Tips

Use Boolean operators: AND/OR

diabetic AND foot
diabetes OR diabetic

Exclude a word using the 'minus' sign

Virchow -triad

Use Parentheses

water AND (cup OR glass)

Add an asterisk (*) at end of a word to include word stems

Neuro* will search for Neurology, Neuroscientist, Neurological, and so on

Use quotes to search for an exact phrase

"primary prevention of cancer"
(heart or cardiac or cardio*) AND arrest -"American Heart Association"