
KDD: Proceedings

https://www.readbyqxmd.com/read/29780658/the-selective-labels-problem-evaluating-algorithmic-predictions-in-the-presence-of-unobservables
#1
Himabindu Lakkaraju, Jon Kleinberg, Jure Leskovec, Jens Ludwig, Sendhil Mullainathan
Evaluating whether machines improve on human performance is one of the central questions of machine learning. However, there are many domains where the data is selectively labeled in the sense that the observed outcomes are themselves a consequence of the existing choices of the human decision-makers. For instance, in the context of judicial bail decisions, we observe the outcome of whether a defendant fails to return for their court appearance only if the human judge decides to release the defendant on bail...
August 2017: KDD: Proceedings
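To make the bail example concrete, here is a minimal toy simulation, assuming hypothetical data in which judges detain defendants partly on information the data does not record. The outcome is observed only for released defendants, so an estimate computed from labeled cases alone can differ sharply from the true rate. The names `Case`, `observed_label`, and the five cases are illustrative inventions, not the paper's method or data.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Case:
    released: bool    # the judge's decision
    would_fail: bool  # true outcome; only knowable inside a simulation like this one


def observed_label(case: Case) -> Optional[bool]:
    # The failure-to-appear outcome is recorded only if the defendant was released.
    return case.would_fail if case.released else None


def observed_failure_rate(cases: List[Case]) -> float:
    # Rate computed from the selectively labeled subset (what real data allows).
    seen = [y for y in (observed_label(c) for c in cases) if y is not None]
    return sum(seen) / len(seen)


def true_failure_rate(cases: List[Case]) -> float:
    # Rate over everyone, which requires the counterfactual outcomes of detained defendants.
    return sum(c.would_fail for c in cases) / len(cases)


# Hypothetical toy data: the judge detained the two defendants who would have failed.
cases = [
    Case(released=True, would_fail=False),
    Case(released=True, would_fail=False),
    Case(released=True, would_fail=True),
    Case(released=False, would_fail=True),
    Case(released=False, would_fail=True),
]
```

Here the labeled subset suggests a failure rate of one in three, while the true rate across all five defendants is three in five, because the judge's choices determined which outcomes were ever recorded.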
https://www.readbyqxmd.com/read/29770258/local-higher-order-graph-clustering
#2
Hao Yin, Austin R Benson, Jure Leskovec, David F Gleich
Local graph clustering methods aim to find a cluster of nodes by exploring a small region of the graph. These methods are attractive because they enable targeted clustering around a given seed node and are faster than traditional global graph clustering methods because their runtime does not depend on the size of the input graph. However, current local graph partitioning methods are not designed to account for the higher-order structures crucial to the network, nor can they effectively handle directed networks...
August 2017: KDD: Proceedings
https://www.readbyqxmd.com/read/29770257/toeplitz-inverse-covariance-based-clustering-of-multivariate-time-series-data
#3
David Hallac, Sagar Vare, Stephen Boyd, Jure Leskovec
Subsequence clustering of multivariate time series is a useful tool for discovering repeated patterns in temporal data. Once these patterns have been discovered, seemingly complicated datasets can be interpreted as a temporal sequence of only a small number of states, or clusters. For example, raw sensor data from a fitness-tracking application can be expressed as a timeline of a select few actions (e.g., walking, sitting, running). However, discovering these patterns is challenging because it requires simultaneous segmentation and clustering of the time series...
August 2017: KDD: Proceedings
https://www.readbyqxmd.com/read/29770256/network-inference-via-the-time-varying-graphical-lasso
#4
David Hallac, Youngsuk Park, Stephen Boyd, Jure Leskovec
Many important problems can be modeled as a system of interconnected entities, where each entity is recording time-dependent observations or measurements. In order to spot trends, detect anomalies, and interpret the temporal dynamics of such data, it is essential to understand the relationships between the different entities and how these relationships evolve over time. In this paper, we introduce the time-varying graphical lasso (TVGL), a method of inferring time-varying networks from raw time series data...
August 2017: KDD: Proceedings
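As a reading aid, the TVGL problem can be summarized as a penalized maximum-likelihood estimate of one precision (inverse covariance) matrix Θ_i per time step. The form below is recalled from the paper rather than stated in the truncated excerpt, so treat the notation as an assumption: S_i is the empirical covariance of the n_i observations at time i, the off-diagonal ℓ1 norm encourages a sparse network at each step, and ψ is a convex penalty on consecutive differences, weighted by β, that encourages the network to change gradually.

```latex
\underset{\Theta_1,\dots,\Theta_T \in \mathbb{S}^{p}_{++}}{\text{minimize}}
\;\sum_{i=1}^{T}\Big(-\ell_i(\Theta_i) + \lambda\,\lVert \Theta_i \rVert_{\mathrm{od},1}\Big)
+ \beta \sum_{i=2}^{T} \psi\!\left(\Theta_i - \Theta_{i-1}\right),
\qquad
\ell_i(\Theta_i) = n_i\big(\log\det \Theta_i - \operatorname{Tr}(S_i \Theta_i)\big)
```

Setting β = 0 decouples the problem into T independent graphical lasso fits; increasing β trades per-step fit for temporal consistency.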
https://www.readbyqxmd.com/read/29755826/pharmacovigilance-via-baseline-regularization-with-large-scale-longitudinal-observational-data
#5
Zhaobin Kuang, Peggy Peissig, Vítor Santos Costa, Richard Maclin, David Page
Several prominent public health hazards [29] that occurred at the beginning of this century due to adverse drug events (ADEs) have raised awareness among governments and industries worldwide of pharmacovigilance (PhV) [6,7], the science and activities to monitor and prevent adverse events caused by pharmaceutical products after they are introduced to the market. A major data source for PhV is large-scale longitudinal observational databases (LODs) [6] such as electronic health records (EHRs) and medical insurance claim databases...
August 2017: KDD: Proceedings
https://www.readbyqxmd.com/read/29430330/moliere-automatic-biomedical-hypothesis-generation-system
#6
Justin Sybrandt, Michael Shtutman, Ilya Safro
Hypothesis generation is becoming a crucial time-saving technique which allows biomedical researchers to quickly discover implicit connections between important concepts. Typically, these systems operate on domain-specific fractions of public medical data. MOLIERE, in contrast, utilizes information from over 24.5 million documents. At the heart of our approach lies a multi-modal and multi-relational network of biomedical objects extracted from several heterogeneous datasets from the National Center for Biotechnology Information (NCBI)...
August 2017: KDD: Proceedings
https://www.readbyqxmd.com/read/29333328/learning-tree-structured-detection-cascades-for-heterogeneous-networks-of-embedded-devices
#7
Hamid Dadkhahi, Benjamin M Marlin
In this paper, we present a new approach to learning cascaded classifiers for use in computing environments that involve networks of heterogeneous and resource-constrained, low-power embedded compute and sensing nodes. We present a generalization of the classical linear detection cascade to the case of tree-structured cascades where different branches of the tree execute on different physical compute nodes in the network. Different nodes have access to different features, as well as access to potentially different computation and energy resources...
August 2017: KDD: Proceedings
https://www.readbyqxmd.com/read/29071165/federated-tensor-factorization-for-computational-phenotyping
#8
Yejin Kim, Jimeng Sun, Hwanjo Yu, Xiaoqian Jiang
Tensor factorization models offer an effective approach to convert massive electronic health records into meaningful clinical concepts (phenotypes) for data analysis. These models need a large amount of diverse samples to avoid population bias. An open challenge is how to derive phenotypes jointly across multiple hospitals, in which direct patient-level data sharing is not possible (e.g., due to institutional policies). In this paper, we develop a novel solution to enable federated tensor factorization for computational phenotyping without sharing patient-level data...
August 2017: KDD: Proceedings
https://www.readbyqxmd.com/read/28713636/ranking-causal-anomalies-via-temporal-and-dynamical-analysis-on-vanishing-correlations
#9
Wei Cheng, Kai Zhang, Haifeng Chen, Guofei Jiang, Zhengzhang Chen, Wei Wang
The modern world has witnessed a dramatic increase in our ability to collect, transmit, and distribute real-time monitoring and surveillance data from large-scale information systems and cyber-physical systems. Detecting system anomalies has therefore attracted significant interest in many fields, such as security, fault management, and industrial optimization. Recently, invariant networks have been shown to be a powerful way of characterizing complex system behaviours. In an invariant network, a node represents a system component and an edge indicates a stable, significant interaction between two components...
August 2016: KDD: Proceedings
https://www.readbyqxmd.com/read/28580192/fast-component-pursuit-for-large-scale-inverse-covariance-estimation
#10
Lei Han, Yu Zhang, Tong Zhang
The maximum likelihood estimation (MLE) for the Gaussian graphical model, which is also known as the inverse covariance estimation problem, has gained increasing interest recently. Most existing works assume that inverse covariance estimators contain sparse structure and then construct models with the ℓ1 regularization. In this paper, different from existing works, we study the inverse covariance estimation problem from another perspective by efficiently modeling the low-rank structure in the inverse covariance, which is assumed to be a combination of a low-rank part and a diagonal matrix...
August 2016: KDD: Proceedings
https://www.readbyqxmd.com/read/28392970/generalized-hierarchical-sparse-model-for-arbitrary-order-interactive-antigenic-sites-identification-in-flu-virus-data
#11
Lei Han, Yu Zhang, Xiu-Feng Wan, Tong Zhang
Recent statistical evidence has shown that a regression model that incorporates interactions among the original covariates/features can significantly improve interpretability for biological data. One major challenge is the exponentially expanded feature space when high-order feature interactions are added to the model. To tackle the huge dimensionality, hierarchical sparse models (HSM) have been developed that enforce sparsity under heredity structures in the interactions among the covariates. However, existing methods only consider pairwise interactions, making the discovery of important high-order interactions a non-trivial open problem...
August 2016: KDD: Proceedings
https://www.readbyqxmd.com/read/28316874/computational-drug-repositioning-using-continuous-self-controlled-case-series
#12
Zhaobin Kuang, James Thomson, Michael Caldwell, Peggy Peissig, Ron Stewart, David Page
Computational Drug Repositioning (CDR) is the task of discovering potential new indications for existing drugs by mining large-scale heterogeneous drug-related data sources. Leveraging the patient-level temporal ordering information between numeric physiological measurements and various drug prescriptions provided in Electronic Health Records (EHRs), we propose a Continuous Self-controlled Case Series (CSCCS) model for CDR. As an initial evaluation, we look for drugs that can control Fasting Blood Glucose (FBG) level in our experiments...
August 2016: KDD: Proceedings
https://www.readbyqxmd.com/read/28203486/dynamics-of-large-multi-view-social-networks-synergy-cannibalization-and-cross-view-interplay
#13
Yu Shi, Myunghwan Kim, Shaunak Chatterjee, Mitul Tiwari, Souvik Ghosh, Rómer Rosales
Most social networking services support multiple types of relationships between users, such as getting connected, sending messages, and consuming feed updates. These users and relationships can be naturally represented as a dynamic multi-view network, which is a set of weighted graphs that share common nodes but have their own respective edges. Different network views, representing different structural relationships and interaction types, can have very distinctive individual properties, and these properties may change due to interplay across views...
August 2016: KDD: Proceedings
https://www.readbyqxmd.com/read/28180028/squish-near-optimal-compression-for-archival-of-relational-datasets
#14
Yihan Gao, Aditya Parameswaran
Relational datasets are being generated at an alarmingly rapid rate across organizations and industries. Compressing these datasets could significantly reduce storage and archival costs. Traditional compression algorithms, e.g., gzip, are suboptimal for compressing relational datasets since they ignore the table structure and relationships between attributes. We study compression algorithms that leverage the relational structure to compress datasets to a much greater extent. We develop Squish, a system that uses a combination of Bayesian Networks and Arithmetic Coding to capture multiple kinds of dependencies among attributes and achieve near-entropy compression rate...
August 2016: KDD: Proceedings
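The gain from exploiting dependencies between attributes can be illustrated with empirical entropy, the bound that an arithmetic coder driven by a good model can approach. By the chain rule, H(A, B) = H(A) + H(B | A) ≤ H(A) + H(B), so coding a column conditioned on a related column never needs more bits than coding it alone. This sketch only computes those bounds on a hypothetical four-row table; it is not Squish and does not build a Bayesian network or an arithmetic coder.

```python
import math
from collections import Counter
from typing import Hashable, List, Tuple


def entropy(values: List[Hashable]) -> float:
    # Empirical Shannon entropy in bits per value.
    counts = Counter(values)
    n = len(values)
    return -sum(c / n * math.log2(c / n) for c in counts.values())


def conditional_entropy(pairs: List[Tuple[Hashable, Hashable]]) -> float:
    # H(B | A) = H(A, B) - H(A), estimated from the observed (a, b) pairs.
    return entropy(pairs) - entropy([a for a, _ in pairs])


# Hypothetical table: the state column is fully determined by the city column.
rows = [("Chicago", "IL"), ("Urbana", "IL"), ("Austin", "TX"), ("Boston", "MA")]
cities = [city for city, _ in rows]
states = [state for _, state in rows]

bits_independent = entropy(cities) + entropy(states)       # columns coded separately
bits_dependent = entropy(cities) + conditional_entropy(rows)  # state coded given city
```

On this table, coding the columns independently costs 3.5 bits per row, whereas coding the state given the city costs nothing extra, for 2 bits per row. A column-agnostic compressor cannot see this structure directly, which is the gap the abstract attributes to tools like gzip.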
https://www.readbyqxmd.com/read/28163978/gmove-group-level-mobility-modeling-using-geo-tagged-social-media
#15
Chao Zhang, Keyang Zhang, Quan Yuan, Luming Zhang, Tim Hanratty, Jiawei Han
Understanding human mobility is of great importance to various applications, such as urban planning, traffic scheduling, and location prediction. While there has been fruitful research on modeling human mobility using tracking data (e.g., GPS traces), the recent growth of geo-tagged social media (GeoSM) brings new opportunities to this task because of its sheer size and multi-dimensional nature. Nevertheless, how to obtain quality mobility models from the highly sparse and complex GeoSM data remains a challenge that cannot be readily addressed by existing techniques...
August 2016: KDD: Proceedings
https://www.readbyqxmd.com/read/27853627/interpretable-decision-sets-a-joint-framework-for-description-and-prediction
#16
Himabindu Lakkaraju, Stephen H Bach, Jure Leskovec
One of the most important obstacles to deploying predictive models is the fact that humans do not understand and trust them. Knowing which variables are important in a model's prediction and how they are combined can be very powerful in helping people understand and trust automatic decision making systems. Here we propose interpretable decision sets, a framework for building predictive models that are highly accurate, yet also highly interpretable. Decision sets are sets of independent if-then rules. Because each rule can be applied independently, decision sets are simple, concise, and easily interpretable...
August 2016: KDD: Proceedings
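The idea that "each rule can be applied independently" can be sketched as follows. Each rule is a condition paired with a label, every rule is checked on its own, and the rules that fire double as the explanation. How overlapping rules are resolved here (most common label among firing rules, ties going to the rule listed first) and the default label are assumptions of this sketch, not the paper's procedure; the features and thresholds are hypothetical and are not learned.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Any, Callable, Dict, List

Record = Dict[str, Any]


@dataclass(frozen=True)
class Rule:
    condition: Callable[[Record], bool]
    label: str
    description: str  # human-readable form of the if-then rule


def explain(rules: List[Rule], record: Record) -> List[str]:
    # Every rule is evaluated independently; no ordering or chaining as in a decision list.
    return [r.description for r in rules if r.condition(record)]


def predict(rules: List[Rule], record: Record, default: str) -> str:
    fired = [r.label for r in rules if r.condition(record)]
    if not fired:
        return default  # no rule covers this record
    # Assumed overlap resolution: most common label; Counter keeps first-seen order on ties.
    return Counter(fired).most_common(1)[0][0]


# Hypothetical hand-written rules for illustration only.
rules = [
    Rule(lambda r: r["age"] >= 65 and r["smoker"], "high_risk", "age >= 65 AND smoker"),
    Rule(lambda r: r["bmi"] < 18.5, "high_risk", "bmi < 18.5"),
    Rule(lambda r: r["age"] < 30 and not r["smoker"], "low_risk", "age < 30 AND NOT smoker"),
]
```

For a record such as `{"age": 70, "smoker": True, "bmi": 24.0}`, only the first rule fires, so both the prediction and its justification can be read off a single line.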
https://www.readbyqxmd.com/read/27853626/node2vec-scalable-feature-learning-for-networks
#17
Aditya Grover, Jure Leskovec
Prediction tasks over nodes and edges in networks require careful effort in engineering features used by learning algorithms. Recent research in the broader field of representation learning has led to significant progress in automating prediction by learning the features themselves. However, present feature learning approaches are not expressive enough to capture the diversity of connectivity patterns observed in networks. Here we propose node2vec, an algorithmic framework for learning continuous feature representations for nodes in networks...
August 2016: KDD: Proceedings
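The excerpt stops before describing the method, but node2vec is known for feeding second-order biased random walks to a skip-gram model. With a return parameter p and an in-out parameter q, the unnormalized weight for stepping from the current node to a neighbor x is 1/p if x is the previous node, 1 if x is also adjacent to the previous node, and 1/q otherwise. The sketch below implements only that walk on an unweighted, undirected graph and samples directly with `random.choices`; the paper precomputes alias tables for speed, and the skip-gram training step is omitted. `biased_walk` and the example graph are hypothetical.

```python
import random
from typing import Dict, List, Set

Graph = Dict[str, Set[str]]  # undirected adjacency sets


def biased_walk(graph: Graph, start: str, length: int, p: float, q: float,
                rng: random.Random) -> List[str]:
    walk = [start]
    while len(walk) < length:
        cur = walk[-1]
        neighbors = sorted(graph[cur])  # sorted so a seeded rng gives reproducible walks
        if not neighbors:
            break  # dead end: stop early
        if len(walk) == 1:
            walk.append(rng.choice(neighbors))  # first step has no previous node: uniform
            continue
        prev = walk[-2]
        weights = []
        for x in neighbors:
            if x == prev:
                weights.append(1.0 / p)  # return to the previous node
            elif x in graph[prev]:
                weights.append(1.0)      # stay at distance 1 from the previous node
            else:
                weights.append(1.0 / q)  # move outward, away from the previous node
        walk.append(rng.choices(neighbors, weights=weights, k=1)[0])
    return walk


graph: Graph = {"a": {"b", "c"}, "b": {"a", "c", "d"}, "c": {"a", "b"}, "d": {"b"}}
walk = biased_walk(graph, "a", 10, p=1.0, q=0.5, rng=random.Random(0))
```

A small q favors outward moves and yields depth-first-like exploration, while a large q keeps the walk local, closer to breadth-first behavior.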
https://www.readbyqxmd.com/read/27747132/batch-model-for-batched-timestamps-data-analysis-with-application-to-the-ssa-disability-program
#18
Qingqi Yue, Ao Yuan, Xuan Che, Minh Huynh, Chunxiao Zhou
The Office of Disability Adjudication and Review (ODAR) is responsible for holding hearings, issuing decisions, and reviewing appeals as part of the Social Security Administration's disability determination process. In order to control and process cases, the ODAR established a Case Processing and Management System (CPMS), which has recorded management information since December 2003. The CPMS provides a detailed case status history for each case. Due to the large number of appeal requests and limited resources, the number of pending claims at ODAR was over one million cases by March 31, 2015...
August 2016: KDD: Proceedings
https://www.readbyqxmd.com/read/27766182/causal-clustering-for-1-factor-measurement-models
#19
Erich Kummerfeld, Joseph Ramsey
Many scientific research programs aim to learn the causal structure of real world phenomena. This learning problem is made more difficult when the target of study cannot be directly observed. One strategy commonly used by social scientists is to create measurable "indicator" variables that covary with the latent variables of interest. Before leveraging the indicator variables to learn about the latent variables, however, one needs a measurement model of the causal relations between the indicators and their corresponding latents...
2016: KDD: Proceedings
https://www.readbyqxmd.com/read/27398260/network-lasso-clustering-and-optimization-in-large-graphs
#20
David Hallac, Jure Leskovec, Stephen Boyd
Convex optimization is an essential tool for modern data analysis, as it provides a framework to formulate and solve many problems in machine learning and data mining. However, general convex optimization solvers do not scale well, and scalable solvers are often specialized to only work on a narrow class of problems. Therefore, there is a need for simple, scalable algorithms that can solve many common optimization problems. In this paper, we introduce the network lasso, a generalization of the group lasso to a network setting that allows for simultaneous clustering and optimization on graphs...
August 2015: KDD: Proceedings
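The phrase "a generalization of the group lasso to a network setting" can be made concrete with the standard network lasso objective below, summarized here rather than quoted from the truncated abstract. Each of the m nodes i in the vertex set V has a variable x_i and a convex loss f_i, and each edge (j, k) in E has a nonnegative weight w_jk.

```latex
\underset{x_1,\dots,x_m}{\text{minimize}}
\;\sum_{i \in \mathcal{V}} f_i(x_i)
\;+\; \lambda \sum_{(j,k) \in \mathcal{E}} w_{jk}\,\lVert x_j - x_k \rVert_2
```

Because the edge penalty is an unsquared ℓ2 norm of the difference x_j − x_k, a group-lasso penalty on edge differences, increasing λ drives many neighboring variables to become exactly equal. Nodes that share a value form a cluster, which is how the method clusters and optimizes simultaneously.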