Most recent papers in the journal KDD: Proceedings

#1

JOURNAL ARTICLE

Graph-Aware Language Model Pre-Training on a Large Graph Corpus Can Help Multiple Graph Applications.

Han Xie, Vassilis N Ioannidis, Carl Yang, Da Zheng, Xiang Song, Yi Xu, Jun Ma, Qing Ping, Belinda Zeng, Houyu Zhang, Sheng Wang, Trishul Chilimbi

Model pre-training on large text corpora has been demonstrated to be effective for various downstream applications in the NLP domain. In the graph mining domain, an analogous idea is to pre-train graph models on large graphs in the hope of benefiting downstream graph applications, which several recent studies have also explored. However, no existing study has investigated the pre-training of text plus graph models on large heterogeneous graphs with abundant textual information (a.k.a. large graph corpora) and then fine-tuning the model on different related downstream applications with different graph schemas...

38375450

August 2023: KDD: Proceedings

#2

JOURNAL ARTICLE

R-Mixup: Riemannian Mixup for Biological Networks.

Xuan Kan, Zimu Li, Hejie Cui, Yue Yu, Ran Xu, Shaojun Yu, Zilong Zhang, Ying Guo, Carl Yang

Biological networks are commonly used in biomedical and healthcare domains to effectively model the structure of complex biological systems with interactions linking biological entities. However, due to their characteristics of high dimensionality and low sample size, directly applying deep learning models on biological networks usually faces severe overfitting. In this work, we propose R-Mixup, a Mixup-based data augmentation technique that suits the symmetric positive definite (SPD) property of adjacency matrices from biological networks with optimized training efficiency...

38343707

August 2023: KDD: Proceedings
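For #2 (R-Mixup), the key point is that mixing must respect the symmetric positive definite (SPD) geometry of the adjacency matrices. As a rough illustration only, the sketch below interpolates two SPD matrices along the log-Euclidean geodesic, exp((1 - λ) log A + λ log B), and mixes the one-hot labels linearly. The choice of the log-Euclidean metric and the helper names (`log_euclidean_mixup`, `_spd_log`, `_sym_exp`) are assumptions for this example; the truncated abstract does not state R-Mixup's exact formulation.

```python
import numpy as np


def _spd_log(a: np.ndarray) -> np.ndarray:
    """Matrix logarithm of an SPD matrix via eigendecomposition."""
    w, v = np.linalg.eigh(a)
    return (v * np.log(w)) @ v.T


def _sym_exp(s: np.ndarray) -> np.ndarray:
    """Matrix exponential of a symmetric matrix via eigendecomposition."""
    w, v = np.linalg.eigh(s)
    return (v * np.exp(w)) @ v.T


def log_euclidean_mixup(a, b, y_a, y_b, lam):
    """Hypothetical helper: mix two SPD matrices on the log-Euclidean geodesic.

    Unlike plain Mixup, (1 - lam) * a + lam * b, the interpolation is done in the
    matrix-log domain, so det(mixed) = det(a)**(1 - lam) * det(b)**lam and the
    determinant does not "swell" the way it does under linear mixing.
    """
    mixed = _sym_exp((1.0 - lam) * _spd_log(a) + lam * _spd_log(b))
    label = (1.0 - lam) * np.asarray(y_a, dtype=float) + lam * np.asarray(y_b, dtype=float)
    return mixed, label


a = np.diag([1.0, 4.0])
b = np.diag([4.0, 1.0])
mixed, label = log_euclidean_mixup(a, b, [1, 0], [0, 1], lam=0.5)
print(mixed)              # approximately 2 * identity, determinant 4
print(0.5 * a + 0.5 * b)  # linear Mixup: 2.5 * identity, determinant 6.25
print(label)              # [0.5 0.5]
```

The diagonal example makes the design choice visible: both inputs have determinant 4, the geodesic midpoint keeps it at 4, while linear mixing inflates it to 6.25.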

#3

JOURNAL ARTICLE

When to Pre-Train Graph Neural Networks? From Data Generation Perspective!

Yuxuan Cao, Jiarong Xu, Carl Yang, Jiaan Wang, Yunchao Zhang, Chunping Wang, Lei Chen, Yang Yang

In recent years, graph pre-training has gained significant attention, focusing on acquiring transferable knowledge from unlabeled graph data to improve downstream performance. Despite these recent endeavors, negative transfer remains a major concern when applying pre-trained graph models to downstream tasks. Previous studies have devoted great effort to the questions of what to pre-train and how to pre-train, designing a variety of graph pre-training and fine-tuning strategies. However, there are cases where even the most advanced "pre-train and fine-tune" paradigms fail to yield distinct benefits...

38333106

August 2023: KDD: Proceedings

#4

JOURNAL ARTICLE

MolSearch: Search-based Multi-objective Molecular Generation and Property Optimization.

Mengying Sun, Huijun Wang, Jing Xing, Bin Chen, Han Meng, Jiayu Zhou

Leveraging computational methods to generate small molecules with desired properties has been an active research area in the drug discovery field. Towards real-world applications, however, efficient generation of molecules that satisfy multiple property requirements simultaneously remains a key challenge. In this paper, we tackle this challenge using a search-based approach and propose a simple yet effective framework called MolSearch for multi-objective molecular generation (optimization). We show that given proper design and sufficient information, search-based methods can achieve performance comparable to or even better than deep learning methods while being computationally efficient...

37056719

August 2022: KDD: Proceedings

#5

JOURNAL ARTICLE

Predicting Age-Related Macular Degeneration Progression with Contrastive Attention and Time-Aware LSTM.

Changchang Yin, Sayoko E Moroi, Ping Zhang

Age-related macular degeneration (AMD) is the leading cause of irreversible blindness in developed countries. Identifying patients at high risk of progression to late AMD, the sight-threatening stage, is critical for clinical actions, including medical interventions and timely monitoring. Recently, deep-learning-based models have been developed that achieve superior performance for late AMD prediction. However, most existing methods are limited to the color fundus photography (CFP) from the last ophthalmic visit and do not include the longitudinal CFP history and AMD progression from visits in previous years...

36158613

August 2022: KDD: Proceedings

#6

JOURNAL ARTICLE

Deconfounding Actor-Critic Network with Policy Adaptation for Dynamic Treatment Regimes.

Changchang Yin, Ruoqi Liu, Jeffrey Caterino, Ping Zhang

Despite intense efforts in basic and clinical research, designing an individualized ventilation strategy for critically ill patients remains a major challenge. Recently, learning dynamic treatment regimes (DTR) with reinforcement learning (RL) on electronic health records (EHR) has attracted interest from both the healthcare industry and the machine learning research community. However, most learned DTR policies may be biased due to the existence of confounders. Although some treatment actions non-survivors received may be helpful, if confounders cause the mortality, the training of RL models guided by long-term outcomes (e...

36101663

August 2022: KDD: Proceedings

#7

JOURNAL ARTICLE

Federated Adversarial Debiasing for Fair and Transferable Representations.

Junyuan Hong, Zhuangdi Zhu, Shuyang Yu, Zhangyang Wang, Hiroko Dodge, Jiayu Zhou

Federated learning is a distributed learning framework that is communication efficient and protects participating users' raw training data. One outstanding challenge of federated learning comes from the users' heterogeneity, and learning from such data may yield biased and unfair models for minority groups. While adversarial learning is commonly used in centralized learning to mitigate bias, there are significant barriers to extending it to the federated framework. In this work, we study these barriers and address them by proposing a novel approach, Federated Adversarial DEbiasing (FADE)...

35571559

August 2021: KDD: Proceedings

#8

JOURNAL ARTICLE

MoCL: Data-driven Molecular Fingerprint via Knowledge-aware Contrastive Learning from Molecular Graph.

Mengying Sun, Jing Xing, Huijun Wang, Bin Chen, Jiayu Zhou

Recent years have seen rapid growth in the use of graph neural networks (GNNs) in the biomedical domain to tackle drug-related problems. However, like other deep architectures, GNNs are data hungry. Because obtaining labels in the real world is often expensive, pretraining GNNs in an unsupervised manner has been actively explored. Among these approaches, graph contrastive learning, which maximizes the mutual information between paired graph augmentations, has been shown to be effective on various downstream tasks. However, the current graph contrastive learning framework has two limitations...

35571558

August 2021: KDD: Proceedings
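For #8 (MoCL), the contrastive objective described in the abstract (maximizing agreement between two augmented views of the same molecular graph) is commonly implemented as an InfoNCE-style loss over graph-level embeddings. The NumPy sketch below shows only that generic cross-view loss; it assumes the two views have already been encoded by some GNN, and it does not implement MoCL's knowledge-aware augmentations. The function name `info_nce`, the temperature `tau = 0.5`, and the random stand-in embeddings are illustrative assumptions.

```python
import numpy as np


def info_nce(z1: np.ndarray, z2: np.ndarray, tau: float = 0.5) -> float:
    """Cross-view InfoNCE loss; row i of z1 and row i of z2 embed the same graph.

    For each graph in view 1, its view-2 embedding is the positive and the view-2
    embeddings of all other graphs in the batch act as negatives.
    """
    # Cosine similarity: L2-normalize every embedding first.
    z1 = z1 / np.linalg.norm(z1, axis=1, keepdims=True)
    z2 = z2 / np.linalg.norm(z2, axis=1, keepdims=True)
    logits = (z1 @ z2.T) / tau                   # (n, n) scaled similarity matrix
    logits -= logits.max(axis=1, keepdims=True)  # stabilize the softmax
    log_softmax = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    # Positives sit on the diagonal; minimize their negative log-probability.
    return float(-np.mean(np.diag(log_softmax)))


rng = np.random.default_rng(0)
h = rng.normal(size=(8, 16))                 # stand-in for GNN graph embeddings
view1 = h + 0.1 * rng.normal(size=h.shape)   # two mildly perturbed "augmentations"
view2 = h + 0.1 * rng.normal(size=h.shape)
print(info_nce(view1, view2))                # low: views of the same graph agree
print(info_nce(view1, view2[::-1]))          # higher: positives are mismatched
```

Minimizing this loss pulls the two views of each graph together relative to the other graphs in the batch, which is the mutual-information lower-bound view of contrastive pre-training.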

#9

JOURNAL ARTICLE

LogPar: Logistic PARAFAC2 Factorization for Temporal Binary Data with Missing Values.

Kejing Yin, Ardavan Afshar, Joyce C Ho, William K Cheung, Chao Zhang, Jimeng Sun

Binary data with one-class missing values are ubiquitous in real-world applications. They can be represented by irregular tensors with varying sizes in one dimension, where value one means presence of a feature while zero means unknown (i.e., either presence or absence of a feature). Learning accurate low-rank approximations from such binary irregular tensors is a challenging task. However, none of the existing models developed for factorizing irregular tensors take the missing values into account, and they assume Gaussian distributions, resulting in a distribution mismatch when applied to binary data...

34109054

August 2020: KDD: Proceedings

#10

JOURNAL ARTICLE

MetaPred: Meta-Learning for Clinical Risk Prediction with Limited Patient Electronic Health Records.

Xi Sheryl Zhang, Fengyi Tang, Hiroko H Dodge, Jiayu Zhou, Fei Wang

In recent years, large amounts of health data, such as patient Electronic Health Records (EHR), have become readily available. This provides an unprecedented opportunity for knowledge discovery and data mining algorithms to extract insights that can later help improve the quality of care delivery. Predictive modeling of clinical risks from patient EHR, including in-hospital mortality, hospital readmission, chronic disease onset, and condition exacerbation, is one of the health data analytics problems that attract considerable interest...

33859865

August 2019: KDD: Proceedings

#11

JOURNAL ARTICLE

Naranjo Question Answering using End-to-End Multi-task Learning Model.

Bhanu Pratap Singh Rawat, Fei Li, Hong Yu

In the clinical domain, it is important to understand whether an adverse drug reaction (ADR) is caused by a particular medication. Clinical judgement studies help judge the causal relation between a medication and its ADRs. In this study, we present the first attempt to automatically infer the causality between a drug and an ADR from electronic health records (EHRs) by answering the Naranjo questionnaire, the validated clinical question answering set used by domain experts for ADR causality assessment. Using physicians' annotation as the gold standard, our proposed joint model, which uses multi-task learning to predict the answers of a subset of the Naranjo questionnaire, significantly outperforms the baseline pipeline model by a good margin, achieving a macro-weighted f-score between 0...

31799022

August 2019: KDD: Proceedings

#12

JOURNAL ARTICLE

Predicting Dynamic Embedding Trajectory in Temporal Interaction Networks.

Srijan Kumar, Xikun Zhang, Jure Leskovec

Modeling sequential interactions between users and items/products is crucial in domains such as e-commerce, social networking, and education. Representation learning presents an attractive opportunity to model the dynamic evolution of users and items, where each user/item can be embedded in a Euclidean space and its evolution can be modeled by an embedding trajectory in this space. However, existing dynamic embedding methods generate embeddings only when users take actions and do not explicitly model the future trajectory of the user/item in the embedding space...

31538030

August 2019: KDD: Proceedings

#13

JOURNAL ARTICLE

Retaining Privileged Information for Multi-Task Learning.

Fengyi Tang, Cao Xiao, Fei Wang, Jiayu Zhou, Li-Wei H Lehman

Knowledge transfer has been of great interest in current machine learning research, as many have speculated about its importance in modeling the human ability to rapidly generalize learned models to new scenarios. Particularly in cases where training samples are limited, knowledge transfer improves both the learning speed and the generalization performance of related tasks. Recently, Learning Using Privileged Information (LUPI) has presented a new direction in knowledge transfer by modeling the transfer of prior knowledge as a Teacher-Student interaction process...

34796042

July 2019: KDD: Proceedings

#14

JOURNAL ARTICLE

A Free Energy Based Approach for Distance Metric Learning.

Sho Inaba, Carl T Fakhry, Rahul V Kulkarni, Kourosh Zarringhalam

We present a reformulation of the distance metric learning problem as a penalized optimization problem, with a penalty term corresponding to the von Neumann entropy of the distance metric. This formulation leads to a mapping to statistical mechanics such that the metric learning optimization problem becomes equivalent to free energy minimization. Correspondingly, our approach leads to an analytical solution of the optimization problem based on the Boltzmann distribution. The mapping established in this work suggests new approaches for dimensionality reduction and provides insights into determination of optimal parameters for the penalty term...

33708457

July 2019: KDD: Proceedings
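For #14, the mapping to statistical mechanics can be made concrete under simplifying assumptions. If the metric M is treated like a density matrix (positive semidefinite, unit trace) and the data term is linear, tr(MC) for some symmetric cost matrix C, then minimizing the free energy tr(MC) - T·S(M), with von Neumann entropy S(M) = -tr(M log M), has the closed-form Gibbs/Boltzmann minimizer M* = exp(-C/T) / tr exp(-C/T), whose free energy is -T log tr exp(-C/T). The linear loss, the unit-trace constraint, the random cost matrix, and the names `gibbs_metric`, `free_energy`, and `c` are assumptions for this sketch; the abstract does not give the paper's exact objective.

```python
import numpy as np


def von_neumann_entropy(m: np.ndarray) -> float:
    """S(M) = -tr(M log M), computed from the eigenvalues of a PSD matrix."""
    w = np.linalg.eigvalsh(m)
    w = w[w > 1e-12]  # convention: 0 * log 0 = 0
    return float(-np.sum(w * np.log(w)))


def free_energy(m: np.ndarray, c: np.ndarray, temperature: float) -> float:
    """Assumed objective: linear loss tr(MC) minus temperature times entropy."""
    return float(np.trace(m @ c)) - temperature * von_neumann_entropy(m)


def gibbs_metric(c: np.ndarray, temperature: float) -> np.ndarray:
    """Minimizer exp(-C/T) / tr exp(-C/T) over unit-trace PSD matrices."""
    w, v = np.linalg.eigh(c)
    logits = -w / temperature
    p = np.exp(logits - logits.max())  # shift for numerical stability
    p /= p.sum()                       # Boltzmann weights on C's eigenvectors
    return (v * p) @ v.T


rng = np.random.default_rng(1)
a = rng.normal(size=(5, 5))
c = (a + a.T) / 2                      # hypothetical symmetric cost matrix
m_star = gibbs_metric(c, temperature=0.5)
print(np.trace(m_star))                # 1.0 up to rounding
print(free_energy(m_star, c, 0.5) <= free_energy(np.eye(5) / 5, c, 0.5))  # True
```

The temperature T plays the role of the penalty weight: a large T pushes M toward the maximally mixed I/n, while a small T concentrates M on the lowest-cost eigenvectors of C.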

#15

JOURNAL ARTICLE

Interpretable Representation Learning for Healthcare via Capturing Disease Progression through Time.

Tian Bai, Brian L Egleston, Shanshan Zhang, Slobodan Vucetic

Various deep learning models have recently been applied to predictive modeling of Electronic Health Records (EHR). In medical claims data, which is a particular type of EHR data, each patient is represented as a sequence of temporally ordered irregularly sampled visits to health providers, where each visit is recorded as an unordered set of medical codes specifying the patient's diagnoses and the treatments provided during the visit. Based on the observation that different patient conditions have different temporal progression patterns, in this paper we propose a novel interpretable deep learning model, called Timeline...

31037221

August 2018: KDD: Proceedings

#16

JOURNAL ARTICLE

Voxel Deconvolutional Networks for 3D Brain Image Labeling.

Yongjun Chen, Min Shi, Hongyang Gao, Dinggang Shen, Lei Cai, Shuiwang Ji

Deep learning methods have shown great success in pixel-wise prediction tasks. One of the most popular methods employs an encoder-decoder network in which deconvolutional layers are used for up-sampling feature maps. However, a key limitation of the deconvolutional layer is that it suffers from the checkerboard artifact problem, which harms the prediction accuracy. This is caused by the independence among adjacent pixels on the output feature maps. Previous work solved the checkerboard artifact problem of deconvolutional layers only in 2D space...

30906620

August 2018: KDD: Proceedings
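For #16, the independence problem named in the abstract is easy to see in one dimension. In a stride-2 transposed convolution with a size-2 kernel, even-indexed outputs are produced only by `kernel[0]` and odd-indexed outputs only by `kernel[1]`, so adjacent outputs never share weights and a perfectly flat input comes out as an alternating pattern. The toy `transposed_conv1d` helper below is a hypothetical illustration of the artifact only, not the paper's voxel deconvolutional layer.

```python
import numpy as np


def transposed_conv1d(x: np.ndarray, kernel: np.ndarray, stride: int = 2) -> np.ndarray:
    """Naive 1D transposed convolution: each input scatters a scaled copy of the kernel."""
    out = np.zeros((len(x) - 1) * stride + len(kernel))
    for i, xi in enumerate(x):
        out[i * stride : i * stride + len(kernel)] += xi * kernel
    return out


x = np.ones(4)                                     # flat input signal
print(transposed_conv1d(x, np.array([0.5, 1.5])))  # [0.5 1.5 0.5 1.5 0.5 1.5 0.5 1.5]
```

Unless training happens to make the two kernel weights equal, the interleaved outputs drift apart; in 2D and 3D the same effect produces the checkerboard pattern across pixels or voxels.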

#17

JOURNAL ARTICLE

Generalized Score Functions for Causal Discovery.

Biwei Huang, Kun Zhang, Yizhu Lin, Bernhard Schölkopf, Clark Glymour

Discovery of causal relationships from observational data is a fundamental problem. Roughly speaking, there are two types of methods for causal discovery: constraint-based ones and score-based ones. Score-based methods avoid the multiple testing problem and enjoy certain advantages compared to constraint-based ones. However, most of them need strong assumptions on the functional forms of causal mechanisms, as well as on data distributions, which limit their applicability. In practice, precise information about the underlying model class is usually unknown...

30191079

August 2018: KDD: Proceedings

#18

JOURNAL ARTICLE

SUSTain: Scalable Unsupervised Scoring for Tensors and its Application to Phenotyping.

Ioakeim Perros, Evangelos E Papalexakis, Haesun Park, Richard Vuduc, Xiaowei Yan, Christopher Defilippi, Walter F Stewart, Jimeng Sun

This paper presents a new method, which we call SUSTain, that extends real-valued matrix and tensor factorizations to data where values are integers. Such data are common when the values correspond to event counts or ordinal measures. The conventional approach is to treat integer data as real, and then apply real-valued factorizations. However, doing so fails to preserve important characteristics of the original data, thereby making it hard to interpret the results. Instead, our approach extracts factor values from integer datasets as scores that are constrained to take values from a small integer set...

33680534

July 2018: KDD: Proceedings

#19

JOURNAL ARTICLE

GRAM: Graph-based Attention Model for Healthcare Representation Learning.

Edward Choi, Mohammad Taha Bahadori, Le Song, Walter F Stewart, Jimeng Sun

Deep learning methods exhibit promising performance for predictive modeling in healthcare, but two important challenges remain. (1) Data insufficiency: in healthcare predictive modeling, the sample size is often insufficient for deep learning methods to achieve satisfactory results. (2) Interpretation: the representations learned by deep learning methods should align with medical knowledge. To address these challenges, we propose GRaph-based Attention Model (GRAM) that supplements electronic health records (EHR) with hierarchical information inherent to medical ontologies...

33717639

August 2017: KDD: Proceedings

#20

JOURNAL ARTICLE

A Data-driven Process Recommender Framework.

Sen Yang, Xin Dong, Leilei Sun, Yichen Zhou, Richard A Farneth, Hui Xiong, Randall S Burd, Ivan Marsic

We present an approach for improving the performance of complex knowledge-based processes by providing data-driven step-by-step recommendations. Our framework uses the associations between similar historic process performances and contextual information to determine the prototypical way of enacting the process. We introduce a novel similarity metric for grouping traces into clusters that incorporates temporal information about activity performance and handles concurrent activities. Our data-driven recommender system selects the appropriate prototype performance of the process based on user-provided context attributes...

30430038

August 2017: KDD: Proceedings
