journal
https://read.qxmd.com/read/38375450/graph-aware-language-model-pre-training-on-a-large-graph-corpus-can-help-multiple-graph-applications
#1
JOURNAL ARTICLE
Han Xie, Vassilis N Ioannidis, Carl Yang, Da Zheng, Xiang Song, Yi Xu, Jun Ma, Qing Ping, Belinda Zeng, Houyu Zhang, Sheng Wang, Trishul Chilimbi
Model pre-training on large text corpora has been demonstrated effective for various downstream applications in the NLP domain. In the graph mining domain, a similar analogy can be drawn for pre-training graph models on large graphs in the hope of benefiting downstream graph applications, which has also been explored by several recent studies. However, no existing study has ever investigated the pre-training of text plus graph models on large heterogeneous graphs with abundant textual information (a.k.a. large graph corpora) and then fine-tuning the model on different related downstream applications with different graph schemas...
August 2023: KDD: Proceedings
https://read.qxmd.com/read/38343707/r-mixup-riemannian-mixup-for-biological-networks
#2
JOURNAL ARTICLE
Xuan Kan, Zimu Li, Hejie Cui, Yue Yu, Ran Xu, Shaojun Yu, Zilong Zhang, Ying Guo, Carl Yang
Biological networks are commonly used in biomedical and healthcare domains to effectively model the structure of complex biological systems with interactions linking biological entities. However, due to their characteristics of high dimensionality and low sample size, directly applying deep learning models on biological networks usually faces severe overfitting. In this work, we propose R-Mixup, a Mixup-based data augmentation technique that suits the symmetric positive definite (SPD) property of adjacency matrices from biological networks with optimized training efficiency...
August 2023: KDD: Proceedings
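The abstract does not state R-Mixup's exact interpolation rule. As a hedged illustration of why mixing SPD adjacency matrices on a Riemannian manifold differs from plain Euclidean Mixup, the sketch below mixes two SPD matrices along the log-Euclidean geodesic, exp((1 - λ) log A + λ log B). The choice of the log-Euclidean metric and the helper names (`spd_log`, `spd_exp`, `log_euclidean_mixup`) are assumptions for illustration, not the paper's confirmed method.

```python
import numpy as np

def spd_log(a: np.ndarray) -> np.ndarray:
    """Matrix logarithm of a symmetric positive definite matrix via eigendecomposition."""
    w, v = np.linalg.eigh(a)
    return (v * np.log(w)) @ v.T  # v @ diag(log w) @ v.T

def spd_exp(s: np.ndarray) -> np.ndarray:
    """Matrix exponential of a symmetric matrix via eigendecomposition."""
    w, v = np.linalg.eigh(s)
    return (v * np.exp(w)) @ v.T  # result is always SPD

def log_euclidean_mixup(a: np.ndarray, b: np.ndarray, lam: float) -> np.ndarray:
    """Hypothetical helper: interpolate two SPD matrices in the matrix-log domain."""
    mixed = (1.0 - lam) * spd_log(a) + lam * spd_log(b)
    return spd_exp(mixed)

# Usage: two covariance-style "networks" over 5 nodes (ridge keeps them SPD).
rng = np.random.default_rng(0)
a = np.cov(rng.normal(size=(5, 20))) + 1e-3 * np.eye(5)
b = np.cov(rng.normal(size=(5, 20))) + 1e-3 * np.eye(5)
m = log_euclidean_mixup(a, b, 0.3)
```

Unlike the Euclidean average (1 - λ)A + λB, which is also SPD but inflates the determinant ("swelling effect"), the log-Euclidean mix interpolates log-determinants linearly. Labels would be mixed with the same λ, as in standard Mixup.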
https://read.qxmd.com/read/38333106/when-to-pre-train-graph-neural-networks-from-data-generation-perspective
#3
JOURNAL ARTICLE
Yuxuan Cao, Jiarong Xu, Carl Yang, Jiaan Wang, Yunchao Zhang, Chunping Wang, Lei Chen, Yang Yang
In recent years, graph pre-training has gained significant attention, focusing on acquiring transferable knowledge from unlabeled graph data to improve downstream performance. Despite these recent endeavors, negative transfer remains a major concern when applying pre-trained graph models to downstream tasks. Previous studies have devoted great effort to the questions of what to pre-train and how to pre-train by designing a variety of graph pre-training and fine-tuning strategies. However, there are cases where even the most advanced "pre-train and fine-tune" paradigms fail to yield distinct benefits...
August 2023: KDD: Proceedings
https://read.qxmd.com/read/37056719/molsearch-search-based-multi-objective-molecular-generation-and-property-optimization
#4
JOURNAL ARTICLE
Mengying Sun, Huijun Wang, Jing Xing, Bin Chen, Han Meng, Jiayu Zhou
Leveraging computational methods to generate small molecules with desired properties has been an active research area in the drug discovery field. Towards real-world applications, however, efficient generation of molecules that satisfy multiple property requirements simultaneously remains a key challenge. In this paper, we tackle this challenge using a search-based approach and propose a simple yet effective framework called MolSearch for multi-objective molecular generation (optimization). We show that given proper design and sufficient information, search-based methods can achieve performance comparable to or even better than deep learning methods while being computationally efficient...
August 2022: KDD: Proceedings
https://read.qxmd.com/read/36158613/predicting-age-related-macular-degeneration-progression-with-contrastive-attention-and-time-aware-lstm
#5
JOURNAL ARTICLE
Changchang Yin, Sayoko E Moroi, Ping Zhang
Age-related macular degeneration (AMD) is the leading cause of irreversible blindness in developed countries. Identifying patients at high risk of progression to late AMD, the sight-threatening stage, is critical for clinical actions, including medical interventions and timely monitoring. Recently, deep-learning-based models have been developed and achieved superior performance for late AMD prediction. However, most existing methods are limited to the color fundus photography (CFP) from the last ophthalmic visit and do not include the longitudinal CFP history and AMD progression during the previous years' visits...
August 2022: KDD: Proceedings
https://read.qxmd.com/read/36101663/deconfounding-actor-critic-network-with-policy-adaptation-for-dynamic-treatment-regimes
#6
JOURNAL ARTICLE
Changchang Yin, Ruoqi Liu, Jeffrey Caterino, Ping Zhang
Despite intense efforts in basic and clinical research, an individualized ventilation strategy for critically ill patients remains a major challenge. Recently, dynamic treatment regime (DTR) with reinforcement learning (RL) on electronic health records (EHR) has attracted interest from both the healthcare industry and machine learning research community. However, most learned DTR policies might be biased due to the existence of confounders. Although some treatment actions non-survivors received may be helpful, if confounders cause the mortality, the training of RL models guided by long-term outcomes (e...
August 2022: KDD: Proceedings
https://read.qxmd.com/read/35571559/federated-adversarial-debiasing-for-fair-and-transferable-representations
#7
JOURNAL ARTICLE
Junyuan Hong, Zhuangdi Zhu, Shuyang Yu, Zhangyang Wang, Hiroko Dodge, Jiayu Zhou
Federated learning is a distributed learning framework that is communication-efficient and protects participating users' raw training data. One outstanding challenge of federated learning comes from the users' heterogeneity, and learning from such data may yield biased and unfair models for minority groups. While adversarial learning is commonly used in centralized learning for mitigating bias, there are significant barriers when extending it to the federated framework. In this work, we study these barriers and address them by proposing a novel approach Federated Adversarial DEbiasing (FADE)...
August 2021: KDD: Proceedings
https://read.qxmd.com/read/35571558/mocl-data-driven-molecular-fingerprint-via-knowledge-aware-contrastive-learning-from-molecular-graph
#8
JOURNAL ARTICLE
Mengying Sun, Jing Xing, Huijun Wang, Bin Chen, Jiayu Zhou
Recent years have seen rapid growth in the use of graph neural networks (GNNs) in the biomedical domain for tackling drug-related problems. However, like other deep architectures, GNNs are data hungry. Because obtaining labels in the real world is often expensive, pretraining GNNs in an unsupervised manner has been actively explored. Among these approaches, graph contrastive learning, which maximizes the mutual information between paired graph augmentations, has been shown to be effective on various downstream tasks. However, the current graph contrastive learning framework has two limitations...
August 2021: KDD: Proceedings
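The abstract describes graph contrastive learning as maximizing mutual information between paired graph augmentations. A common way to do this is the InfoNCE objective, a lower bound on that mutual information. The sketch below assumes the graph encoder and augmentations have already produced one embedding per view (rows of `z1` and `z2`); it shows only the generic loss, not MoCL's knowledge-aware augmentations or objective, and the function name `info_nce` and temperature `tau` are illustrative choices.

```python
import numpy as np

def info_nce(z1: np.ndarray, z2: np.ndarray, tau: float = 0.5) -> float:
    """InfoNCE loss: row i of z1 and row i of z2 embed two views of the same graph."""
    z1 = z1 / np.linalg.norm(z1, axis=1, keepdims=True)
    z2 = z2 / np.linalg.norm(z2, axis=1, keepdims=True)
    logits = z1 @ z2.T / tau                       # cosine similarity of every cross-view pair
    logits -= logits.max(axis=1, keepdims=True)    # stabilize; log-softmax is unchanged
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_prob)))      # positive pairs sit on the diagonal

# Usage: well-aligned views should score a lower loss than mismatched ones.
rng = np.random.default_rng(0)
z = rng.normal(size=(8, 16))
aligned = info_nce(z, z + 0.01 * rng.normal(size=z.shape))
mismatched = info_nce(z, np.roll(z, 1, axis=0))   # every "positive" is another graph
```

Each row is a softmax classification over the batch, so the other graphs in the batch act as negatives; with N graphs, log N minus the loss lower-bounds the mutual information between views.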
https://read.qxmd.com/read/34109054/logpar-logistic-parafac2-factorization-for-temporal-binary-data-with-missing-values
#9
JOURNAL ARTICLE
Kejing Yin, Ardavan Afshar, Joyce C Ho, William K Cheung, Chao Zhang, Jimeng Sun
Binary data with one-class missing values are ubiquitous in real-world applications. They can be represented by irregular tensors with varying sizes in one dimension, where value one means presence of a feature while zero means unknown (i.e., either presence or absence of a feature). Learning accurate low-rank approximations from such binary irregular tensors is a challenging task. However, none of the existing models developed for factorizing irregular tensors take the missing values into account, and they assume Gaussian distributions, resulting in a distribution mismatch when applied to binary data...
August 2020: KDD: Proceedings
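The abstract points out that Gaussian (squared-error) factorizations mismatch binary data. The standard alternative is a Bernoulli likelihood with a logistic link on low-rank logits θ = U Vᵀ, whose negative log-likelihood is Σ [log(1 + e^θ) - x·θ]. The minimal sketch below applies this to a single slice of an irregular tensor; how LogPar treats the unknown zeros (one-class missing values) and couples slices through PARAFAC2 is not shown, and the helper names are hypothetical.

```python
import numpy as np

def bernoulli_nll(x: np.ndarray, u: np.ndarray, v: np.ndarray) -> float:
    """Negative log-likelihood of binary matrix x under logits u @ v.T (logistic link)."""
    theta = u @ v.T
    # np.logaddexp(0, theta) computes log(1 + exp(theta)) without overflow
    return float(np.sum(np.logaddexp(0.0, theta) - x * theta))

def bernoulli_nll_grad(x: np.ndarray, u: np.ndarray, v: np.ndarray):
    """Gradients of bernoulli_nll with respect to u and v."""
    theta = u @ v.T
    r = 1.0 / (1.0 + np.exp(-theta)) - x   # d loss / d theta = sigmoid(theta) - x
    return r @ v, r.T @ u

# Usage: a few gradient steps on a random 6 x 4 binary slice with rank 2.
rng = np.random.default_rng(0)
x = (rng.random((6, 4)) < 0.4).astype(float)
u = 0.1 * rng.normal(size=(6, 2))
v = 0.1 * rng.normal(size=(4, 2))
before = bernoulli_nll(x, u, v)
for _ in range(300):
    gu, gv = bernoulli_nll_grad(x, u, v)   # compute both gradients before updating
    u -= 0.05 * gu
    v -= 0.05 * gv
after = bernoulli_nll(x, u, v)
```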
https://read.qxmd.com/read/33859865/metapred-meta-learning-for-clinical-risk-prediction-with-limited-patient-electronic-health-records
#10
JOURNAL ARTICLE
Xi Sheryl Zhang, Fengyi Tang, Hiroko H Dodge, Jiayu Zhou, Fei Wang
In recent years, large amounts of health data, such as patient Electronic Health Records (EHR), have become readily available. This provides an unprecedented opportunity for knowledge discovery and data mining algorithms to extract insights that can later help improve the quality of care delivery. Predictive modeling of clinical risks, including in-hospital mortality, hospital readmission, chronic disease onset, condition exacerbation, etc., from patient EHR is one of the health data analytics problems that attract considerable interest...
August 2019: KDD: Proceedings
https://read.qxmd.com/read/31799022/naranjo-question-answering-using-end-to-end-multi-task-learning-model
#11
JOURNAL ARTICLE
Bhanu Pratap Singh Rawat, Fei Li, Hong Yu
In the clinical domain, it is important to understand whether an adverse drug reaction (ADR) is caused by a particular medication. Clinical judgement studies help judge the causal relation between a medication and its ADRs. In this study, we present the first attempt to automatically infer the causality between a drug and an ADR from electronic health records (EHRs) by answering the Naranjo questionnaire, the validated clinical question answering set used by domain experts for ADR causality assessment. Using physicians' annotation as the gold standard, our proposed joint model, which uses multi-task learning to predict the answers of a subset of the Naranjo questionnaire, significantly outperforms the baseline pipeline model by a good margin, achieving a macro-weighted f-score between 0...
August 2019: KDD: Proceedings
https://read.qxmd.com/read/31538030/predicting-dynamic-embedding-trajectory-in-temporal-interaction-networks
#12
JOURNAL ARTICLE
Srijan Kumar, Xikun Zhang, Jure Leskovec
Modeling sequential interactions between users and items/products is crucial in domains such as e-commerce, social networking, and education. Representation learning presents an attractive opportunity to model the dynamic evolution of users and items, where each user/item can be embedded in a Euclidean space and its evolution can be modeled by an embedding trajectory in this space. However, existing dynamic embedding methods generate embeddings only when users take actions and do not explicitly model the future trajectory of the user/item in the embedding space...
August 2019: KDD: Proceedings
https://read.qxmd.com/read/34796042/retaining-privileged-information-for-multi-task-learning
#13
JOURNAL ARTICLE
Fengyi Tang, Cao Xiao, Fei Wang, Jiayu Zhou, Li-Wei H Lehman
Knowledge transfer has been of great interest in current machine learning research, as many have speculated on its importance in modeling the human ability to rapidly generalize learned models to new scenarios. Particularly in cases where training samples are limited, knowledge transfer improves both the learning speed and the generalization performance of related tasks. Recently, Learning Using Privileged Information (LUPI) has presented a new direction in knowledge transfer by modeling the transfer of prior knowledge as a Teacher-Student interaction process...
July 2019: KDD: Proceedings
https://read.qxmd.com/read/33708457/a-free-energy-based-approach-for-distance-metric-learning
#14
JOURNAL ARTICLE
Sho Inaba, Carl T Fakhry, Rahul V Kulkarni, Kourosh Zarringhalam
We present a reformulation of the distance metric learning problem as a penalized optimization problem, with a penalty term corresponding to the von Neumann entropy of the distance metric. This formulation leads to a mapping to statistical mechanics such that the metric learning optimization problem becomes equivalent to free energy minimization. Correspondingly, our approach leads to an analytical solution of the optimization problem based on the Boltzmann distribution. The mapping established in this work suggests new approaches for dimensionality reduction and provides insights into determination of optimal parameters for the penalty term...
July 2019: KDD: Proceedings
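The abstract maps metric learning with a von Neumann entropy penalty onto free energy minimization with a Boltzmann-distribution solution. A textbook instance of that mapping, offered here under stated assumptions, is: if the data-fit term is linear in the metric, tr(MC) for some symmetric "cost" matrix C, and M is restricted to positive semidefinite matrices of unit trace, then minimizing F(M) = tr(MC) - T·S(M), with S(M) = -tr(M log M), gives the Gibbs state M* = exp(-C/T) / Z, where Z = tr exp(-C/T) and F(M*) = -T log Z. The linear loss, unit-trace normalization, and helper names are assumptions for illustration; the paper's exact loss may differ.

```python
import numpy as np

def von_neumann_entropy(m: np.ndarray) -> float:
    """S(M) = -tr(M log M), computed from eigenvalues (0 log 0 taken as 0)."""
    w = np.linalg.eigvalsh(m)
    w = w[w > 1e-12]
    return float(-np.sum(w * np.log(w)))

def free_energy(m: np.ndarray, c: np.ndarray, t: float) -> float:
    """Hypothetical objective: linear fit term tr(MC) minus T times the entropy."""
    return float(np.trace(m @ c)) - t * von_neumann_entropy(m)

def boltzmann_metric(c: np.ndarray, t: float) -> np.ndarray:
    """Closed-form minimizer exp(-C/T)/Z over unit-trace PSD matrices."""
    w, v = np.linalg.eigh(c)
    p = np.exp(-(w - w.min()) / t)   # shifting by min(w) cancels after normalization
    p /= p.sum()                     # Boltzmann weights over the eigen-directions of C
    return (v * p) @ v.T

# Usage: compare the closed form with an arbitrary unit-trace PSD matrix.
rng = np.random.default_rng(0)
a = rng.normal(size=(4, 4))
c = (a + a.T) / 2
t = 0.7
m_star = boltzmann_metric(c, t)
b = rng.normal(size=(4, 4))
other = b @ b.T / np.trace(b @ b.T)
```

The temperature T plays the role of the penalty weight: large T pushes M toward the maximum-entropy metric I/d, while small T concentrates M on the lowest-cost eigen-directions of C.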
https://read.qxmd.com/read/31037221/interpretable-representation-learning-for-healthcare-via-capturing-disease-progression-through-time
#15
JOURNAL ARTICLE
Tian Bai, Brian L Egleston, Shanshan Zhang, Slobodan Vucetic
Various deep learning models have recently been applied to predictive modeling of Electronic Health Records (EHR). In medical claims data, which is a particular type of EHR data, each patient is represented as a sequence of temporally ordered, irregularly sampled visits to health providers, where each visit is recorded as an unordered set of medical codes specifying the patient's diagnosis and the treatment provided during the visit. Based on the observation that different patient conditions have different temporal progression patterns, in this paper we propose a novel interpretable deep learning model, called Timeline...
August 2018: KDD: Proceedings
https://read.qxmd.com/read/30906620/voxel-deconvolutional-networks-for-3d-brain-image-labeling
#16
JOURNAL ARTICLE
Yongjun Chen, Min Shi, Hongyang Gao, Dinggang Shen, Lei Cai, Shuiwang Ji
Deep learning methods have shown great success in pixel-wise prediction tasks. One of the most popular methods employs an encoder-decoder network in which deconvolutional layers are used for up-sampling feature maps. However, a key limitation of the deconvolutional layer is that it suffers from the checkerboard artifact problem, which harms the prediction accuracy. This is caused by the independence among adjacent pixels on the output feature maps. Previous work has solved the checkerboard artifact issue of deconvolutional layers only in the 2D space...
August 2018: KDD: Proceedings
https://read.qxmd.com/read/30191079/generalized-score-functions-for-causal-discovery
#17
JOURNAL ARTICLE
Biwei Huang, Kun Zhang, Yizhu Lin, Bernhard Schölkopf, Clark Glymour
Discovery of causal relationships from observational data is a fundamental problem. Roughly speaking, there are two types of methods for causal discovery: constraint-based ones and score-based ones. Score-based methods avoid the multiple testing problem and enjoy certain advantages compared to constraint-based ones. However, most of them need strong assumptions on the functional forms of causal mechanisms, as well as on data distributions, which limit their applicability. In practice, the precise underlying model class is usually unknown...
August 2018: KDD: Proceedings
https://read.qxmd.com/read/33680534/sustain-scalable-unsupervised-scoring-for-tensors-and-its-application-to-phenotyping
#18
JOURNAL ARTICLE
Ioakeim Perros, Evangelos E Papalexakis, Haesun Park, Richard Vuduc, Xiaowei Yan, Christopher Defilippi, Walter F Stewart, Jimeng Sun
This paper presents a new method, which we call SUSTain, that extends real-valued matrix and tensor factorizations to data where values are integers. Such data are common when the values correspond to event counts or ordinal measures. The conventional approach is to treat integer data as real, and then apply real-valued factorizations. However, doing so fails to preserve important characteristics of the original data, thereby making it hard to interpret the results. Instead, our approach extracts factor values from integer datasets as scores that are constrained to take values from a small integer set...
July 2018: KDD: Proceedings
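The abstract says SUSTain constrains factor values to a small integer set. One reason such a constraint can still admit exact coordinate updates: for a least-squares fit x ≈ s·v with a single integer score s in {0, ..., K}, the objective is a convex parabola in s centered at ⟨x, v⟩ / ⟨v, v⟩, so the best feasible integer is that real minimizer rounded and clipped to [0, K]. The sketch below is a hypothetical illustration of integer-constrained coordinate updates, not SUSTain's published algorithm; `best_integer_score` and `brute_force_score` are illustrative names.

```python
import numpy as np

def best_integer_score(x: np.ndarray, v: np.ndarray, k_max: int) -> int:
    """argmin over integers s in {0, ..., k_max} of ||x - s * v||^2."""
    s_real = float(x @ v) / float(v @ v)          # unconstrained least-squares minimizer
    return int(np.clip(np.rint(s_real), 0, k_max))

def brute_force_score(x: np.ndarray, v: np.ndarray, k_max: int) -> int:
    """Reference answer: try every allowed integer score."""
    errors = [np.sum((x - s * v) ** 2) for s in range(k_max + 1)]
    return int(np.argmin(errors))

# Usage: the closed-form rounding agrees with exhaustive search on random data.
rng = np.random.default_rng(0)
matches = []
for _ in range(100):
    x = 3.0 * rng.normal(size=6) + 2.0   # e.g. a row of event counts treated as real values
    v = rng.random(6) + 0.1              # a fixed factor column
    matches.append(best_integer_score(x, v, 5) == brute_force_score(x, v, 5))
```

Because the parabola is symmetric about its minimizer, rounding picks the nearest integer, and clipping handles minimizers that fall outside the allowed range.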
https://read.qxmd.com/read/33717639/gram-graph-based-attention-model-for-healthcare-representation-learning
#19
JOURNAL ARTICLE
Edward Choi, Mohammad Taha Bahadori, Le Song, Walter F Stewart, Jimeng Sun
Deep learning methods exhibit promising performance for predictive modeling in healthcare, but two important challenges remain: (1) data insufficiency: in healthcare predictive modeling, the sample size is often insufficient for deep learning methods to achieve satisfactory results; (2) interpretation: the representations learned by deep learning methods should align with medical knowledge. To address these challenges, we propose GRaph-based Attention Model (GRAM) that supplements electronic health records (EHR) with hierarchical information inherent to medical ontologies...
August 2017: KDD: Proceedings
https://read.qxmd.com/read/30430038/a-data-driven-process-recommender-framework
#20
JOURNAL ARTICLE
Sen Yang, Xin Dong, Leilei Sun, Yichen Zhou, Richard A Farneth, Hui Xiong, Randall S Burd, Ivan Marsic
We present an approach for improving the performance of complex knowledge-based processes by providing data-driven step-by-step recommendations. Our framework uses the associations between similar historic process performances and contextual information to determine the prototypical way of enacting the process. We introduce a novel similarity metric for grouping traces into clusters that incorporates temporal information about activity performance and handles concurrent activities. Our data-driven recommender system selects the appropriate prototype performance of the process based on user-provided context attributes...
August 2017: KDD: Proceedings