Unpacking the Gap Box Against Data-Free Knowledge Distillation.

Data-free knowledge distillation (DFKD) improves a student model (S) by mimicking the class probabilities of a pre-trained teacher model (T) without access to the original training data. In this setting, the ideal scenario is that T helps a generator (G) produce "good" samples that maximally benefit S. However, existing methods suffer from non-ideal generated samples under the disturbance of the gap (either too large or too small) between the class probabilities of T and S: samples with too large a gap may carry excessive information for S, while too small a gap leaves limited knowledge in the samples, resulting in poor generalization. Meanwhile, these methods cannot judge the "goodness" of the generated samples for S, since the fixed T is not necessarily ideal. In this paper, we aim to answer two questions: what is inside the gap box, and how can "good" samples be generated for DFKD? To this end, we propose a Gap-Sensitive Sample Generation (GapSSG) approach that revisits the empirical distilled risk from a data-free perspective. This confirms the existence of an ideal teacher (T*) and theoretically implies that (1) the gap disturbance originates from the mismatch between T and T*, hence the class probabilities of T enable an approximation to those of T*; and (2) "good" samples should maximally benefit S via T's class probabilities, since T* is unknown. We therefore unpack the gap box between T and S into two findings: an inherent gap to perceive T and T*, and a derived gap to monitor S and T*. Benefiting from the derived gap, which focuses on the adaptability of generated samples to S, we track the student's training route (a series of training epochs) to capture the category distribution of S; upon this, a regulatory factor is devised to approximate T* over the inherent gap, so as to generate "good" samples for S. Furthermore, during distillation, a sample-balanced strategy is introduced to tackle the overfitting and missing-knowledge issues between the generated partial and critical samples when training G. Theoretical and empirical studies verify the advantages of GapSSG over state-of-the-art methods. Our code is available at https://github.com/hfutqian/GapSSG.
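The abstract describes a generator-driven distillation loop in which the teacher-student gap steers sample generation. As a rough orientation, the PyTorch sketch below shows a generic data-free KD step with a bounded, per-sample gap term driving the generator. It is an illustrative assumption about how such a loop is typically structured, not the authors' GapSSG method (their implementation is in the linked repository); the model definitions, loss form, temperature, and clamp threshold are all hypothetical.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


def kd_step(generator, teacher, student, g_opt, s_opt,
            z_dim=100, batch=64, tau=4.0, gap_cap=5.0):
    """One generic data-free KD step: update G, then update S (illustrative only)."""
    teacher.eval()
    for p in teacher.parameters():          # the teacher stays frozen throughout
        p.requires_grad_(False)

    # --- Generator step: push G toward samples with a bounded teacher-student gap ---
    z = torch.randn(batch, z_dim)
    x = generator(z)
    t_logits = teacher(x)
    s_logits = student(x)
    # Per-sample KL gap between student and teacher class probabilities.
    gap = F.kl_div(F.log_softmax(s_logits / tau, dim=1),
                   F.softmax(t_logits / tau, dim=1),
                   reduction="none").sum(dim=1)
    # Maximize the gap, but clamp it so samples do not become "too hard" (assumed rule).
    g_loss = -gap.clamp(max=gap_cap).mean()
    g_opt.zero_grad()
    g_loss.backward()
    g_opt.step()

    # --- Student step: distill from the teacher on the freshly generated batch ---
    x = x.detach()
    with torch.no_grad():
        t_prob = F.softmax(teacher(x) / tau, dim=1)
    s_log_prob = F.log_softmax(student(x) / tau, dim=1)
    s_loss = F.kl_div(s_log_prob, t_prob, reduction="batchmean") * tau * tau
    s_opt.zero_grad()
    s_loss.backward()
    s_opt.step()
    return g_loss.item(), s_loss.item()


if __name__ == "__main__":
    dim, n_cls = 32, 10                      # toy feature size and class count
    generator = nn.Sequential(nn.Linear(100, 128), nn.ReLU(), nn.Linear(128, dim))
    teacher = nn.Sequential(nn.Linear(dim, 64), nn.ReLU(), nn.Linear(64, n_cls))
    student = nn.Sequential(nn.Linear(dim, 32), nn.ReLU(), nn.Linear(32, n_cls))
    g_opt = torch.optim.Adam(generator.parameters(), lr=1e-3)
    s_opt = torch.optim.Adam(student.parameters(), lr=1e-3)
    for step in range(3):
        gl, sl = kd_step(generator, teacher, student, g_opt, s_opt)
        print(f"step {step}: G loss {gl:.3f}, S loss {sl:.3f}")
```

In this generic form the generator simply maximizes a capped discrepancy; GapSSG instead regulates the inherent gap with a factor estimated from the student's training route, which the sketch does not attempt to reproduce.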
