
Towards Understanding Convergence and Generalization of AdamW

AdamW modifies Adam by adding a decoupled weight decay that shrinks the network weights at each training iteration. For adaptive algorithms, this decoupled weight decay does not affect the adaptive optimization steps, which distinguishes it from the widely used ℓ2-regularizer, which changes the optimization steps by altering the first- and second-order gradient moments. Despite AdamW's great practical success, its convergence behavior and its generalization improvement over Adam and ℓ2-regularized Adam (ℓ2-Adam) have remained unexplained. To address this gap, we prove the convergence of AdamW and justify its generalization advantages over Adam and ℓ2-Adam. Specifically, AdamW provably converges, but it minimizes a dynamically regularized loss that combines the vanilla loss with a dynamical regularization induced by the decoupled weight decay, and thus it behaves differently from Adam and ℓ2-Adam. Moreover, on both general nonconvex problems and problems satisfying the Polyak-Łojasiewicz (PŁ) condition, we establish the stochastic gradient complexity of AdamW for finding a stationary point. This complexity also applies to Adam and ℓ2-Adam, and it improves on their previously known complexity bounds, especially for over-parameterized networks. Besides, we prove that AdamW enjoys smaller generalization error than Adam and ℓ2-Adam from a Bayesian posterior perspective. This result, for the first time, explicitly reveals the benefit of the decoupled weight decay in AdamW. Experimental results validate our theory.
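To make the contrast concrete, here is a minimal NumPy sketch of one update step under the two schemes the abstract compares: ℓ2-Adam folds the decay term λw into the gradient, so the decay propagates into the first- and second-moment estimates, whereas AdamW applies the decay directly to the weights, outside the adaptive step. The hyperparameter names and default values (lr, wd, beta1, beta2, eps) are conventional illustrative choices, not values taken from the paper.

```python
import numpy as np

def adam_step(w, g, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999,
              eps=1e-8, wd=1e-2, decoupled=True):
    """One Adam-style step on weights w with stochastic gradient g.

    decoupled=True  -> AdamW: the decay term wd*w is applied directly to w
                       and never enters the moment estimates m, v.
    decoupled=False -> l2-Adam: wd*w is folded into the gradient, so it
                       changes both the first- and second-order moments.
    """
    if not decoupled:
        g = g + wd * w                        # l2-regularizer: decay passes through the moments

    m = beta1 * m + (1 - beta1) * g           # first-moment (mean) estimate
    v = beta2 * v + (1 - beta2) * g * g       # second-moment estimate
    m_hat = m / (1 - beta1 ** t)              # bias corrections
    v_hat = v / (1 - beta2 ** t)

    w = w - lr * m_hat / (np.sqrt(v_hat) + eps)
    if decoupled:
        w = w - lr * wd * w                   # AdamW: decay applied outside the adaptive step

    return w, m, v
```

Note that in the decoupled branch the decay is never rescaled by 1/sqrt(v_hat), so every weight is shrunk at the same relative rate; in the ℓ2 branch the effective decay of each coordinate is modulated by its second-moment estimate. This difference is what the abstract refers to as the decoupled weight decay "not affecting specific optimization steps."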
