SWAN: SGD with Normalization and Whitening Enables Stateless LLM Training

Jan 1, 2025 · C Ma, Wenbo Gong, M Scetbon, E Meeds
Publication: ICML 2025
Authors
Wenbo Gong
Senior Researcher at Microsoft Research Cambridge working on learning dynamics and optimization for foundation models, with prior work on causality and approximate inference.