SWAN: SGD with Normalization and Whitening Enables Stateless LLM Training

Jan 1, 2025 · C Ma, Wenbo Gong, M Scetbon, E Meeds

Type: Conference paper
Publication: ICML 2025
Last updated on Jan 1, 2025
Tags: Optimization, LLM, Learning Dynamics

Authors
Wenbo Gong, Senior Researcher
Senior Researcher at Microsoft Research Cambridge working on learning dynamics and optimization for foundation models, with prior work on causality and approximate inference.