Gradient Multi-Normalization for Stateless and Scalable LLM Training

Jan 1, 2025 · M Scetbon, C Ma, Wenbo Gong, E Meeds

Type: Conference paper
Publication: NeurIPS 2025
Last updated on Jan 1, 2025
Optimization · LLM · Learning Dynamics

Authors
Wenbo Gong, Senior Researcher
Senior Researcher at Microsoft Research Cambridge working on learning dynamics and optimization for foundation models, with prior work on causality and approximate inference.