Learning Dynamics

Towards Efficient Optimizer Design for LLM via Structured Fisher Approximation with a Low-Rank Extension

Jan 1, 2025

SWAN: SGD with Normalization and Whitening Enables Stateless LLM Training

Jan 1, 2025

Gradient Multi-Normalization for Stateless and Scalable LLM Training

Jan 1, 2025