Paper-Conference

SWAN: SGD with Normalization and Whitening Enables Stateless LLM Training

Jan 1, 2025

Gradient Multi-Normalization for Stateless and Scalable LLM Training

Jan 1, 2025

ProxyTune: Hyperparameter tuning through iteratively refined proxies

Jan 1, 2024

Neural structure learning with stochastic differential equations

Jan 1, 2024

Rhino: Deep causal temporal relationship learning with history-dependent noise

Jan 1, 2023

BayesDAG: Gradient-based posterior inference for causal discovery

Jan 1, 2023

Simultaneous missing value imputation and structure learning with groups

Jan 1, 2022

Sliced kernelized Stein discrepancy

Jan 1, 2021

Interpreting diffusion score matching using normalizing flow

Jan 1, 2021

Active slices for sliced Stein discrepancy

Jan 1, 2021