Official implementation of the paper: "ZClip: Adaptive Spike Mitigation for LLM Pre-Training".
loss adaptive-gradient-clipping loss-spike adaptive-clipping zclip stable-llm-pretraining enable-high-learning-rate traning-stability pre-training-stability stable-training llm-stable-training
Updated Apr 28, 2025 - Python