DantinoX: A modular, memory-efficient Transformer implementation in JAX/Flax NNX. Includes Sparse MoE, GQA, Sliding Window Attention, Gradient Accumulation, and Checkpointing.
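As a rough illustration of one of the listed features, here is a minimal sketch of grouped-query attention (GQA) in Flax NNX. The class and parameter names (`GroupedQueryAttention`, `d_model`, `num_kv_heads`) are illustrative assumptions, not DantinoX's actual API; the point is only that fewer key/value heads than query heads shrinks the KV projections (and KV cache) while keeping full query resolution.

```python
# Hypothetical GQA sketch in Flax NNX -- names/shapes are assumptions,
# not DantinoX's real interface.
import jax
import jax.numpy as jnp
from flax import nnx


class GroupedQueryAttention(nnx.Module):
    def __init__(self, d_model: int, num_heads: int, num_kv_heads: int, *, rngs: nnx.Rngs):
        assert num_heads % num_kv_heads == 0
        self.num_heads = num_heads
        self.num_kv_heads = num_kv_heads
        self.head_dim = d_model // num_heads
        # Queries keep all heads; keys/values share fewer heads to save memory.
        self.q_proj = nnx.Linear(d_model, num_heads * self.head_dim, rngs=rngs)
        self.k_proj = nnx.Linear(d_model, num_kv_heads * self.head_dim, rngs=rngs)
        self.v_proj = nnx.Linear(d_model, num_kv_heads * self.head_dim, rngs=rngs)
        self.o_proj = nnx.Linear(num_heads * self.head_dim, d_model, rngs=rngs)

    def __call__(self, x: jnp.ndarray) -> jnp.ndarray:
        b, t, _ = x.shape
        q = self.q_proj(x).reshape(b, t, self.num_heads, self.head_dim)
        k = self.k_proj(x).reshape(b, t, self.num_kv_heads, self.head_dim)
        v = self.v_proj(x).reshape(b, t, self.num_kv_heads, self.head_dim)
        # Broadcast each KV head across its group of query heads.
        group = self.num_heads // self.num_kv_heads
        k = jnp.repeat(k, group, axis=2)
        v = jnp.repeat(v, group, axis=2)
        scores = jnp.einsum("bqhd,bkhd->bhqk", q, k) / jnp.sqrt(self.head_dim)
        # Causal mask: attend only to current and earlier positions.
        mask = jnp.tril(jnp.ones((t, t), dtype=bool))
        scores = jnp.where(mask, scores, -jnp.inf)
        attn = jax.nn.softmax(scores, axis=-1)
        out = jnp.einsum("bhqk,bkhd->bqhd", attn, v).reshape(b, t, -1)
        return self.o_proj(out)


# Usage: 8 query heads sharing 2 KV heads (a 4:1 group size).
layer = GroupedQueryAttention(256, num_heads=8, num_kv_heads=2, rngs=nnx.Rngs(0))
y = layer(jnp.ones((1, 16, 256)))
```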
Stratified LLM Subsets delivers diverse training data at 100K-1M sample scales across pre-training (FineWeb-Edu, Proof-Pile-2), instruction-following (Tulu-3, Orca AgentInstruct), and reasoning-distillation (Llama-Nemotron) sources. Embedding-based k-means clustering ensures maximum diversity across the five high-quality open datasets.
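For intuition, this is a minimal sketch of the embedding-based k-means step (plain Lloyd's algorithm) in JAX. The function name and parameters are assumptions for illustration, not the dataset's published pipeline.

```python
# Hypothetical k-means over document embeddings -- a sketch, not the
# Stratified LLM Subsets implementation.
import jax
import jax.numpy as jnp


def kmeans(embeddings: jnp.ndarray, k: int, steps: int = 20, seed: int = 0):
    """Lloyd's algorithm: returns (centroids, cluster assignments)."""
    key = jax.random.PRNGKey(seed)
    # Initialize centroids from k distinct embeddings.
    idx = jax.random.choice(key, embeddings.shape[0], (k,), replace=False)
    centroids = embeddings[idx]

    def step(centroids, _):
        # Assign each embedding to its nearest centroid (squared L2).
        d = jnp.sum((embeddings[:, None, :] - centroids[None, :, :]) ** 2, axis=-1)
        assign = jnp.argmin(d, axis=1)
        # Recompute each centroid as the mean of its assigned points.
        one_hot = jax.nn.one_hot(assign, k)
        counts = one_hot.sum(axis=0)[:, None]
        new_centroids = (one_hot.T @ embeddings) / jnp.maximum(counts, 1.0)
        return new_centroids, assign

    centroids, assigns = jax.lax.scan(step, centroids, None, length=steps)
    return centroids, assigns[-1]
```

Drawing a fixed quota of samples from every cluster then yields the stratified, diversity-maximizing subset the description refers to.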
🎥 Discover Vidar, a unified embodied video foundation model designed for low-resource environments that covers both video understanding and generation.