Torch2Jax-DeepSeek-R1-Distill-Qwen-1.5B

Flax (JAX) implementation of DeepSeek-R1-Distill-Qwen-1.5B with weights ported from Hugging Face.

Overview

This repository provides both Flax (JAX) and PyTorch implementations of the DeepSeek-R1-Distill-Qwen-1.5B model. It includes:

Inference [QUICKSTART]:
- inference.ipynb: A quickstart notebook that downloads the PyTorch weights, converts them to Flax parameters, loads the model, and generates text.
Flax Implementation:
- model_flax.py: The Flax implementation.
PyTorch Implementation:
- model_torch.py: A reference implementation in PyTorch.
Conversion Script:
- torch_to_flax.py: A utility to convert a PyTorch checkpoint (state dictionary) into Flax parameters.
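
To illustrate what converting a state dictionary into Flax parameters typically involves, here is a minimal sketch. It is not the code in torch_to_flax.py. The function name `torch_to_flax_params`, the key names in the toy state dict, and the renaming rules are assumptions for illustration. It relies on two known layout facts: `torch.nn.Linear` stores weights as (out_features, in_features) while `flax.linen.Dense` expects a `kernel` of shape (in_features, out_features), and `flax.linen.Embed` names its table `embedding`.

```python
import numpy as np


def torch_to_flax_params(state_dict):
    """Turn a flat, dot-separated state dict into a nested Flax-style params dict.

    Hypothetical helper. Values are assumed to be NumPy arrays, for example
    obtained from a torch tensor via tensor.detach().cpu().numpy().
    """
    params = {}
    for name, array in state_dict.items():
        *scope, leaf = name.split(".")
        if leaf == "weight" and scope and scope[-1] == "embed_tokens":
            # Embedding tables keep their (vocab, hidden) layout;
            # flax.linen.Embed calls this parameter "embedding".
            leaf = "embedding"
        elif leaf == "weight" and array.ndim == 2:
            # Linear weight (out, in) becomes a Dense kernel (in, out).
            leaf, array = "kernel", array.T
        # 1-D weights (e.g. norm scales) and biases are passed through unchanged.
        node = params
        for key in scope:
            node = node.setdefault(key, {})
        node[leaf] = array
    return params


# Toy state dict with assumed key names, standing in for a real checkpoint.
state_dict = {
    "embed_tokens.weight": np.zeros((10, 4)),
    "layers.0.q_proj.weight": np.arange(8.0).reshape(2, 4),
    "layers.0.q_proj.bias": np.zeros(2),
    "norm.weight": np.ones(4),
}
params = torch_to_flax_params(state_dict)
print(params["layers"]["0"]["q_proj"]["kernel"].shape)  # (4, 2)
```

The parameter names a real conversion must produce depend on how the modules in model_flax.py are defined, so the renaming rules above would need to be adapted to match them.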
