Implementation of the paper: "Mixture-of-Depths: Dynamically allocating compute in transformer-based language models". From the paper: "These models match baseline performance for equivalent FLOPS and wall-clock times to train, but require a fraction of the FLOPs per forward pass, and can be upwards of 50% faster to step during post-training sampling."
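The core mechanism in the paper is a learned top-k router in front of each block: only about capacity_factor * seq_len tokens are sent through the block's attention and MLP, and the remaining tokens skip it on the residual stream. Below is a minimal, self-contained sketch of that routing step for intuition only; TopKRouter, its shapes, and the scoring/weighting details are illustrative assumptions, not the API of this package.

```python
import torch
import torch.nn as nn


class TopKRouter(nn.Module):
    """Per-block router sketch: only the top-k scoring tokens go through the block."""

    def __init__(self, dim: int, capacity: int):
        super().__init__()
        self.scorer = nn.Linear(dim, 1)  # scalar routing score per token
        self.capacity = capacity         # k = capacity_factor * seq_len

    def forward(self, x: torch.Tensor, block: nn.Module) -> torch.Tensor:
        # x: (batch, seq_len, dim); block maps (batch, k, dim) -> (batch, k, dim)
        scores = self.scorer(x).squeeze(-1)                  # (batch, seq_len)
        top = torch.topk(scores, self.capacity, dim=-1)      # routed token indices
        idx = top.indices.unsqueeze(-1).expand(-1, -1, x.size(-1))
        routed = torch.gather(x, 1, idx)                     # (batch, k, dim)
        # Scale the block output by the router score so the router receives gradients.
        processed = block(routed) * top.values.unsqueeze(-1)
        # Tokens that were not routed pass through unchanged (residual path).
        return x.scatter_add(1, idx, processed)


# Toy usage: route 12% of 1000 tokens through a feed-forward block.
dim, seq_len = 512, 1000
router = TopKRouter(dim, capacity=120)
ffn = nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))
out = router(torch.randn(2, seq_len, dim), ffn)
print(out.shape)  # torch.Size([2, 1000, 512])
```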
Install from PyPI:

pip3 install mixture-of-depths
import torch
from mixture_of_depths.main import MoD

# Dummy input of shape (batch, seq_len, dim)
x = torch.randn(1, 1000, 512)
# mask = torch.ones(1)

# Instantiate the Mixture-of-Depths model
model = MoD(
    seq_len=1000,
    dim=512,
    capacity_factor=0.12,
    vocab_size=10000,
    transformer_depth=8,
)

# Forward pass
out = model(x)
print(out)
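With seq_len=1000 and capacity_factor=0.12, each routed block processes roughly 0.12 * 1000 = 120 tokens through its attention and MLP, while the other ~880 tokens bypass the block via the residual connection; this is where the per-forward-pass FLOP savings described in the paper come from.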
License: MIT