Inspired by the work of Liu et al., 2024 (Kangaroo), we propose a way to distill and evaluate smaller versions of larger language models, for instance a 20-layer Qwen2.5 model distilled from the full-size 36-layer version of the same model.
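As a rough illustration of the idea, the sketch below carves a shallower student out of a full-size Qwen2.5 checkpoint by keeping only its first 20 decoder layers. The checkpoint name and the "keep the first N layers" selection are illustrative assumptions, not this repository's actual distillation recipe.

```python
# Minimal sketch: build a 20-layer student from a full-size Qwen2.5 checkpoint.
# TEACHER_ID and the layer-selection strategy are assumptions for illustration only.
import torch
from torch import nn
from transformers import AutoModelForCausalLM

TEACHER_ID = "Qwen/Qwen2.5-3B"  # assumed 36-layer checkpoint
NUM_STUDENT_LAYERS = 20

student = AutoModelForCausalLM.from_pretrained(TEACHER_ID, torch_dtype=torch.bfloat16)
# Keep the first 20 transformer layers and update the config to match.
student.model.layers = nn.ModuleList(student.model.layers[:NUM_STUDENT_LAYERS])
student.config.num_hidden_layers = NUM_STUDENT_LAYERS
student.save_pretrained("qwen2.5-20-layer-student")
```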
Planned improvements:
- optionally exclude the MLP from the adapter. Right now the adapter is (norm -> self_attn -> norm -> MLP); make it (norm -> self_attn -> norm) when the MLP is excluded (see the sketch after this list)
- add support for sliding window attention
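For the first item, a minimal sketch of an adapter block with an optional MLP is shown below. The class and attribute names are hypothetical, nn.LayerNorm stands in for the model's RMSNorm, and the residual connections are an assumption; only the (norm -> self_attn -> norm -> optional MLP) ordering comes from the description above.

```python
import torch
from torch import nn

class AdapterBlock(nn.Module):
    """Hypothetical adapter block: norm -> self_attn -> norm, with an optional MLP."""

    def __init__(self, hidden_size: int, num_heads: int, include_mlp: bool = True):
        super().__init__()
        self.input_norm = nn.LayerNorm(hidden_size)   # stand-in for RMSNorm
        self.self_attn = nn.MultiheadAttention(hidden_size, num_heads, batch_first=True)
        self.post_attn_norm = nn.LayerNorm(hidden_size)
        self.mlp = None
        if include_mlp:
            self.mlp = nn.Sequential(
                nn.Linear(hidden_size, 4 * hidden_size),
                nn.SiLU(),
                nn.Linear(4 * hidden_size, hidden_size),
            )

    def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
        # norm -> self_attn (residual) -> norm -> optional MLP (residual)
        residual = hidden_states
        x = self.input_norm(hidden_states)
        attn_out, _ = self.self_attn(x, x, x, need_weights=False)
        x = self.post_attn_norm(residual + attn_out)
        if self.mlp is not None:
            x = x + self.mlp(x)
        return x
```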
This repository provides a template for LLM-based projects with:
- Docker-based development
- Jekyll-based documentation
- Hugging Face API integrations
- Submodules for external repositories (e.g., Meta's Coconut, LLM2Vec)
- Poetry-based dependency management
Clone the repository:
git clone --recurse-submodules https://github.com/cattomantis/hidden.git
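If the repository was already cloned without --recurse-submodules, the submodules can be fetched afterwards:
git submodule update --init --recursive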
Build and start the Docker container:
docker compose up experiments -d
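Once the container is running, open a shell inside it (assuming the service is named experiments as above):
docker compose exec experiments bash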