huggingface/smollm

SmolLM2

SmolLM2 is a family of compact language models available in three sizes: 135M, 360M, and 1.7B parameters. They are capable of solving a wide range of tasks while being lightweight enough to run on-device. You can find our most capable model 🤏 SmolLM2-1.7B-Instruct here.
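
To give a rough sense of what "lightweight enough to run on-device" means, the sketch below estimates the memory needed for the weights alone at a few common precisions. It is back-of-the-envelope arithmetic, not a measurement: the parameter counts are the nominal sizes from the model names, the `int4` figure ignores quantization metadata such as scales, and activations and the KV cache are not counted.

```python
# Rough weight-only memory estimate: parameters x bytes per parameter.
# Parameter counts are the nominal sizes from the model names (an assumption,
# not the exact counts), and 1 GB is taken as 1e9 bytes.
NOMINAL_PARAMS = {
    "SmolLM2-135M": 135e6,
    "SmolLM2-360M": 360e6,
    "SmolLM2-1.7B": 1.7e9,
}

# Bytes per parameter; int4 ignores per-block scales and other overhead.
BYTES_PER_PARAM = {"float32": 4.0, "bfloat16": 2.0, "int4": 0.5}


def weight_footprint_gb(num_params: float, dtype: str) -> float:
    """Return the approximate size of the weights in GB for a given dtype."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e9


if __name__ == "__main__":
    for name, num_params in NOMINAL_PARAMS.items():
        row = ", ".join(
            f"{dtype}: {weight_footprint_gb(num_params, dtype):.2f} GB"
            for dtype in BYTES_PER_PARAM
        )
        print(f"{name}: {row}")
```

Under these assumptions the 1.7B model needs roughly 3.4 GB for its weights in bfloat16, while the 135M model fits in well under 1 GB even in float32.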

News 📰

  • Introducing FineMath, the best public math pre-training dataset 🚀
  • We added the code to do continual pre-training of Llama 3.2 3B on FineMath & FineWeb-Edu with nanotron at pre-training/continual-pretraining

Table of Contents

  1. Usage
  2. Pre-training
  3. SmolVLM
  4. Fine-tuning
  5. Evaluation
  6. Synthetic data pipelines

Usage

Our most powerful model is SmolLM2-1.7B-Instruct, which you can use as an assistant with transformers or trl, or as a quantized version with tools like llama.cpp, MLX, and transformers.js. For lighter applications, you can also use the smaller models SmolLM2-360M and SmolLM2-135M, which are suitable for on-device usage and can be integrated similarly. All models are available in this collection.

Transformers

pip install transformers
from transformers import AutoModelForCausalLM, AutoTokenizer
checkpoint = "HuggingFaceTB/SmolLM2-1.7B-Instruct"

device = "cuda" # for GPU usage or "cpu" for CPU usage
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
# for multiple GPUs install accelerate and do `model = AutoModelForCausalLM.from_pretrained(checkpoint, device_map="auto")`
model = AutoModelForCausalLM.from_pretrained(checkpoint).to(device)

messages = [{"role": "user", "content": "Write a 100-word article on 'Benefits of Open-Source in AI research'"}]
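# add_generation_prompt=True appends the assistant turn header so the model answers instead of continuing the user turn
input_text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)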
inputs = tokenizer.encode(input_text, return_tensors="pt").to(device)
outputs = model.generate(inputs, max_new_tokens=50, temperature=0.2, top_p=0.9, do_sample=True)
print(tokenizer.decode(outputs[0]))
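
The snippet above prints the full sequence, prompt included. A minimal sketch of one way to wrap it into a reusable function that returns only the newly generated text is shown below. The helper names (`build_messages`, `strip_prompt`, `chat`) and the default `max_new_tokens=256` are illustrative choices, not part of this repository; the transformers calls are the same ones used above.

```python
from typing import List, Optional

CHECKPOINT = "HuggingFaceTB/SmolLM2-1.7B-Instruct"


def build_messages(user_prompt: str, system_prompt: Optional[str] = None) -> List[dict]:
    """Build a chat message list (hypothetical helper, not from the repo)."""
    messages = []
    if system_prompt is not None:
        messages.append({"role": "system", "content": system_prompt})
    messages.append({"role": "user", "content": user_prompt})
    return messages


def strip_prompt(output_ids, prompt_length: int):
    """Drop the prompt tokens so only the generated continuation remains."""
    return output_ids[prompt_length:]


def chat(user_prompt: str, device: str = "cpu", max_new_tokens: int = 256) -> str:
    """Generate one assistant reply and return it without the prompt."""
    # Imported lazily so the helpers above work without transformers installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(CHECKPOINT)
    model = AutoModelForCausalLM.from_pretrained(CHECKPOINT).to(device)

    text = tokenizer.apply_chat_template(
        build_messages(user_prompt), tokenize=False, add_generation_prompt=True
    )
    inputs = tokenizer.encode(text, return_tensors="pt").to(device)
    outputs = model.generate(
        inputs, max_new_tokens=max_new_tokens, temperature=0.2, top_p=0.9, do_sample=True
    )
    # outputs[0] holds prompt + continuation; keep only the continuation.
    new_ids = strip_prompt(outputs[0], inputs.shape[1])
    return tokenizer.decode(new_ids, skip_special_tokens=True)


if __name__ == "__main__":
    print(chat("Write a 100-word article on 'Benefits of Open-Source in AI research'"))
```

A larger `max_new_tokens` than the 50 used above leaves room for a full 100-word answer; lower it if latency matters more than completeness.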

Chat in TRL

You can also use the TRL CLI to chat with the model from the terminal:

pip install trl
trl chat --model_name_or_path HuggingFaceTB/SmolLM2-1.7B-Instruct --device cpu

You can find more details on how to leverage the model for use cases such as text summarization, text rewriting and function calling in the model card: https://huggingface.co/HuggingFaceTB/SmolLM2-1.7B-Instruct

Local inference

You can use the models locally with frameworks like llama.cpp, MLX, MLC and transformers.js. You can find the instructions to run SmolLM2 with these frameworks at local-inference.

Smol-tools

Smol-tools is a collection of lightweight AI-powered tools built with llama.cpp and small language models. These tools are designed to run locally on your machine without requiring expensive GPU resources. Further instructions on how to use the tools can be found in the smol-tools README.

Pre-training

You can find scripts for launching pre-training with nanotron under pre-training. We share the exact configs for training SmolLM1 and will upload SmolLM2's configs soon.

SmolVLM

We released SmolVLM, a compact open multimodal model that accepts arbitrary sequences of image and text inputs to produce text outputs. It uses SmolLM2-1.7B-Instruct as a language backbone and is designed for efficiency. SmolVLM can answer questions about images, describe visual content, create stories grounded in multiple images, or function as a pure language model without visual inputs. Its lightweight architecture makes it suitable for on-device applications while maintaining strong performance on multimodal tasks. More details are in this blog post: https://huggingface.co/blog/smolvlm

Check inference/smolvlm for more details and finetuning/Smol_VLM_FT.ipynb for some finetuning code.

Fine-tuning

You can find an example script to finetune SmolLM2 using TRL and PEFT in the finetuning folder. We also link to our post-training scripts for SmolLM2 using the alignment handbook.
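
For a sense of why parameter-efficient fine-tuning is cheap, the sketch below counts the trainable parameters added by LoRA, one of the adapter methods PEFT provides. It assumes LoRA is the method in use (the script in the finetuning folder may be configured differently), and the hidden size, layer count, rank, and choice of two adapted matrices per layer are illustrative values, not read from a SmolLM2 config.

```python
def lora_params_per_matrix(d_in: int, d_out: int, rank: int) -> int:
    """LoRA adds A (rank x d_in) and B (d_out x rank) beside a frozen d_out x d_in weight."""
    return rank * (d_in + d_out)


def lora_params_total(
    d_in: int, d_out: int, rank: int, matrices_per_layer: int, num_layers: int
) -> int:
    """Total adapter parameters when the same shape is adapted in every layer."""
    return lora_params_per_matrix(d_in, d_out, rank) * matrices_per_layer * num_layers


# Illustrative values only (assumptions, NOT taken from a SmolLM2 config).
HIDDEN_SIZE = 2048        # width of the adapted square projections
NUM_LAYERS = 24           # number of transformer blocks
RANK = 8                  # LoRA rank
MATRICES_PER_LAYER = 2    # e.g. two attention projections per block

if __name__ == "__main__":
    total = lora_params_total(
        HIDDEN_SIZE, HIDDEN_SIZE, RANK, MATRICES_PER_LAYER, NUM_LAYERS
    )
    print(f"{total:,} trainable adapter parameters")
```

With these illustrative numbers the adapters add about 1.6M trainable parameters, a small fraction of a 1.7B-parameter model, which is why only the adapters need optimizer state during training.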

Evaluation

You can find a more detailed evaluation of each model size in the model cards in this collection. We use lighteval for all our evaluations; for more details, refer to the evaluation README.

Synthetic data pipelines

We released SmolTalk, the SFT dataset used for building the SmolLM2 instruct models. It was created with distilabel, and you can inspect and run the synthetic data pipelines described in the distilabel_pipelines README.

Comparison of models finetuned on SmolTalk and Orca AgentInstruct 1M. For more details, refer to the dataset card.