This repository provides the replication package for our SEALGuard experiments on multilingual LLM safety alignment.
Two models are publicly available on the Hugging Face Hub:
- "MickyMike/SEALGuard-7B"
- "MickyMike/SEALGuard-1.5B"
Our benchmark dataset, SEALSBench, is also publicly available.
Contents:
- Environment Setup
- Reproduce SEALGuard
- Reproduce Baseline LlamaGuard
- Results CSV Files Available
- Citation
We recommend using Python 3.12 for best compatibility and performance.
To install all necessary dependencies, run:
pip install -r requirements.txt
If you’re using an NVIDIA GPU, we highly recommend installing PyTorch with CUDA support to accelerate training and inference. Follow the official installation guide from PyTorch: 👉 https://pytorch.org/get-started/locally
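After installing, you can verify that PyTorch detects your GPU with a quick check (standard PyTorch API, independent of this repository):

```python
# Quick sanity check that the CUDA build of PyTorch sees your GPU.
import torch

print(torch.__version__)                  # installed PyTorch version
print(torch.cuda.is_available())          # True if the CUDA build and driver work
if torch.cuda.is_available():
    print(torch.cuda.get_device_name(0))  # name of the first visible GPU
```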
To run SEALGuard evaluation:
cd ./sealguard/
python main_seallm.py
⚙️ Required Variables

Before running, make sure to set the following variables inside main_seallm.py (see the sketch after this list):
- hf_token → Your Hugging Face API key
- model_id → Choose one of the following models:
- "MickyMike/SEALGuard-7B"
- "MickyMike/SEALGuard-1.5B"
🔁 To Retrain SEALGuard using LoRA tuning:
cd ./sealguard/
sh train.sh
The LoRA-tuned model will be saved to the local directory: ./SEALGuard-7B/
You can adjust the model name and training config inside train.sh as needed.
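For orientation, below is a minimal sketch of what LoRA tuning typically involves with the peft library; the actual hyperparameters (rank, alpha, target modules) live in train.sh and may differ from the assumed values shown here:

```python
# Minimal LoRA sketch using peft; hyperparameters here are assumptions,
# not the values used in train.sh.
from peft import LoraConfig, TaskType, get_peft_model
from transformers import AutoModelForCausalLM

base = AutoModelForCausalLM.from_pretrained("<base-model-id>")  # base model is set in train.sh

lora_config = LoraConfig(
    task_type=TaskType.CAUSAL_LM,
    r=16,                                 # rank of the low-rank update (assumed)
    lora_alpha=32,                        # scaling factor (assumed)
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],  # attention projections commonly targeted
)

model = get_peft_model(base, lora_config)
model.print_trainable_parameters()  # only a small fraction of weights are trainable
```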
To evaluate the baseline LlamaGuard model, run the following commands:
cd ./llamaguard
python main.py
⚙️ Required Variables

Before running, make sure to set the following variables inside main.py (see the sketch after this list):
- hf_token → Your Hugging Face API key
- model_id → Choose one of the following models:
- "meta-llama/Llama-Guard-3-8B"
- "meta-llama/Llama-Guard-3-1B"
All result files are available in the ./results folder.
Each CSV file contains model predictions along with the original input prompts for further analysis.
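For example, you can inspect any of the CSVs with pandas; the file and column names below are illustrative, since the exact schema depends on the evaluation script:

```python
# Illustrative inspection of a results file; the file name is hypothetical.
import pandas as pd

df = pd.read_csv("./results/example_predictions.csv")
print(df.columns.tolist())  # e.g., the input prompt and the predicted label
print(df.head())
```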
If you use SEALGuard or SEALSBench in your work, please consider citing our paper:
The paper is currently under review; citation details will be provided here once available.