
SEALGuard Replication Package

This repository provides the replication package for our SEALGuard experiments on multilingual LLM safety alignment.


🛡️ Multilingual Guardrail — SEALGuard

Figure: SEALGuard overview

Two models are publicly available on the Hugging Face Hub:

  • MickyMike/SEALGuard-7B
  • MickyMike/SEALGuard-1.5B


📊 Multilingual LLM Safety Alignment Benchmark — SEALSBench

Figure: SEALSBench data curation

Our benchmark dataset, SEALSBench, is also publicly available.


📚 Table of Contents

  1. Environment Setup
  2. Reproduce SEALGuard
  3. Reproduce Baseline LlamaGuard
  4. Result CSV Files
  5. Citation

1. Environment Setup

We recommend using Python 3.12 for best compatibility and performance.

Step 1: Install Python Requirements

To install all necessary dependencies, run:

pip install -r requirements.txt

Step 2: Install PyTorch with CUDA

If you’re using an NVIDIA GPU, we highly recommend installing PyTorch with CUDA support to accelerate training and inference. Follow the official installation guide from PyTorch: 👉 https://pytorch.org/get-started/locally
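
After installation, a quick sanity check (a minimal sketch, not part of the replication scripts) confirms that PyTorch can see your GPU:

import torch

print(torch.__version__)                   # installed PyTorch version
print(torch.cuda.is_available())           # True if a CUDA-capable GPU is visible
if torch.cuda.is_available():
    print(torch.cuda.get_device_name(0))   # name of the first GPU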


2. Reproduce SEALGuard

To run SEALGuard evaluation:

cd ./sealguard/
python main_seallm.py

⚙️ Required Variables

Before running, make sure to set the following variables inside main_seallm.py (a loading sketch is shown after the list):

  • hf_token → Your Hugging Face access token
  • model_id → Choose one of the following models:
    • "MickyMike/SEALGuard-7B"
    • "MickyMike/SEALGuard-1.5B"

🔁 To retrain SEALGuard with LoRA tuning, run:

cd ./sealguard/
sh train.sh

The LoRA-tuned model will be saved to the local directory: ./SEALGuard-7B/

You can adjust the model name and training config inside train.sh as needed.
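
The core of the LoRA setup follows the standard peft pattern; the sketch below is illustrative only (the base checkpoint and hyperparameter values are placeholders, and the actual values are set in train.sh):

from transformers import AutoModelForCausalLM
from peft import LoraConfig, TaskType, get_peft_model

base_model_id = "..."                      # base checkpoint configured in train.sh

base_model = AutoModelForCausalLM.from_pretrained(base_model_id)

lora_config = LoraConfig(
    task_type=TaskType.CAUSAL_LM,
    r=16,                                  # illustrative rank
    lora_alpha=32,                         # illustrative scaling factor
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],   # typical attention projections
)

model = get_peft_model(base_model, lora_config)
model.print_trainable_parameters()         # only the adapter weights are trainable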


3. Reproduce Baseline LlamaGuard

To evaluate the baseline LlamaGuard model, run the following commands:

cd ./llamaguard
python main.py

⚙️ Required Variables

Before running, make sure to set the following variables inside main.py (a loading sketch is shown after the list):

  • hf_token → Your Hugging Face API key
  • model_id → Choose one of the following models:
    • "meta-llama/Llama-Guard-3-8B"
    • "meta-llama/Llama-Guard-3-1B"

4. Result CSV Files

All result files are available in the ./results folder.
Each CSV file contains model predictions along with the original input prompts for further analysis.
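
For example, a result file can be inspected with pandas; the file and column names below are hypothetical, so check the actual headers in ./results first:

import pandas as pd

df = pd.read_csv("./results/example.csv")  # hypothetical file name; use a real file from ./results
print(df.columns.tolist())                 # inspect the actual column names
print(df.head())

# Example follow-up analysis, assuming prediction/label columns exist:
# accuracy = (df["prediction"] == df["label"]).mean()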


5. Citation

If you use SEALGuard or SEALSBench in your work, please consider citing our paper:

Our paper, "SEALGuard: Safeguarding the Multilingual Conversations in Southeast Asian Languages for LLM Software Systems", is currently under review.
