An end-to-end GPU framework for authenticating machine learning artifacts.

Andrew-Gan/Sentry

Sentry

An end-to-end GPU framework for authenticating machine learning artifacts.

Overview

This work is described in an accepted paper that will be published soon. Stay tuned for more.
Before setting up, refer to the linked compatibility guide to ensure that your driver, compiler, and runtime API versions are compatible.

Setup

Docker

Install the following prerequisites:

Docker
Docker Compose
NVIDIA Container Toolkit

Then create the output directory for signatures:

mkdir -p ./signatures

Native run

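# Create and activate a Python virtual environment
python3 -m venv .venv
source .venv/bin/activate

# Cache downloaded PyTorch model weights under ./torch
export TORCH_HOME=./torch

# Install the Python dependencies
pip install -r requirements.txt

# Generate an ECDSA P-256 (prime256v1) private key and derive its public key
openssl ecparam -name prime256v1 -genkey -noout -out private.pem
openssl ec -in private.pem -pubout -out public.pem

# Point cuFile (GPUDirect Storage) at its JSON configuration file
export CUFILE_ENV_PATH_JSON=cufile.json

# Compile the RapidEC CUDA source into a shared library
nvcc -Xcompiler '-fPIC' -o ./RapidEC/gsv.so -shared ./RapidEC/gsv.cu

# Create the output directory for signatures
mkdir -p ./signatures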
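
After running the steps above, a short script can confirm that the expected files and environment variables are in place. This is a minimal sketch and not part of Sentry: the helper name `missing_setup_items` is hypothetical, and the lists of paths and variables simply mirror the commands above.

```python
import os
from pathlib import Path

# Files and directories produced by the setup commands above.
EXPECTED_PATHS = ["private.pem", "public.pem", "RapidEC/gsv.so", "signatures"]
# Environment variables exported by the setup commands above.
EXPECTED_ENV = ["TORCH_HOME", "CUFILE_ENV_PATH_JSON"]


def missing_setup_items(root=".", environ=None):
    """Return the setup artifacts that are absent (hypothetical helper)."""
    environ = os.environ if environ is None else environ
    # Paths are checked relative to the repository root.
    missing = [p for p in EXPECTED_PATHS if not (Path(root) / p).exists()]
    # Unset or empty variables are reported with a leading "$".
    missing += [f"${name}" for name in EXPECTED_ENV if not environ.get(name)]
    return missing
```

Calling `missing_setup_items()` from the repository root returns an empty list once every step has been completed.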

Hugging Face login

Some ML models on Hugging Face require the user to be logged in; without a login, downloads may fail with timeout errors.
Create an account and follow the guide to acquire an access token.
Then place the access token in a file named 'hf_access_token'.
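
For illustration, a token file like this can be read with a few lines of Python. This is a sketch under assumptions: `read_hf_token` is a hypothetical helper, and how Sentry itself consumes the file is not shown in this README.

```python
from pathlib import Path


def read_hf_token(path="hf_access_token"):
    """Read the access token from a file, ignoring surrounding whitespace."""
    token = Path(path).read_text(encoding="utf-8").strip()
    if not token:
        # An empty file is most likely a setup mistake, so fail early.
        raise ValueError(f"{path} is empty; paste your access token into it")
    return token
```

The returned string could then be handed to a login call such as `huggingface_hub.login(token=...)`, if you need to authenticate manually.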

Run

Docker

docker compose up --build sentry_dataset
docker compose up --build sentry_trainer
docker compose up --build sentry_inferencer

Native run

python agent_dataset.py uoft-cs/cifar10 16 1 dataset/cifar10
python agent_trainer.py --sig_out ./signatures --model_path ./torch private-key --private_key private.pem
python agent_inferencer.py --sig_out ./signatures --model_path ./torch private-key --private_key private.pem
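
The three commands above can also be chained from a small Python driver. This is only a sketch: it assumes the stages are meant to run in the order listed (dataset, trainer, inferencer), and `run_pipeline` is a hypothetical helper, not a script shipped with the repository.

```python
import shlex
import subprocess

# The three native-run commands from this README, in the order listed.
PIPELINE = [
    "python agent_dataset.py uoft-cs/cifar10 16 1 dataset/cifar10",
    "python agent_trainer.py --sig_out ./signatures --model_path ./torch "
    "private-key --private_key private.pem",
    "python agent_inferencer.py --sig_out ./signatures --model_path ./torch "
    "private-key --private_key private.pem",
]


def run_pipeline(commands=PIPELINE, runner=subprocess.run):
    """Run each stage in order and stop at the first failure."""
    for command in commands:
        # check=True raises CalledProcessError on a non-zero exit status.
        runner(shlex.split(command), check=True)
```

Calling `run_pipeline()` from the repository root runs the stages back to back; passing a custom `runner` makes it easy to dry-run the commands first.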

Example

Before Sentry:

import torchvision.models as models
from torch.utils.data import DataLoader

model = models.vgg19(weights=models.VGG19_Weights.DEFAULT)
# testing_data: any torch.utils.data.Dataset that yields (image, label) pairs
dataloader = DataLoader(testing_data, batch_size=128, shuffle=True)

# A torch DataLoader yields (inputs, labels) batches
for x, y in dataloader:
    pred = model(x)

After Sentry:

import torchvision.models as models
from common import get_image_dataloader
import sentry

model = models.vgg19(weights=models.VGG19_Weights.DEFAULT)
# get Sentry's custom DALI-based dataloader which supports GPUDirect and dataset hashing
dataloader, hasher = get_image_dataloader(
    path='./dataset/cifar10', batch=128, device='gpu', gds=True,
)

# verify model
sentry.verify_model(model)

for data in dataloader:
    x, y = data[0]['data'], data[0]['label']
    pred = model(x)

# verify dataset
sentry.verify_dataset(hasher.compute())
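
The `hasher.compute()` call above suggests an accumulate-then-finalize pattern. The sketch below illustrates that pattern on the CPU with `hashlib`. `StreamingDatasetHasher` and its `update` method are hypothetical stand-ins, not Sentry's GPU implementation.

```python
import hashlib


class StreamingDatasetHasher:
    """Hypothetical CPU stand-in: fold each batch into one running digest."""

    def __init__(self, algorithm="sha256"):
        self._hash = hashlib.new(algorithm)

    def update(self, batch_bytes):
        # Called once per batch, in the order the batches are consumed.
        self._hash.update(batch_bytes)

    def compute(self):
        # Return the hex digest of everything seen so far.
        return self._hash.hexdigest()


hasher = StreamingDatasetHasher()
for batch in (b"batch-0", b"batch-1", b"batch-2"):
    hasher.update(batch)
digest = hasher.compute()
```

Note that a single running digest like this one depends on the order in which batches arrive.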

Configuration

While the default setting uses a Merkle Tree with SHA256 to hash models, the signer may configure the hashing protocol with a combination of the settings below.

| Settings | Supported Options | Restrictions |
| --- | --- | --- |
| Topology | Merkle, Lattice | Lattice must use BLAKE2XB |
| HashAlgo | SHA256, Blake2B, SHA3, BLAKE2XB | |
| Workflow | Coalesced, Layered, Inplace | |
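
The restriction in the table can be expressed as a small validation routine. This is a sketch only: `validate_hash_config` is a hypothetical helper, and the option strings are copied from the table rather than from Sentry's actual configuration interface.

```python
# Option names copied from the table above.
TOPOLOGIES = {"Merkle", "Lattice"}
HASH_ALGOS = {"SHA256", "Blake2B", "SHA3", "BLAKE2XB"}
WORKFLOWS = {"Coalesced", "Layered", "Inplace"}


def validate_hash_config(topology, hash_algo, workflow):
    """Raise ValueError if the combination is not allowed by the table."""
    if topology not in TOPOLOGIES:
        raise ValueError(f"unknown topology: {topology!r}")
    if hash_algo not in HASH_ALGOS:
        raise ValueError(f"unknown hash algorithm: {hash_algo!r}")
    if workflow not in WORKFLOWS:
        raise ValueError(f"unknown workflow: {workflow!r}")
    # The only cross-setting restriction listed in the table.
    if topology == "Lattice" and hash_algo != "BLAKE2XB":
        raise ValueError("the Lattice topology must use BLAKE2XB")
```

For example, `validate_hash_config("Merkle", "SHA256", "Coalesced")` passes silently, while `validate_hash_config("Lattice", "SHA256", "Layered")` raises `ValueError`.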

Evaluation

This project demonstrates how to protect the integrity of a model by signing it with Sigstore, a tool for making code signatures transparent without requiring management of cryptographic key material.

When users download a given version of a signed model they can check that the signature comes from a known or trusted identity and thus that the model hasn't been tampered with after training.
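
To make the tamper-detection idea concrete, here is a minimal CPU sketch of a SHA-256 Merkle root over fixed-size chunks of a byte buffer. It is not Sentry's GPU kernel or the Sigstore flow. The chunk size, the handling of an unpaired node, and the name `merkle_root` are all assumptions made for illustration.

```python
import hashlib


def merkle_root(data, chunk_size=1024):
    """Return the SHA-256 Merkle root (hex) of `data` split into chunks."""
    # Leaf level: hash each chunk; an empty input still yields one leaf.
    chunks = [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)] or [b""]
    level = [hashlib.sha256(chunk).digest() for chunk in chunks]
    # Combine adjacent pairs until a single root remains.
    while len(level) > 1:
        next_level = []
        for i in range(0, len(level), 2):
            if i + 1 < len(level):
                next_level.append(hashlib.sha256(level[i] + level[i + 1]).digest())
            else:
                # Assumption: an unpaired node is promoted unchanged.
                next_level.append(level[i])
        level = next_level
    return level[0].hex()


weights = bytes(range(256)) * 16                      # stand-in for serialized weights
trusted = merkle_root(weights)                        # digest that would be signed
tampered = weights[:100] + b"\x00" + weights[101:]    # one byte altered
```

Comparing `merkle_root(tampered)` against `trusted` exposes the change, which is the property a signature over the root relies on. A production design would typically also domain-separate leaf hashes from inner-node hashes.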

Sentry hashes large models quickly, as the hash times in the following table show:

| Model | Size | Hash Time |
| --- | --- | --- |
| microsoft/resnet-152 | 270M | 5.5 ms |
| google-bert/bert-base-uncased | 538M | 11.4 ms |
| pytorch/vision/vgg19 | 1.1G | 29.1 ms |
| openai-community/gpt2 | 1.1G | 32.3 ms |
| openai-community/gpt2-xl | 8.6G | 296.6 ms |
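
For a rough sense of scale, the table can be converted into hashing throughput. The sketch below assumes the Size column is in bytes with decimal prefixes (M = 10^6, G = 10^9). The README does not state the unit, so treat the results as approximate.

```python
# (model, size in bytes under the decimal-prefix assumption, hash time in ms)
ROWS = [
    ("microsoft/resnet-152", 270e6, 5.5),
    ("google-bert/bert-base-uncased", 538e6, 11.4),
    ("pytorch/vision/vgg19", 1.1e9, 29.1),
    ("openai-community/gpt2", 1.1e9, 32.3),
    ("openai-community/gpt2-xl", 8.6e9, 296.6),
]


def throughput_gb_per_s(size_bytes, time_ms):
    """Bytes hashed per second, expressed in GB/s (1 GB = 10^9 bytes)."""
    return size_bytes / (time_ms / 1000.0) / 1e9


for model, size, ms in ROWS:
    print(f"{model}: {throughput_gb_per_s(size, ms):.1f} GB/s")
```

Under that unit assumption, the rows work out to roughly 29 to 49 GB/s.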

Contributions

Adding a new CUDA kernel for hashing

The relevant code is in the following locations:

sentry/model_signing/cuda
sentry/model_signing/hashing/topology.py
