An end-to-end GPU framework for authenticating machine learning artifacts.
This work is described in an accepted paper to be published soon. Stay tuned for more.
Before setting up, please refer to link to ensure compatibility between driver, compiler and runtime APIs.
Docker
Docker Compose
NVIDIA Container Toolkit
mkdir -p ./signatures
python3 -m venv .venv
source .venv/bin/activate
export TORCH_HOME=./torch
pip install -r requirements.txt
openssl ecparam -name prime256v1 -genkey -noout -out private.pem
openssl ec -in private.pem -pubout -out public.pem
export CUFILE_ENV_PATH_JSON=cufile.json
nvcc -Xcompiler '-fPIC' -o ./RapidEC/gsv.so -shared ./RapidEC/gsv.cu
mkdir -p ./signatures
Some ML models on Hugging Face require the user to be logged in to avoid timeout errors.
Create an account and follow the guide to acquire your access token.
Then, place the access token in the file named 'hf_access_token'.
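As a quick sanity check before starting the containers, the token file can be read back and validated. This is a minimal sketch: `read_hf_token` is a hypothetical helper, not part of Sentry, and how the agents actually consume `hf_access_token` is not specified here.

```python
from pathlib import Path


def read_hf_token(path="hf_access_token"):
    """Return the token stored in `path`, stripped of surrounding whitespace.

    Hypothetical helper for illustration; the project may read the file differently.
    """
    token = Path(path).read_text(encoding="utf-8").strip()
    if not token:
        raise ValueError(f"{path} is empty; paste your access token into it")
    return token
```

A trailing newline left by an editor is stripped, and an empty file fails early instead of surfacing later as a timeout.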
docker compose up --build sentry_dataset
docker compose up --build sentry_trainer
docker compose up --build sentry_inferencer
python agent_dataset.py uoft-cs/cifar10 16 1 dataset/cifar10
python agent_trainer.py --sig_out ./signatures --model_path ./torch private-key --private_key private.pem
python agent_inferencer.py --sig_out ./signatures --model_path ./torch private-key --private_key private.pem
Before Sentry:
import torchvision.models as models
from torch.utils.data import DataLoader
model = models.vgg19(weights=models.VGG19_Weights.DEFAULT)
dataloader = DataLoader(testing_data, batch_size=128, shuffle=True)
for data in dataloader:
    x, y = data[0]['data'], data[0]['label']
    pred = model(x)

After Sentry:
import torchvision.models as models
from common import get_image_dataloader
import sentry
model = models.vgg19(weights=models.VGG19_Weights.DEFAULT)
# get Sentry's custom DALI-based dataloader which supports GPUDirect and dataset hashing
dataloader, hasher = get_image_dataloader(
    path='./dataset/cifar10', batch=128, device='gpu', gds=True,
)
# verify model
sentry.verify_model(model)
for data in dataloader:
    x, y = data[0]['data'], data[0]['label']
    pred = model(x)
# verify dataset
sentry.verify_dataset(hasher.compute())

While the default setting uses a Merkle Tree with SHA256 to hash models, the signer may configure the hashing protocol with a combination of the settings below.
| Settings | Supported Options | Restrictions |
|---|---|---|
| Topology | Merkle, Lattice | Lattice must use BLAKE2XB |
| HashAlgo | SHA256, Blake2B, SHA3, BLAKE2XB | |
| Workflow | Coalesced, Layered, Inplace | |
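To illustrate the default Merkle topology with SHA256, here is a CPU-only sketch using Python's `hashlib`. It is not Sentry's GPU implementation: the helper names `split` and `merkle_root`, the chunk size, and the rule of duplicating the last node on odd-sized levels are all assumptions made for the example.

```python
import hashlib


def split(data, chunk_size):
    """Cut a byte string into fixed-size chunks (the last one may be shorter)."""
    return [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]


def merkle_root(chunks):
    """Return the SHA-256 Merkle root of a list of byte chunks.

    Simplified for illustration: no leaf/internal domain separation, and the
    last node is duplicated when a level has an odd number of nodes.
    """
    level = [hashlib.sha256(chunk).digest() for chunk in chunks]
    if not level:
        return hashlib.sha256(b"").digest()
    while len(level) > 1:
        if len(level) % 2:
            level.append(level[-1])
        level = [
            hashlib.sha256(level[i] + level[i + 1]).digest()
            for i in range(0, len(level), 2)
        ]
    return level[0]


# Stand-in for serialized model weights.
weights = bytes(range(256)) * 4
root = merkle_root(split(weights, 256))
```

Because every leaf is hashed independently, the leaf level parallelizes well, which is what makes a tree topology a natural fit for GPU hashing.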
This project demonstrates how to protect the integrity of a model by signing it with Sigstore, a tool for making code signatures transparent without requiring management of cryptographic key material.
When users download a given version of a signed model, they can check that the signature comes from a known or trusted identity and thus that the model hasn't been tampered with after training.
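The tamper check at the core of this flow can be sketched as a digest comparison. Note that this is plain SHA-256 comparison against a digest the user already trusts, not Sigstore verification: it does not check who produced the digest. The function names `file_sha256` and `matches_expected` are hypothetical.

```python
import hashlib
import hmac


def file_sha256(path, block_size=1 << 20):
    """Stream a file through SHA-256 and return the hex digest."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(block_size), b""):
            digest.update(block)
    return digest.hexdigest()


def matches_expected(path, expected_hex):
    """Return True if the file's digest equals the trusted digest."""
    return hmac.compare_digest(file_sha256(path), expected_hex.lower())
```

Streaming in fixed-size blocks keeps memory use constant for multi-gigabyte model files.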
Large models can be hashed for signing in milliseconds, as the following table shows:
| Model | Size | Hash Time |
|---|---|---|
| microsoft/resnet-152 | 270M | 5.5 ms |
| google-bert/bert-base-uncased | 538M | 11.4 ms |
| pytorch/vision/vgg19 | 1.1G | 29.1 ms |
| openai-community/gpt2 | 1.1G | 32.3 ms |
| openai-community/gpt2-xl | 8.6G | 296.6 ms |
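The table can be read as effective hashing throughput by dividing size by time. The sketch below assumes the `M` and `G` suffixes mean decimal megabytes and gigabytes, which the table does not state; the derived figures are only as accurate as that assumption.

```python
# (model, size in bytes, hash time in ms), copied from the table above.
# Assumption: "M" = 1e6 bytes and "G" = 1e9 bytes.
ROWS = [
    ("microsoft/resnet-152", 270e6, 5.5),
    ("google-bert/bert-base-uncased", 538e6, 11.4),
    ("pytorch/vision/vgg19", 1.1e9, 29.1),
    ("openai-community/gpt2", 1.1e9, 32.3),
    ("openai-community/gpt2-xl", 8.6e9, 296.6),
]


def throughput_gb_per_s(size_bytes, hash_ms):
    """Effective throughput in GB/s for hashing size_bytes in hash_ms milliseconds."""
    return size_bytes / (hash_ms / 1000) / 1e9


for name, size_bytes, hash_ms in ROWS:
    print(f"{name}: {throughput_gb_per_s(size_bytes, hash_ms):.1f} GB/s")
```

Under that unit assumption, the rows work out to roughly 29 to 49 GB/s, with the smaller models at the higher end.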
cd sentry/model_signing/cuda
cd sentry/model_signing/hashing  # contains topology.py
