Fix for floating point representation attack #260

Merged · 12 commits · Nov 18, 2021
7 changes: 6 additions & 1 deletion opacus/optimizers/ddp_perlayeroptimizer.py
@@ -32,6 +32,7 @@ def __init__(
expected_batch_size: Optional[int],
loss_reduction: str = "mean",
generator=None,
secure_mode=False,
):
self.rank = torch.distributed.get_rank()
self.max_grad_norms = max_grad_norms
@@ -43,6 +44,7 @@ def __init__(
expected_batch_size=expected_batch_size,
loss_reduction=loss_reduction,
generator=generator,
secure_mode=secure_mode,
)
self.register_hooks()

@@ -51,7 +53,10 @@ def _add_noise_parameter(self, p):
The reason we need self is the generator used for secure_mode
"""
noise = _generate_noise(
self.noise_multiplier * self.max_grad_norm, p.summed_grad
std=self.noise_multiplier * self.max_grad_norm,
reference=p.summed_grad,
generator=None,
secure_mode=self.secure_mode,
)
p.grad = p.summed_grad + noise

2 changes: 2 additions & 0 deletions opacus/optimizers/ddpoptimizer.py
@@ -18,6 +18,7 @@ def __init__(
expected_batch_size: Optional[int],
loss_reduction: str = "mean",
generator=None,
secure_mode=False,
):
super().__init__(
optimizer,
@@ -26,6 +27,7 @@
expected_batch_size=expected_batch_size,
loss_reduction=loss_reduction,
generator=generator,
secure_mode=secure_mode,
)
self.rank = torch.distributed.get_rank()
self.world_size = torch.distributed.get_world_size()
62 changes: 57 additions & 5 deletions opacus/optimizers/optimizer.py
@@ -9,19 +9,68 @@


def _generate_noise(
std: float, reference: torch.Tensor, generator=None
std: float,
reference: torch.Tensor,
generator=None,
secure_mode: bool = False,
) -> torch.Tensor:
if std > 0:
# TODO: handle device transfers: generator and reference tensor
# could be on different devices
"""
Generates noise according to a Gaussian distribution with mean 0

Args:
std: Standard deviation of the noise
reference: The reference Tensor to get the appropriate shape and device
for generating the noise
generator: The PyTorch noise generator
secure_mode: boolean indicating whether "secure" noise needs to be generated
(see the notes below)

Notes:
If `secure_mode` is enabled, the generated noise is also secure
against the floating point representation attacks, such as the ones
in https://arxiv.org/abs/2107.10138. This is achieved through calling
the Gaussian noise function 2*n times, where n=2 (see Section 5.1 in
https://arxiv.org/abs/2107.10138).

Reason for choosing n=2: n can be any number > 1. The bigger, the more
computation needs to be done (`2n` Gaussian samples will be generated).
The reason we chose `n=2` is that `n=1` would be easy to break and `n>2`
is not really necessary. The complexity of the attack is `2^(p*(2n-1))`.
In PyTorch `p=53`, so the complexity is `2^(53*(2n-1))`. With `n=1` we get
`2^53` (easy to break), but with `n=2` we get `2^159`, which is hard
enough for an attacker to break.
"""
zeros = torch.zeros(reference.shape, device=reference.device)
if std == 0:
return zeros
# TODO: handle device transfers: generator and reference tensor
# could be on different devices
if secure_mode:
torch.normal(
mean=0,
std=std,
size=(1, 1),
device=reference.device,
generator=generator,
) # generate, but throw away first generated Gaussian sample
sum = zeros
for i in range(4):
Contributor:

I trust you and Ilya that it solves the problem, but could you pls do ELI5 why this works?
I remember "Option 3" from your doc, but I'm not sure I understand how this relates to it

Contributor:

From what I understand, the sum from 1 to 4 gives you a Gaussian with variance 4 std^2, thus sum/2 is a Gaussian with variance std^2. @ashkan-software: shouldn't you loop over only 2 samples as per the docstring?

Contributor Author (@ashkan-software, Nov 15, 2021):

Great question @ffuuugor.

This approach is actually not any of those 3 options we had. I found this approach in a recent paper, and Ilya and I think it is an intelligent way of fixing the problem. The idea of the attack is to invert the Gaussian mechanism and guess what values were used as input to the mechanism. This is possible if the Gaussian mechanism is used only once. But if we use the Gaussian more than once (in this fix, we call it 4 times), it becomes exponentially harder to guess those values. This is in very simple words, but the fix is a bit more involved and is explained in the paper I listed on the PR.

Contributor Author:

@alexandresablayrolles

The reason for having the numbers 4 and 2 in the code is that when n=2, we get those values (Section 5.1 in the paper):

sum(gauss(0, 1) for i in range(2 * n)) / sqrt(2 * n)
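
For reference, a quick standalone check of this construction (plain PyTorch, not part of the diff): summing 2n = 4 independent draws of N(0, std^2) and dividing by sqrt(2n) = 2 leaves the standard deviation unchanged.

```python
import torch

std = 2.0
n = 2
# Draw 2n independent Gaussian samples with the target std, sum them,
# and rescale by sqrt(2n); the result keeps the target std.
samples = torch.stack(
    [torch.normal(mean=0.0, std=std, size=(100_000,)) for _ in range(2 * n)]
)
noise = samples.sum(dim=0) / (2 * n) ** 0.5  # same as sum / 2 when n = 2
print(float(noise.std()))  # ~2.0, i.e. the requested std
```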

sum += torch.normal(
mean=0,
std=std,
size=reference.shape,
device=reference.device,
generator=generator,
)
return sum / 2
else:
return torch.normal(
mean=0,
std=std,
size=reference.shape,
device=reference.device,
generator=generator,
)
return torch.zeros(reference.shape, device=reference.device)


def _get_flat_grad_sample(p: torch.Tensor):
@@ -47,6 +96,7 @@ def __init__(
expected_batch_size: Optional[int],
loss_reduction: str = "mean",
generator=None,
secure_mode=False,
):
if loss_reduction not in ("mean", "sum"):
raise ValueError(f"Unexpected value for loss_reduction: {loss_reduction}")
@@ -63,6 +113,7 @@
self.expected_batch_size = expected_batch_size
self.step_hook = None
self.generator = generator
self.secure_mode = secure_mode

self.param_groups = optimizer.param_groups
self.state = optimizer.state
@@ -137,6 +188,7 @@ def add_noise(self):
std=self.noise_multiplier * self.max_grad_norm,
reference=p.summed_grad,
generator=self.generator,
secure_mode=self.secure_mode,
)
p.grad = p.summed_grad + noise

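As a usage sketch (hypothetical values; the import path is the one used by the tests below), the updated `_generate_noise` can be called directly:

```python
import torch
from opacus.optimizers.optimizer import _generate_noise

summed_grad = torch.zeros(10)      # stand-in for p.summed_grad
noise = _generate_noise(
    std=0.5 * 1.0,                 # noise_multiplier * max_grad_norm
    reference=summed_grad,         # shape and device are taken from this tensor
    generator=None,
    secure_mode=True,              # draw 2n = 4 samples and rescale by 2
)
new_grad = summed_grad + noise     # mirrors p.grad = p.summed_grad + noise
```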
2 changes: 2 additions & 0 deletions opacus/optimizers/perlayeroptimizer.py
@@ -19,6 +19,7 @@ def __init__(
expected_batch_size: Optional[int],
loss_reduction: str = "mean",
generator=None,
secure_mode=False,
):
assert len(max_grad_norm) == len(params(optimizer))
self.max_grad_norms = max_grad_norm
@@ -30,6 +31,7 @@ def __init__(
expected_batch_size=expected_batch_size,
loss_reduction=loss_reduction,
generator=generator,
secure_mode=secure_mode,
)

def attach(self, optimizer):
3 changes: 2 additions & 1 deletion opacus/privacy_engine.py
@@ -106,6 +106,7 @@ def _prepare_optimizer(
expected_batch_size=expected_batch_size,
loss_reduction=loss_reduction,
generator=generator,
secure_mode=self.secure_mode,
)

def _prepare_data_loader(
@@ -266,4 +267,4 @@ def make_private_with_epsilon(
)

def get_epsilon(self, delta):
return self.accountant.get_epsilon(delta)
return self.accountant.get_epsilon(delta)
103 changes: 72 additions & 31 deletions opacus/tests/privacy_engine_test.py
@@ -4,6 +4,7 @@
import unittest
from abc import ABC
from typing import Optional, OrderedDict
from unittest.mock import MagicMock, patch

import hypothesis.strategies as st
import torch
@@ -12,6 +13,7 @@
from hypothesis import given, settings
from opacus import PrivacyEngine
from opacus.layers.dp_multihead_attention import DPMultiheadAttention
from opacus.optimizers.optimizer import _generate_noise
from opacus.utils.module_utils import are_state_dict_equal
from opacus.validators.errors import UnsupportedModuleError
from torch.utils.data import DataLoader, Dataset
@@ -395,42 +397,81 @@ def test_noise_level(self, noise_multiplier: float, max_steps: int):
"""
Tests that the noise level is correctly set
"""
# Initialize models with parameters to zero
model, optimizer, dl, _ = self._init_private_training(
noise_multiplier=noise_multiplier
)
for p in model.parameters():
p.data.zero_()

# Do max_steps steps of DP-SGD
n_params = sum([p.numel() for p in model.parameters() if p.requires_grad])
steps = 0
for x, y in dl:
optimizer.zero_grad()
logits = model(x)
loss = logits.view(logits.size(0), -1).sum(dim=1)
# Gradient should be 0
loss.backward(torch.zeros(logits.size(0)))
def helper_test_noise_level(
noise_multiplier: float, max_steps: int, secure_mode: bool
):
torch.manual_seed(100)
# Initialize models with parameters to zero
model, optimizer, dl, _ = self._init_private_training(
noise_multiplier=noise_multiplier,
secure_mode=secure_mode,
)
for p in model.parameters():
p.data.zero_()

optimizer.step()
steps += 1
# Do max_steps steps of DP-SGD
n_params = sum([p.numel() for p in model.parameters() if p.requires_grad])
steps = 0
for x, y in dl:
optimizer.zero_grad()
logits = model(x)
loss = logits.view(logits.size(0), -1).sum(dim=1)
# Gradient should be 0
loss.backward(torch.zeros(logits.size(0)))

if max_steps and steps >= max_steps:
break
optimizer.step()
steps += 1

if max_steps and steps >= max_steps:
break

# Noise should be equal to lr*sigma*sqrt(n_params * steps) / batch_size
expected_norm = (
steps
* n_params
* optimizer.noise_multiplier ** 2
* self.LR ** 2
/ (optimizer.expected_batch_size ** 2)
)
real_norm = sum(
[torch.sum(torch.pow(p.data, 2)) for p in model.parameters()]
).item()

# Noise should be equal to lr*sigma*sqrt(n_params * steps) / batch_size
expected_norm = (
steps
* n_params
* optimizer.noise_multiplier ** 2
* self.LR ** 2
/ (optimizer.expected_batch_size ** 2)
)
real_norm = sum(
[torch.sum(torch.pow(p.data, 2)) for p in model.parameters()]
).item()
self.assertAlmostEqual(real_norm, expected_norm, delta=0.15 * expected_norm)

with self.subTest(secure_mode=False):
helper_test_noise_level(
noise_multiplier=noise_multiplier,
max_steps=max_steps,
secure_mode=False,
)
with self.subTest(secure_mode=True):
helper_test_noise_level(
noise_multiplier=noise_multiplier,
max_steps=max_steps,
secure_mode=True,
)

self.assertAlmostEqual(real_norm, expected_norm, delta=0.1 * expected_norm)
@patch("torch.normal", MagicMock(return_value=torch.Tensor([0.6])))
def test_generate_noise_in_secure_mode(self):
"""
Tests that the noise is added correctly in secure_mode,
according to section 5.1 in https://arxiv.org/abs/2107.10138.
Since n=2, noise should be summed 4 times and divided by 2.

In this example, torch.normal returns a constant value of 0.6.
So, the overall noise would be (0.6 + 0.6 + 0.6 + 0.6)/2 = 1.2
"""
noise = _generate_noise(
std=2.0,
reference=torch.Tensor([1, 2, 3]), # arbitrary size = 3
secure_mode=True,
)
self.assertTrue(
torch.allclose(noise, torch.Tensor([1.2, 1.2, 1.2])),
"Model parameters after deterministic run must match",
)


class SampleConvNet(nn.Module):
Binary file added run_results_imdb_classification.pt
Binary file not shown.