
Conservative search for target epsilon in get_noise_multiplier #348

Conversation

@alexandresablayrolles alexandresablayrolles commented Feb 4, 2022

Problem

@tholop noticed in #346 that get_noise_multiplier can overshoot the target epsilon when searching for the highest possible noise multiplier. This is due to the binary search implementation.

Changes

  • get_noise_multiplier now verifies that the actual epsilon is lower than the target epsilon.
    • Added a test for this.
  • The bounds of the sigma search range and the search precision can now be passed as arguments to get_noise_multiplier.

Differential Revision: D34003401

@facebook-github-bot facebook-github-bot added CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. fb-exported labels Feb 4, 2022
@facebook-github-bot
This pull request was exported from Phabricator. Differential Revision: D34003401

@romovpa romovpa changed the title Improve get_noise_multiplier Conservative search for target epsilon in get_noise_multiplier Feb 4, 2022
Comment on lines 51 to 67
 while eps > target_epsilon:
-    sigma_max = 2 * sigma_max
-    accountant.steps = [(sigma_max, sample_rate, int(epochs / sample_rate))]
+    sigma_high = 2 * sigma_high
+    accountant.steps = [(sigma_high, sample_rate, steps)]
     eps = accountant.get_epsilon(delta=target_delta, **kwargs)
-    if sigma_max > MAX_SIGMA:
+    if sigma_high > sigma_max:
         raise ValueError("The privacy budget is too low.")
Here we try to automatically initialize sigma_high for the binary search. This behaviour is not consistent with the docstring: I can set a specific bound, but the function can change it anyway. Also, having both sigma_high and sigma_max is counter-intuitive, and it takes knowing the whole algorithm to understand what the difference is.

Proposal

Leave only sigma_low and sigma_high as parameters. I would also rename them to min/max to emphasize that the result will be bounded:

def get_noise_multiplier(
    ...
    sigma_min: float = 0.01,
    sigma_max: Optional[float] = None,
)

If sigma_max is None, run this part to get the upper bound starting from sigma_max+1.

Leave MAX_SIGMA as a global constant; maybe make it higher.
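A minimal sketch of this proposal, assuming a hypothetical `get_eps` callable standing in for the accountant call (the function name `init_sigma_max` and its defaults are illustrative, not the PR's code):

```python
from typing import Callable, Optional

# Hypothetical cap on the search range, mirroring the MAX_SIGMA constant
MAX_SIGMA = 2000


def init_sigma_max(
    get_eps: Callable[[float], float],  # hypothetical: sigma -> epsilon
    target_epsilon: float,
    sigma_min: float = 0.01,
    sigma_max: Optional[float] = None,
) -> float:
    """If sigma_max is not given, double a starting guess until the
    resulting epsilon drops below the target (epsilon decreases in sigma)."""
    if sigma_max is not None:
        return sigma_max
    sigma_max = sigma_min + 1  # start just above the lower bound
    while get_eps(sigma_max) > target_epsilon:
        sigma_max = 2 * sigma_max
        if sigma_max > MAX_SIGMA:
            raise ValueError("The privacy budget is too low.")
    return sigma_max
```

A user-supplied sigma_max is returned untouched, which keeps the behaviour consistent with the docstring.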

Comment on lines 12 to 15
sigma_precision=0.01,
sigma_low=0.01,
sigma_high=10,
sigma_max=2000,
Please add type annotations.

accountant: str = "rdp",
sigma_precision=0.01,
Let's rename it to sigma_tolerance, which is the common term in optimization (see scipy.optimize.bisect).

Isn't this precision too coarse? I mean, we should make this value smaller, closer to typical float precision. Btw, this higher/lower confusion is another reason to rename the argument to tolerance.

Comment on lines 9 to 10
epochs: int = None,
steps: int = None,
If the argument can be None, its type should be Optional[int].


-    return sigma
+    return sigma_high
A possible corner case, sigma_high < sigma_low, is not handled here.
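A minimal guard for that corner case might look like this (the function name and message below are illustrative, not part of the PR):

```python
def validate_bounds(sigma_low: float, sigma_high: float) -> None:
    # Fail fast on an inverted or empty search interval instead of
    # letting the binary search silently return sigma_high.
    if sigma_high <= sigma_low:
        raise ValueError(
            f"sigma_high ({sigma_high}) must be strictly greater "
            f"than sigma_low ({sigma_low})"
        )
```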

Comment on lines 58 to 78
 while sigma_high - sigma_low > sigma_precision:
     sigma = (sigma_low + sigma_high) / 2
     accountant.steps = [(sigma, sample_rate, steps)]
     eps = accountant.get_epsilon(delta=target_delta, **kwargs)

     if eps < target_epsilon:
-        sigma_max = sigma
+        sigma_high = sigma
     else:
-        sigma_min = sigma
+        sigma_low = sigma
I'm wondering if we can use scipy.optimize.bisect() instead of implementing our own.
It is easy to implement binary search incorrectly, so we would need tests for all the corner cases. If we use scipy, we don't need to add our own tests, plus it is much more stable than the rest of our dependencies.
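As a sketch of what that could look like (eps_fn and find_noise_multiplier are hypothetical stand-ins for the accountant call, and the +tolerance shift is one way to keep the result conservative, not the PR's code):

```python
from scipy.optimize import bisect


def find_noise_multiplier(
    eps_fn,                  # hypothetical: sigma -> epsilon, decreasing in sigma
    target_epsilon: float,
    sigma_low: float,
    sigma_high: float,
    sigma_tolerance: float = 0.01,
) -> float:
    # f(sigma) = eps_fn(sigma) - target_epsilon changes sign on
    # [sigma_low, sigma_high], which is exactly what bisect() needs.
    root = bisect(
        lambda sigma: eps_fn(sigma) - target_epsilon,
        sigma_low,
        sigma_high,
        xtol=sigma_tolerance,
    )
    # Shift up by the tolerance so the returned sigma errs on the
    # high-noise (conservative) side of the root.
    return root + sigma_tolerance
```

bisect() raises ValueError on its own if f(sigma_low) and f(sigma_high) have the same sign, which also covers the "budget too low" case.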

accountant: str = "rdp",
sigma_precision=0.01,
You need to add a check that sigma_tolerance is strictly positive. Alternative: pass it to scipy.bisect and let it do the check for you.


alexandresablayrolles pushed a commit to alexandresablayrolles/opacus that referenced this pull request Feb 10, 2022
Summary:
Pull Request resolved: pytorch#348

Noise multiplier now verifies that the actual epsilon is *lower* than the target epsilon.
Bounds of the range of search for sigma, and precision can now be given as arguments to get_noise_multiplier.

Differential Revision: D34003401

fbshipit-source-id: bed5aab383a867d1608e692c54c0832b8acef373
alexandresablayrolles pushed a commit to alexandresablayrolles/opacus that referenced this pull request Feb 11, 2022
Summary:
Pull Request resolved: pytorch#348

Noise multiplier now verifies that the actual epsilon is *lower* than the target epsilon.
Bounds of the range of search for sigma, and precision can now be given as arguments to get_noise_multiplier.

Differential Revision: D34003401

fbshipit-source-id: ba00d6bc0a60bdba71564a81dd5e895590c7a0ea

eps = accountant.get_epsilon(delta=target_delta, **kwargs)
if eps < target_epsilon:
    warnings.warn(
        f"Noise multiplier of {sigma_low} is still too high for a privacy budget of {target_epsilon}, please use a lower sigma_low."
I would replace "please use a lower sigma_low" with something like:
"you can improve the result by allowing a smaller sigma_low without exceeding the target epsilon"

But the current version is also fine.

accountant: str = "rdp",
epsilon_tolerance: float = 0.01,
sigma_low: float = 0.01,
I wanted to ask why the default sigma_low is so high. We will surprise the user with the warning and a suboptimal result if the best sigma is, for example, 1e-3.
But then I realized I don't understand the point of having this parameter at all: why not always start with sigma_low = 0?

This won't hurt the expression target_epsilon - eps_high > epsilon_tolerance: we will never approach this value closer than tolerance/2.
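A minimal sketch of that idea, with sigma_low fixed at 0 and the epsilon-tolerance stopping rule (eps_fn and get_sigma here are illustrative stand-ins, not the PR's implementation):

```python
def get_sigma(eps_fn, target_epsilon, sigma_high, epsilon_tolerance=0.01):
    # sigma_low starts at 0; eps_fn(0) is never evaluated because the
    # search only probes midpoints, which are strictly positive.
    sigma_low = 0.0
    eps_high = eps_fn(sigma_high)
    if eps_high > target_epsilon:
        raise ValueError("sigma_high is not large enough for the target epsilon")
    # Stop once the achieved epsilon is within epsilon_tolerance of the
    # target, always staying on the conservative side (eps_high <= target).
    while target_epsilon - eps_high > epsilon_tolerance:
        sigma = (sigma_low + sigma_high) / 2
        eps = eps_fn(sigma)
        if eps < target_epsilon:
            sigma_high, eps_high = sigma, eps
        else:
            sigma_low = sigma
    return sigma_high
```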

accountant: str = "rdp",
epsilon_tolerance: float = 0.01,
sigma_low: float = 0.01,
sigma_high: float = 200,
@romovpa romovpa Feb 11, 2022


Why are we so sure that 200 is high enough?

Why not eliminate this constant by probing sigma_high = tol * 2^k using this removed algorithm:

    sigma_high = tol
    while eps > target_epsilon:
        sigma_high = 2 * sigma_high
        accountant.steps = [(sigma_high, sample_rate, int(epochs / sample_rate))]
        eps = accountant.get_epsilon(delta=target_delta, **kwargs)
        if sigma_high > MAX_SIGMA:
            raise ValueError("The privacy budget is too low.")

Summary:
Pull Request resolved: pytorch#348

Noise multiplier now verifies that the actual epsilon is *lower* than the target epsilon.
Bounds of the range of search for sigma, and precision can now be given as arguments to get_noise_multiplier.

Differential Revision: D34003401

fbshipit-source-id: 93cedb970a44e1cf406834ffa61f34307de43ae1

@romovpa romovpa left a comment


Sweet!
One nit: add a comment.

-# Noise higher than MAX_SIGMA considered unreasonable
-MAX_SIGMA = 2000
+MAX_SIGMA = 1e6

There was a comment here before, which I recommend keeping because the constant looks a bit surprising:

# Noise levels higher than MAX_SIGMA are considered unreasonable,
# leading to a "not sufficient privacy budget" exception
MAX_SIGMA = 1e6


Successfully merging this pull request may close these issues.

The Sigma binary search is not conservative and can overshoot target_epsilon