Description
In dpmm/src/dpmm/models/base/mechanisms/mechanism.py, the mechanism's domain is automatically inferred from the raw input dataframe by taking the maximum value of each column.
In a Differential Privacy (DP) context, any operation performed on the raw data must be accounted for in the privacy budget.
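The budget argument can be made precise with the standard definition of ε-DP (the notation below is this sketch's, not necessarily dpmm's). Taking a single integer-valued column and writing the inferred domain size as a deterministic function f(D) = max(D) + 1, one neighboring dataset with a larger record is a counterexample for every finite ε, whereas a domain computed by an ε_domain-DP mechanism simply adds its cost under basic sequential composition:

```latex
\begin{align*}
&M \text{ is } \varepsilon\text{-DP} \iff \Pr[M(D)\in S] \le e^{\varepsilon}\,\Pr[M(D')\in S]
  \quad \text{for all neighboring } D, D' \text{ and all } S,\\
&f(D) = \max(D) + 1,\quad D' = D \cup \{x\} \text{ with } x > \max(D),\quad S = \{\max(D)+1\},\\
&\Pr[f(D)\in S] = 1,\quad \Pr[f(D')\in S] = 0
  \;\Rightarrow\; 1 \le e^{\varepsilon}\cdot 0 \text{ is false for every finite } \varepsilon,\\
&\varepsilon_{\text{total}} = \varepsilon_{\text{domain}} + \varepsilon_{\text{model}}
  \quad \text{(basic sequential composition)}.
\end{align*}
```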
Location
File: dpmm/src/dpmm/models/base/mechanisms/mechanism.py
Line: 113
```python
_domain = (df.astype(int).max(axis=0) + 1).to_dict()
```
Because `df.astype(int).max(axis=0)` is evaluated directly on the private dataframe, the exact maximum of each sensitive attribute is revealed through the inferred domain, with no noise and no privacy cost.
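To make the leak concrete, here is a minimal pure-Python sketch that mirrors the expression above without pandas (the helper `infer_domain` and the column name `income_bracket` are hypothetical). Two neighboring datasets that differ by a single record produce different domains, so the inferred domain alone reveals that the extra record exists and what its value is:

```python
from typing import Dict, List, Mapping


def infer_domain(columns: Mapping[str, List[int]]) -> Dict[str, int]:
    # Pure-Python mirror of (df.astype(int).max(axis=0) + 1).to_dict():
    # one entry per column, equal to that column's maximum value plus one.
    return {name: max(int(v) for v in values) + 1 for name, values in columns.items()}


# Two neighboring datasets: identical except that d2 holds one extra record.
d1 = {"income_bracket": [0, 2, 1, 3]}
d2 = {"income_bracket": [0, 2, 1, 3, 17]}

print(infer_domain(d1))  # {'income_bracket': 4}
print(infer_domain(d2))  # {'income_bracket': 18}
```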
Suggested Patch
The domain should be treated as a hyperparameter or a private statistic. Consider one of the following approaches:
- User-Provided Bounds: Require the user to pass a domain or bounds argument derived from public knowledge or data schemas.
- Private Max Computation: Use a DP-compliant mechanism to find a noisy upper bound, and subtract the cost from the total privacy budget.
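The first option can be sketched in plain Python (the helper `apply_public_domain` and the example column are hypothetical, not dpmm's API). It assumes the caller supplies a domain size per column from a schema or public knowledge, and clips each value into that range instead of inspecting the data's maximum:

```python
from typing import Dict, List, Mapping, Sequence, Tuple


def apply_public_domain(
    columns: Mapping[str, Sequence[int]],
    public_domain: Mapping[str, int],
) -> Tuple[Dict[str, List[int]], Dict[str, int]]:
    """Clip every column into a caller-supplied domain instead of inferring it.

    public_domain maps each column name to its domain size, taken from public
    knowledge or a schema; the returned domain never depends on the data.
    """
    missing = sorted(set(columns) - set(public_domain))
    if missing:
        # Fail on missing metadata, which is public, never on data values.
        raise ValueError(f"no public domain supplied for columns: {missing}")
    clipped: Dict[str, List[int]] = {}
    for name, values in columns.items():
        size = public_domain[name]
        if size < 1:
            raise ValueError(f"domain size for {name!r} must be >= 1")
        # Per-record clip into [0, size - 1]. It uses only the public bound,
        # so an out-of-range record cannot change the domain.
        clipped[name] = [min(max(int(v), 0), size - 1) for v in values]
    return clipped, {name: public_domain[name] for name in columns}


data, domain = apply_public_domain({"age": [34, 120, 7]}, {"age": 100})
print(data)    # {'age': [34, 99, 7]}
print(domain)  # {'age': 100}
```

Clipping is preferred over raising on out-of-range values: an exception triggered by a data value would itself leak that such a value exists, whereas an error on missing metadata depends only on public inputs.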
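For the second option, one standard DP tool is the exponential mechanism. The sketch below swaps the exact maximum for a private high quantile (default 0.99) chosen from the integer candidates `0..public_max`. It assumes a public, possibly loose, upper limit `public_max` and a budget share `epsilon` reserved for this step; `private_upper_bound` is a hypothetical helper, not part of dpmm. Values above the returned bound would then be clipped, and the domain size would be `bound + 1`:

```python
import math
import random
from typing import Optional, Sequence


def private_upper_bound(
    values: Sequence[int],
    public_max: int,
    epsilon: float,
    quantile: float = 0.99,
    rng: Optional[random.Random] = None,
) -> int:
    """Return an epsilon-DP upper bound in [0, public_max].

    Exponential mechanism with utility u(b) = -|#{v <= b} - quantile * n|.
    Adding or removing one record changes u(b) by at most 1, so sampling
    with weights exp(epsilon * u / 2) satisfies epsilon-DP.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    if not 0.0 < quantile <= 1.0:
        raise ValueError("quantile must be in (0, 1]")
    if public_max < 0:
        raise ValueError("public_max must be >= 0")
    rng = rng if rng is not None else random.Random()
    n = len(values)

    # Histogram of values clipped to the public range [0, public_max].
    counts = [0] * (public_max + 1)
    for v in values:
        counts[min(max(int(v), 0), public_max)] += 1

    # running == #{v <= b} for the current candidate b.
    utilities = []
    running = 0
    for b in range(public_max + 1):
        running += counts[b]
        utilities.append(-abs(running - quantile * n))

    # Shifting by the best utility avoids overflow; it rescales every weight
    # by the same factor, so the sampling distribution is unchanged.
    best = max(utilities)
    weights = [math.exp(epsilon * (u - best) / 2.0) for u in utilities]
    return rng.choices(range(public_max + 1), weights=weights, k=1)[0]
```

Whatever `epsilon` is spent here is charged against the total, for example by fitting the model with the remaining budget (total minus `epsilon`). Targeting a high quantile instead of the exact maximum is a common choice, since a single record can move the true maximum arbitrarily far; the cost is that a few extreme records get clipped.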