DataSHIELD doesn't currently appear to have the concept of concentration as one of the disclosure controls.
The idea is to limit the proportion of a statistic that a single value can contribute from the set of values being sampled. In simple terms, if we have the numbers 0.1, 0.2, 0.3, 0.5, 4e6, 0.6, 0.5, then we should block the mean because one value dominates it, which makes the result disclosive. At the moment, this passes the standard `nfilter.tab` test.
The limit could be set so that no single value contributes more than 0.9 of the statistic.
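A minimal sketch of what such a check might look like, written in Python purely for illustration (it is not DataSHIELD code). The function names `max_concentration` and `guarded_mean` are hypothetical, and measuring the share on absolute values is an assumption made here so that negative numbers cannot cancel the total; the issue itself does not specify this.

```python
def max_concentration(values):
    """Return the largest share of the total absolute sum held by one value.

    Assumption: shares are measured on absolute values.
    """
    total = sum(abs(v) for v in values)
    if total == 0:
        return 0.0  # all zeros: no single value dominates
    return max(abs(v) for v in values) / total


def guarded_mean(values, limit=0.9):
    """Return the mean, or None when one value exceeds the concentration limit."""
    if not values:
        raise ValueError("values must not be empty")
    if max_concentration(values) > limit:
        return None  # blocked: the mean would be dominated by a single value
    return sum(values) / len(values)


# The example from the issue: 4e6 dominates, so the mean is blocked.
dominated = [0.1, 0.2, 0.3, 0.5, 4e6, 0.6, 0.5]
balanced = [1.0, 2.0, 3.0, 4.0]
blocked_result = guarded_mean(dominated)
allowed_result = guarded_mean(balanced)
```

With the 0.9 limit, `blocked_result` is `None` and `allowed_result` is the ordinary mean of the balanced vector.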
The first functions where this will be implemented are `ds.mean()` and similar. One attack is to create a vector of all 0s except a single 1, multiply it element-wise by the column of interest, and take the mean. Because the length is known, multiplying the mean by the length recovers the value at the position of the 1. Moving the 1 allows all values to be recovered. This change will stop this attack.
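The attack can be sketched in a few lines. This is an illustration in Python rather than DataSHIELD client code; `secret_column` is made-up data and `plain_mean` is a hypothetical stand-in for an unguarded mean function.

```python
def plain_mean(values):
    """Unguarded mean, standing in for a mean with no concentration control."""
    return sum(values) / len(values)


# Made-up column that the attacker should not be able to read directly.
secret_column = [23.0, 41.0, 35.0, 58.0]
n = len(secret_column)

recovered = []
for position in range(n):
    # Vector of all 0s with a single 1 at the target position.
    mask = [1.0 if i == position else 0.0 for i in range(n)]
    # Element-wise product leaves only the targeted value non-zero.
    masked = [m * x for m, x in zip(mask, secret_column)]
    # mean = value / n, so multiplying by the known length recovers the value.
    recovered.append(plain_mean(masked) * n)

# Share of the total held by the single non-zero value in the last masked vector.
single_value_share = max(abs(v) for v in masked) / sum(abs(v) for v in masked)
```

Each masked vector has exactly one non-zero entry, so that entry accounts for the whole of the mean; a concentration limit of 0.9 would reject every one of these requests.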
This control will not help with other differencing attacks (as per Stefan's work).
This solution will also not help with the trick of repeating a value several times. That is, perform the steps detailed above, but copy the vector 5 times and `rbind` the copies together. The concentration check will no longer trigger, because 5 equal values now contribute to the mean and each accounts for only 0.2 of it. Dividing by 5 then yields the answer as before.
A proposed solution for this will be opened in a separate issue.
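A sketch of the repetition trick, again in Python for illustration only. The data, the `concentration` helper, and the absolute-value definition of the share are assumptions; list repetition stands in for binding the copies together.

```python
def concentration(values):
    """Largest share of the total absolute sum held by one entry (assumed definition)."""
    total = sum(abs(v) for v in values)
    return 0.0 if total == 0 else max(abs(v) for v in values) / total


secret_column = [23.0, 41.0, 35.0, 58.0]  # made-up data
n = len(secret_column)
copies = 5
position = 1  # target the second value

# Single 1 at the target position, multiplied element-wise with the column.
mask = [1.0 if i == position else 0.0 for i in range(n)]
masked = [m * x for m, x in zip(mask, secret_column)]

# Stack 5 copies of the masked vector (list repetition stands in for rbind).
stacked = masked * copies

# Each of the 5 identical non-zero entries holds only 1/5 of the total,
# so a 0.9 concentration limit does not trigger.
share = concentration(stacked)

# The attacker knows the stacked length and the number of copies.
stacked_mean = sum(stacked) / len(stacked)
recovered_value = stacked_mean * len(stacked) / copies
```

Here `share` is about 0.2, well under the 0.9 limit, while `recovered_value` equals the targeted entry of `secret_column`.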