Description
DataSHIELD doesn't currently appear to have the concept of concentration as one of the disclosure controls.
The idea is to limit the proportion of a statistic that a single value from the set of values being sampled can contribute. In simple terms, if we have the numbers 0.1, 0.2, 0.3, 0.5, 4e6, 0.6, 0.5, then we should block the mean, because one value dominates and the result is disclosive. At the moment, this passes the standard nfilter.tab test.
The limit could be set so that no single value accounts for more than 0.9 of the statistic.
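To make the proposal concrete, here is a minimal sketch of what such a check might look like. It is written in Python purely for illustration and is not DataSHIELD code. The names `max_share` and `guarded_mean` are hypothetical, and two choices are assumptions rather than part of the proposal: measuring each value's share by absolute value (so negative values are handled), and blocking when every value is zero.

```python
def max_share(values):
    """Largest fraction of the total absolute sum contributed by one value.

    Assumption: shares are measured on absolute values.
    """
    magnitudes = [abs(v) for v in values]
    total = sum(magnitudes)
    if total == 0:
        # Assumption: treat an all-zero vector as fully concentrated, so a
        # mean of exactly 0 is never released by this check.
        return 1.0
    return max(magnitudes) / total


def guarded_mean(values, limit=0.9):
    """Return the mean, or None if one value exceeds `limit` of the total."""
    if not values:
        raise ValueError("values must not be empty")
    if max_share(values) > limit:
        return None  # blocked: a single value dominates the statistic
    return sum(values) / len(values)


example = [0.1, 0.2, 0.3, 0.5, 4e6, 0.6, 0.5]
print(max_share(example) > 0.9)    # the 4e6 entry dominates
print(guarded_mean(example))       # blocked
print(guarded_mean([1, 2, 3, 4]))  # largest share is 0.4, so the mean is released
```

With the example numbers from above, the 4e6 entry accounts for almost the entire sum, so the mean is withheld, while an evenly spread vector passes.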
The first functions where this will be implemented are ds.mean() and similar. One of the attack modes is to create a vector of all 0s except a single 1, multiply it element-wise with the column of interest, and take the mean. Knowing the length allows that value to be recreated (the mean multiplied by the length). Moving the 1 allows all values to be recreated. This change will stop this attack.
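The attack can be illustrated with a short simulation. This is a Python sketch on made-up data, not DataSHIELD code; `unguarded_mean` is a hypothetical stand-in for a mean function with no concentration control, and the share calculation assumes concentration is measured on absolute values.

```python
# Made-up private column that the attacker should never see directly.
column = [12.5, 7.0, 3.25, 9.0]
n = len(column)  # the attacker is assumed to know the length


def unguarded_mean(values):
    """Stand-in for a mean with no concentration control."""
    return sum(values) / len(values)


recovered = []
shares = []
for i in range(n):
    # Vector of all 0s except a single 1 at position i.
    mask = [1 if j == i else 0 for j in range(n)]
    product = [m * x for m, x in zip(mask, column)]

    # mean(product) * n is exactly the i-th private value.
    recovered.append(unguarded_mean(product) * n)

    # Share of the statistic contributed by the largest single value.
    total = sum(abs(p) for p in product)
    shares.append(max(abs(p) for p in product) / total)

print(recovered == column)            # every value is reconstructed
print(all(s == 1.0 for s in shares))  # each query is fully concentrated
```

Every masked query has a single value contributing the whole statistic, so a limit such as 0.9 would reject each of them.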
This control will not help with other differencing attacks (as per Stefan's work).