Add in concept of "concentration" for disclosure control

DataSHIELD doesn't currently appear to have the concept of concentration as one of the [disclosure controls](https://data2knowledge.atlassian.net/wiki/spaces/DSDEV/pages/714768398/Disclosure+control).

The idea is to limit the proportion of a statistic that a single value can contribute, relative to the set of values the statistic is computed from. In simple terms, if we have the numbers 0.1, 0.2, 0.3, 0.5, 4e6, 0.6, 0.5, then we should block the mean because one value (4e6) dominates the result, which makes it disclosive. At the moment, this passes the standard `nfilter.tab` test.

The limit could be set so that no single value contributes more than 0.9 of the statistic.
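As a rough illustration of what such a check might look like, here is a minimal Python sketch (not DataSHIELD code; a real implementation would live in the server-side R functions). The names `max_share` and `mean_with_concentration_check` are hypothetical, and measuring each value's share against the sum of absolute values is an assumption made here so that negative numbers are handled sensibly.

```python
def max_share(values):
    """Largest fraction of the total absolute sum contributed by one value.

    Assumption: shares are measured on absolute values; the issue itself
    does not specify how negative numbers should be treated.
    """
    total = sum(abs(v) for v in values)
    if total == 0:
        # All values are zero, so no single value dominates.
        return 0.0
    return max(abs(v) for v in values) / total


def mean_with_concentration_check(values, limit=0.9):
    """Return the mean, or raise if one value exceeds the concentration limit.

    `limit` is a hypothetical parameter; 0.9 is the figure suggested above.
    """
    if not values:
        raise ValueError("cannot take the mean of an empty set of values")
    if max_share(values) > limit:
        raise ValueError("blocked: a single value dominates the statistic")
    return sum(values) / len(values)


if __name__ == "__main__":
    example = [0.1, 0.2, 0.3, 0.5, 4e6, 0.6, 0.5]
    try:
        mean_with_concentration_check(example)
    except ValueError as err:
        print(err)
```

With the example values from this issue, 4e6 accounts for almost the entire sum, so the share is well above 0.9 and the mean is refused.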

The first functions where this will be implemented are `ds.mean()` and similar. One of the attack modes is to create a vector of all 0s except for a single 1, multiply it element-wise with the column of interest and take the mean. Multiplying that mean by the known length recovers the value at the position of the 1. Moving the 1 allows all values to be recreated. This change is intended to stop this attack.
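The attack can be sketched in a few lines of Python (again an illustration only, not DataSHIELD code). `plain_mean` stands in for an unprotected mean function and `reconstruct` is a hypothetical helper; the column values are made up for the example.

```python
def plain_mean(values):
    """Stand-in for an unprotected mean, with no concentration check."""
    return sum(values) / len(values)


def reconstruct(column):
    """Recover every value of `column` using only means of masked vectors."""
    n = len(column)
    recovered = []
    for i in range(n):
        # Indicator vector: all 0s except a single 1 at position i.
        mask = [0] * n
        mask[i] = 1
        # Element-wise product leaves only column[i] non-zero.
        masked = [m * x for m, x in zip(mask, column)]
        # mean(masked) == column[i] / n, so multiplying by n recovers it.
        recovered.append(plain_mean(masked) * n)
    return recovered


if __name__ == "__main__":
    secret = [12.0, 7.5, 30.0, 4.0]  # made-up data
    print(reconstruct(secret))
```

Each masked vector has at most one non-zero entry, so that entry contributes the whole of the mean. A concentration limit below 1 would therefore refuse every one of these requests where the targeted value is non-zero.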

This control will not help with other differencing attacks (as per Stefan's work).




Add in concept of "concentration" for disclosure control #250
