Guidance on anonymization/pseudonymization #68

I'd like to propose that ECS add guidance for anonymization and pseudonymization. Some thoughts:

**Definitions**
- anonymization: Irreversible data obfuscation.
- pseudonymization: Reversible data obfuscation.
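To make the distinction concrete, here is a minimal Python sketch. It is only an illustration under assumptions of my own: the `Pseudonymizer` class, its in-memory lookup table, and the `generalize` helper are hypothetical names, not anything ECS or Elasticsearch provides. Pseudonymization is modelled as a random token that can be mapped back by whoever holds the table; anonymization is modelled as truncation, which discards information for good.

```python
import secrets


class Pseudonymizer:
    """Reversible obfuscation: a lookup table maps tokens back to values.

    Hypothetical helper for illustration. A real deployment would keep the
    table in a separately secured store rather than in memory.
    """

    def __init__(self) -> None:
        self._token_by_value: dict[str, str] = {}
        self._value_by_token: dict[str, str] = {}

    def pseudonymize(self, value: str) -> str:
        # Reuse the existing token so equal values stay comparable.
        if value not in self._token_by_value:
            token = secrets.token_hex(8)
            self._token_by_value[value] = token
            self._value_by_token[token] = value
        return self._token_by_value[value]

    def reidentify(self, token: str) -> str:
        # Only possible for whoever holds the lookup table.
        return self._value_by_token[token]


def generalize(value: str, keep: int = 2) -> str:
    """Irreversible obfuscation: keep a prefix, drop the rest for good."""
    return value[:keep]
```

Whoever holds the `Pseudonymizer` instance can call `reidentify`; nobody can recover the dropped characters from the output of `generalize`.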

**PII model**
The [NIST 800-122 publication on PII](https://nvlpubs.nist.gov/nistpubs/legacy/sp/nistspecialpublication800-122.pdf) identifies impact levels for personally identifiable information (PII). With a numeric rank and a public level added for convenience:
- High (4): disclosure has severe/catastrophic adverse effects
- Moderate (3): disclosure has serious adverse effects
- Low (2): disclosure has limited adverse effects
- Public (1): not PII; covers non-personal data

Typically, anyone allowed to see PII level `X` can also see every level below `X` (Air Force One is said to use the same method: walk freely towards the rear, but never walk forward of your own seat). We could also imagine putting `pii_<level>` as a prefix or suffix in field names, which would make Field Level Security easy to manage because it supports wildcard (`*`) access patterns.
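As a rough sketch of how such a naming convention could drive access, the Python below builds one wildcard pattern per permitted level and filters field names against them. The `.pii_<level>` suffix, the function names, and the use of `fnmatch` to stand in for wildcard matching are all assumptions for illustration; the exact matching rules of Field Level Security may differ.

```python
from fnmatch import fnmatchcase

# Levels from the PII model above: 1 = public ... 4 = high.
PII_LEVELS = (1, 2, 3, 4)


def granted_patterns(clearance: int) -> list[str]:
    """Wildcard patterns for every level at or below the given clearance."""
    return [f"*.pii_{level}" for level in PII_LEVELS if level <= clearance]


def visible_fields(fields: list[str], clearance: int) -> list[str]:
    """Keep only the fields whose name matches a granted pattern."""
    patterns = granted_patterns(clearance)
    return [
        name
        for name in fields
        if any(fnmatchcase(name, pattern) for pattern in patterns)
    ]


fields = [
    "customer.postalcode.pii_4",
    "customer.postalcode.pii_2",
    "customer.postalcode.pii_1",
]
print(visible_fields(fields, clearance=2))
```

A list like the one returned by `granted_patterns` is the kind of thing that could be placed in a role's granted-fields list, so a role definition never has to enumerate individual fields.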

**Varying levels of obfuscation**
We should also recognize that several versions of the same field, each obfuscated to a different degree, can (and should) coexist. Perhaps the Dutch postal code system is a good example:
- `postalcode: 1234AB`

The system is set up so that each additional character to the right adds more precision to the location.

Perhaps in Elasticsearch this becomes:
- `customer.postalcode.raw: 1234AB`
- `customer.postalcode.city: 12`
- `customer.postalcode.obfuscated: E32DB25A9BAAA6AF655FE65A861C9BD35AF1868229E0E9D738236B4500626AFB`

Or, applying the PII levels:
- `customer.postalcode.pii4: 1234AB` <-- perhaps enough to identify the customer
- `customer.postalcode.pii2: 12` <-- not enough to identify the customer
- `customer.postalcode.pii1: E32DB25A9BAAA6AF655FE65A861C9BD35AF1868229E0E9D738236B4500626AFB` <-- not enough to identify the customer, but derived from the PII 4 value, so we can still bucket customers on the same street without knowing which street it is.
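The three variants could be derived from the raw value at ingest time. The sketch below is an assumption-laden illustration, not a proposal for a specific implementation: the issue does not say how the obfuscated value is produced, so a keyed SHA-256 (HMAC) is assumed here because it yields 64 hex characters like the example, and because an unkeyed hash of something as small as a postal code could be reversed by hashing every possible code. The function name and `secret_key` parameter are hypothetical.

```python
import hashlib
import hmac


def postal_code_variants(postal_code: str, secret_key: bytes) -> dict[str, str]:
    """Derive the per-level variants of a Dutch postal code."""
    # Normalize so "1234 ab" and "1234AB" land in the same bucket.
    normalized = postal_code.replace(" ", "").upper()

    # Keyed digest: equal postal codes give equal digests, so bucketing
    # still works, but the value cannot be looked up without the key.
    digest = hmac.new(
        secret_key, normalized.encode("utf-8"), hashlib.sha256
    ).hexdigest().upper()

    return {
        "customer.postalcode.pii4": normalized,      # full precision
        "customer.postalcode.pii2": normalized[:2],  # coarse area only
        "customer.postalcode.pii1": digest,          # opaque bucket key
    }


variants = postal_code_variants("1234 ab", secret_key=b"example-key-only")
for field, value in variants.items():
    print(f"{field}: {value}")
```

Two customers with the same postal code get the same `pii1` value, which is what makes aggregation possible without exposing the code itself.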

The above would allow different users to access the postal code at a level appropriate for their usage. Business Analytics, for example, might be limited to data below PII level 3 because of personal data laws such as GDPR.
