Description
Synthetics relies on encrypted saved objects (ESOs), which can fail to decrypt in the following cases:
- AAD omitted by a partial update: when an ESO is partially updated and an attribute that is part of its AAD (additional authenticated data) is left out of the update, the ESO will fail to decrypt.
- `attributesExcludedFromAAD` changed without a migration: if the `attributesExcludedFromAAD` property of the ESO spec is changed without a migration, the ESO will fail to decrypt.
- Encryption key changed: if a user changes their encryption key without following the documented procedure for rotating encryption keys, ESOs will fail to decrypt.
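To make the three failure cases concrete, here is a minimal, simplified model built on Node's built-in AES-256-GCM, in which the AAD is derived from an object's attributes that are not excluded from AAD. This is a sketch under stated assumptions, not Kibana's actual ESO implementation: `buildAad`, `encrypt`, `tryDecrypt`, and the monitor fields are hypothetical. It shows that a partial update, a changed AAD exclusion list, and a changed key all end in the same authentication failure.

```typescript
import { createCipheriv, createDecipheriv, randomBytes } from "node:crypto";

// Hypothetical, simplified ESO model (NOT Kibana's implementation).
type Attributes = Record<string, unknown>;

interface EncryptedBlob {
  iv: Buffer;
  tag: Buffer;
  ciphertext: Buffer;
}

// AAD = sorted [key, value] pairs of every attribute not excluded from AAD.
function buildAad(attrs: Attributes, excludedFromAad: ReadonlySet<string>): Buffer {
  const pairs = Object.keys(attrs)
    .filter((k) => !excludedFromAad.has(k))
    .sort()
    .map((k) => [k, attrs[k]]);
  return Buffer.from(JSON.stringify(pairs), "utf8");
}

function encrypt(key: Buffer, plaintext: string, aad: Buffer): EncryptedBlob {
  const iv = randomBytes(12); // 96-bit nonce, the usual size for GCM
  const cipher = createCipheriv("aes-256-gcm", key, iv);
  cipher.setAAD(aad);
  const ciphertext = Buffer.concat([cipher.update(plaintext, "utf8"), cipher.final()]);
  return { iv, tag: cipher.getAuthTag(), ciphertext };
}

// Returns the plaintext, or null if authentication fails (wrong key or wrong AAD).
function tryDecrypt(key: Buffer, blob: EncryptedBlob, aad: Buffer): string | null {
  try {
    const decipher = createDecipheriv("aes-256-gcm", key, blob.iv);
    decipher.setAAD(aad);
    decipher.setAuthTag(blob.tag);
    return Buffer.concat([decipher.update(blob.ciphertext), decipher.final()]).toString("utf8");
  } catch {
    return null;
  }
}

const noExclusions: ReadonlySet<string> = new Set<string>();
const key = randomBytes(32);
const monitor: Attributes = { name: "my monitor", type: "http", schedule: "3m" };
const secrets = JSON.stringify({ password: "example" });
const blob = encrypt(key, secrets, buildAad(monitor, noExclusions));

// Same key, same AAD: decryption succeeds.
const roundTrip = tryDecrypt(key, blob, buildAad(monitor, noExclusions));

// Case 1: a partial update re-encrypts using only the attributes it was given,
// so the AAD no longer matches the full stored object.
const renamed: Attributes = { ...monitor, name: "renamed" };
const partialBlob = encrypt(key, secrets, buildAad({ name: "renamed" }, noExclusions));
const afterPartialUpdate = tryDecrypt(key, partialBlob, buildAad(renamed, noExclusions));

// Case 2: the exclusion list changes without a migration, so the AAD differs.
const afterExclusionChange = tryDecrypt(key, blob, buildAad(monitor, new Set<string>(["schedule"])));

// Case 3: a different key is used for decryption.
const afterKeyChange = tryDecrypt(randomBytes(32), blob, buildAad(monitor, noExclusions));
```

In this model all three failures surface as `null` from the same `catch`, because GCM reports a wrong key and a wrong AAD identically.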
When this happens, the affected monitor stops running, and the user receives an error when visiting the monitor page. They can delete the monitor from the management page, but there is no way to recover its previous settings.
When the user upgrades their stack, saved object migrations may also fail because the affected objects cannot be decrypted.
We should find a way to mitigate this issue when it happens, even if we cannot restore the user's original data, and ideally without blocking saved object migrations. We can address this in one of two ways:
- Create known issues for support: handle possible Synthetics SO encryption issues as support-focused issues.
- Consider using `shouldMigrateIfDecryptionFails`: avoid blocking migrations on decryption failures and instead allow the migration to proceed. When this happens, we should reset the encrypted values to their defaults. For Synthetics monitors, the `secrets` key holds an object serialized as a JSON string, so we would need to reset each of its individual values to its default and then stringify the object again.
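The reset step described for option 2 might look roughly like the sketch below. It assumes the migration can tell whether decrypting `secrets` succeeded; `SecretsResult`, `secretsForMigratedMonitor`, `DEFAULT_SECRETS`, and its field names are hypothetical placeholders rather than the Synthetics plugin's real secret fields, and the sketch deliberately does not model Kibana's migration API.

```typescript
// Hypothetical placeholder defaults for the fields inside a monitor's
// `secrets` JSON string; the real field list lives in the Synthetics plugin.
const DEFAULT_SECRETS: Readonly<Record<string, string>> = {
  password: "",
  "ssl.key": "",
  "ssl.key_passphrase": "",
};

// Hypothetical result of attempting to decrypt `secrets` during a migration.
type SecretsResult =
  | { ok: true; secrets: string } // decrypted JSON string
  | { ok: false }; // decryption failed

// Produces the `secrets` JSON string to store on the migrated monitor.
function secretsForMigratedMonitor(result: SecretsResult): string {
  if (!result.ok) {
    // Original values are unrecoverable: reset every field to its default
    // and re-stringify, instead of blocking the migration.
    return JSON.stringify({ ...DEFAULT_SECRETS });
  }
  // Decryption succeeded: keep existing values, filling any missing fields with defaults.
  const parsed = JSON.parse(result.secrets) as Record<string, unknown>;
  return JSON.stringify({ ...DEFAULT_SECRETS, ...parsed });
}

const reset = secretsForMigratedMonitor({ ok: false });
const kept = secretsForMigratedMonitor({
  ok: true,
  secrets: JSON.stringify({ password: "example" }),
});
```

Resetting field by field, rather than storing an empty string, keeps the stored value a well-formed JSON object that downstream code can still parse.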
Implementing solution 2 has some challenges:
- Ideally, we would only let SO migrations continue past decryption failures when the failure is not caused by case 3 above. A user should not be forced to lose their data because they accidentally changed the encryption key without following the proper steps; in that case, a better resolution is to restore the original encryption key and try again. This scenario only arises on-prem, since encryption keys are managed for users in Cloud. It is unclear how we could programmatically determine whether a failure arises from case 3 or from case 1 or 2.
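In an AES-GCM model, a wrong key and a wrong AAD produce the same authentication error on a single object, so the cause cannot be read from one failure alone. One possible heuristic, which is an assumption and not something proposed in this issue, is to look at decryption results across all ESOs: a changed key should break every object, while AAD problems usually affect only a subset. `guessFailureCause` and `FailureGuess` are hypothetical names.

```typescript
// Hypothetical heuristic (not part of Kibana): classify decryption failures
// across a batch of ESOs. results[i] is true if the i-th ESO decrypted.
type FailureGuess = "none" | "likely-key-change" | "likely-aad-mismatch";

function guessFailureCause(results: ReadonlyArray<boolean>): FailureGuess {
  const failures = results.filter((ok) => !ok).length;
  if (failures === 0) return "none";
  // Every object failing points to the key, since AAD bugs rarely hit all objects.
  if (failures === results.length) return "likely-key-change";
  // Some objects decrypt, so the key is probably fine: suspect AAD (cases 1 or 2).
  return "likely-aad-mismatch";
}

const healthy = guessFailureCause([true, true, true]);
const allFailed = guessFailureCause([false, false]);
const someFailed = guessFailureCause([true, false, true]);
```

The heuristic is weak: a deployment with a single ESO, or one where an AAD bug touched every object, would be misclassified, so at most it could inform a warning rather than reliably separate case 3 from cases 1 and 2.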