Description
Objectives
This document defines a proposal for updating the SPIRE datastore to a simpler pluggable solution, capable of supporting both SQL and KV backend stores. The primary objectives for this change are to:
- Satisfy requests for more storage options
- Simplify installation and operation, especially in container environments
- Streamline plugin development and support
- Scale from a few to hundreds of thousands of agents
- Strengthen SPIRE resiliency solutions
- Stage SPIRE changes across releases in a way that minimizes impact and risk to running systems
Background
The current datastore design dropped support for plugin extensibility and only supports SQL backends (SQLite, MySQL, and PostgreSQL), with a dependency on gorm. The rapid evolution of the datastore interface made ongoing plugin support impractical. An investigation last year demonstrated the efficacy of a simpler interface for both SQL and KV stores.
The problems identified last year are still largely relevant:
- Implementation complexity
- Interface churn
- SQL-specific challenges
The SPIRE server datastore is responsible for reliable persistence of agent nodes, selectors, registration entries, bundles, and join tokens. Bundles are associated with multiple registrations, nodes have multiple selectors, and registrations have multiple bundles, DNS names, and selectors.
Datastore operations currently include the following (not all objects support all operations):
- Create, Delete, Update, Append, Prune
- Fetch, Count, List
- Get, Set
Queries across several fields are currently supported for attested nodes and registration entries (see ListAttestedNodesRequest and ListRegistrationEntriesRequest protobuf messages). The proposed solution preserves, and in some cases enhances, flexible query capabilities.
High availability support has been added since the last investigation, introducing cross-server coherency concerns that must be addressed in proposed solutions.
Proposal
This proposal aims to address the above objectives and problems in a phased approach with the goal of minimal disruption throughout the transition. Key elements include:
- Leverage the current datastore caching concepts for read requests wherever possible
- Introduce a simplified backend store plugin interface for item-independent operations
- Add plugins for KV database(s), starting with etcd
- Rewrite existing SQL plugin(s)
Leveraging the current datastore cache concept reduces the need for expensive crawling queries of the backend store by serving them from SPIRE server memory while persisting them externally. Most data are safely cacheable; agent authorization caching will be explored at a later time. Change notifications and periodic full refreshes will be employed to ensure server cache consistency and stale data invalidation.
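For illustration only, a minimal Go sketch of this read-from-memory, periodic-full-refresh concept; the Store and Cache types here are hypothetical stand-ins, not SPIRE code:

```go
package cache

import (
	"sync"
	"time"
)

// Store is a hypothetical read interface to the backend store.
type Store interface {
	GetAll(prefix string) (map[string][]byte, error)
}

// Cache serves reads from server memory while the backend persists the data.
type Cache struct {
	mu    sync.RWMutex
	items map[string][]byte
	store Store
}

// Get answers a read request without touching the backend.
func (c *Cache) Get(key string) ([]byte, bool) {
	c.mu.RLock()
	defer c.mu.RUnlock()
	v, ok := c.items[key]
	return v, ok
}

// Refresh performs periodic full reloads to invalidate stale data.
func (c *Cache) Refresh(interval time.Duration, prefix string) {
	for range time.Tick(interval) {
		items, err := c.store.GetAll(prefix)
		if err != nil {
			continue // keep serving the last known-good snapshot
		}
		c.mu.Lock()
		c.items = items
		c.mu.Unlock()
	}
}
```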
A simplified pluggable store interface will be created, consisting of Put, Get, Count, and Delete operations for one or more entries. The interface will also provide operations to Begin, Commit, and Abort transactions containing multiple primitive operations for cases where read-modify-write operations are required (e.g. prune bundle) or objects reference other objects (e.g. Registration Entries and Selectors or Bundles). The interface may also define a Watch operation to report new or updated items for distributed real-time cache updates.
The initial KV store reference design will support etcd and serve as the standard for additional KV store plugins. The SQL store(s) will be refactored to the new simplified interface.
Phasing
In the interest of minimizing disruption to production customers, offering clear value and migration paths, and keeping changes to manageable sizes, the following phases are considered:
- Add KV plugin support
  - Move the current datastore implementation from a plugin to a regular module, retaining the current interface and SQL backend.
  - Create a new “store” plugin with a simplified interface for KV backend(s) only. SQL backends continue to use the existing code in the non-pluggable module; KV backends are supported by new code paths in the datastore module using the new store interface.
  - Create a new etcd KV store plugin with the new store interface.
- Add SQL plugin support
  - Create new SQL plugin(s) with the new store interface.
  - New SQL plugins exist as an alternative to the existing non-pluggable SQL store.
- Add Watch support for real-time cache updates
- Retire legacy SQL support
Store API
The simplest store API passes keys and values rather than entry types and identifiers. This approach is chosen for simplicity of plugin development. Helper functions will be offered as needed to extract entry, ID, and index information from keys (see the sketch after the list below).
- Put: add one or more key/value pair(s) to the store
- Get: retrieve one (prefix false) or multiple (prefix true) key/value pair(s) from the store
- Delete: remove one (prefix false) or multiple (prefix true) key/value pair(s) from the store
- Count: return the count of keys with the given prefix
- Begin: initiate a transaction
- Commit: commit a transaction
- Abort: abort a transaction
- Watch: retrieve a stream of updates for one or more key ranges
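For concreteness, a minimal Go sketch of what this interface could look like; every name and signature here is an illustrative assumption, not a settled API:

```go
package store

import "context"

// KeyValue is a single key/value pair.
type KeyValue struct {
	Key   []byte
	Value []byte
}

// Tx collects primitive operations into a single transaction.
type Tx interface {
	Put(ctx context.Context, kvs ...KeyValue) error
	Delete(ctx context.Context, key []byte, prefix bool) error
	Commit(ctx context.Context) error
	Abort(ctx context.Context) error
}

// Store sketches the simplified pluggable backend interface.
type Store interface {
	Put(ctx context.Context, kvs ...KeyValue) error
	Get(ctx context.Context, key []byte, prefix bool) ([]KeyValue, error)
	Delete(ctx context.Context, key []byte, prefix bool) error
	Count(ctx context.Context, prefix []byte) (int64, error)
	Begin(ctx context.Context) (Tx, error)
	Watch(ctx context.Context, prefix []byte) (<-chan KeyValue, error)
}
```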
Objects
KV stores accept []byte keys and values. Building on the prototyping work from last year, two KV object types are required for this design:
Items
- Key - item type ID:item ID
- Value - item gRPC bytes
Question: is it worth the extra few bytes in every record to identify which field is the primary key field in the gRPC bytes? It will be defined as a constant either way.
Indexes
- Key - i:item type ID:item field ID:item field value:item ID
- Value - item type ID:item ID
Question: is it worth the extra bytes in every index record to store "item type ID:item ID" as the value? The data exists in the key and can be reconstructed; is this significantly faster/slower/simpler than iterating over the returned values?
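As an illustration of these key formats, a pair of hypothetical helper functions; the example type ID and item ID in the comments are placeholders, not decided values:

```go
package store

import "fmt"

// ItemKey builds an item key of the form "item type ID:item ID",
// e.g. "e:5fee2e4a" for a registration entry (IDs here are placeholders).
func ItemKey(typeID, itemID string) []byte {
	return []byte(fmt.Sprintf("%s:%s", typeID, itemID))
}

// IndexKey builds an index key of the form
// "i:item type ID:item field ID:item field value:item ID".
func IndexKey(typeID, fieldID, fieldValue, itemID string) []byte {
	return []byte(fmt.Sprintf("i:%s:%s:%s:%s", typeID, fieldID, fieldValue, itemID))
}
```

With keys shaped this way, listing every item of a type, or every item matching a field value, reduces to a prefix Get over the item or index key space.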
SQL databases prefer integer primary keys rather than the []byte keys of KV stores. In the interest of keeping the pluggable store interface simple, objects are designed like KV objects. SQL plugins may opt to store them the same way a KV store does or to add index columns for ranged operations. Both approaches will be prototyped and evaluated.
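A minimal sketch of the KV-style SQL mapping through database/sql; the table name, schema, and prefix-range trick are assumptions for illustration, not a committed design:

```go
package sqlstore

import "database/sql"

// createTable mirrors the KV layout: opaque byte keys and values.
func createTable(db *sql.DB) error {
	_, err := db.Exec(`CREATE TABLE IF NOT EXISTS kv (k BLOB PRIMARY KEY, v BLOB NOT NULL)`)
	return err
}

// getPrefix emulates a ranged KV Get: all keys >= prefix and < the
// next possible prefix (last byte incremented).
func getPrefix(db *sql.DB, prefix []byte) (*sql.Rows, error) {
	end := append([]byte{}, prefix...)
	end[len(end)-1]++ // assumes a non-empty prefix whose last byte is not 0xff
	return db.Query(`SELECT k, v FROM kv WHERE k >= ? AND k < ?`, prefix, end)
}
```

The alternative, adding typed index columns, trades this simplicity for the query planner's ability to use native indexes; prototyping both should surface the difference.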
Considerations
Data Size
A million agents and/or registration entries at a (hopefully) generous 4KB each would require 4GB of memory, which is very reasonable for a deployment of that size. This is half of etcd's recommended maximum data size and well within SQL database limits.
Performance
API authorization requests for a million agents checking in every five seconds would result in 200,000 authorization queries per second. This would strain a database and should be handled primarily from memory if possible. 100,000 agents would require around 20,000 queries per second, which is well within etcd and SQL database limits.
Database Availability and Concurrency
Datastore availability in the face of backend failures may be preserved through a number of store-specific alternatives. MySQL offers group replication for multi-master database availability. Percona offers resiliency solutions for MySQL, PostgreSQL, and MongoDB. etcd enforces strict serializability and linearizability by default for parallel multi-node availability.
Concurrency control in distributed systems is critical for data consistency and correctness. SQL transactions and locking provide this for SPIRE today. etcd documents recommended patterns for distributed coordination, which will be used to ensure transactional integrity.
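For example, a read-modify-write such as bundle pruning could be guarded with an etcd compare-and-swap transaction via the clientv3 API; the key name and prune step below are illustrative:

```go
package main

import (
	"context"
	"log"

	clientv3 "go.etcd.io/etcd/client/v3"
)

func main() {
	cli, err := clientv3.New(clientv3.Config{Endpoints: []string{"localhost:2379"}})
	if err != nil {
		log.Fatal(err)
	}
	defer cli.Close()

	ctx := context.Background()
	key := "b:example.org" // illustrative bundle key

	// Read the current value and remember its revision.
	get, err := cli.Get(ctx, key)
	if err != nil {
		log.Fatal(err)
	}
	if len(get.Kvs) == 0 {
		log.Fatal("bundle not found")
	}
	pruned := prune(get.Kvs[0].Value)

	// Write back only if no other server modified the key in between.
	txn, err := cli.Txn(ctx).
		If(clientv3.Compare(clientv3.ModRevision(key), "=", get.Kvs[0].ModRevision)).
		Then(clientv3.OpPut(key, string(pruned))).
		Commit()
	if err != nil {
		log.Fatal(err)
	}
	if !txn.Succeeded {
		log.Print("conflict detected; retry the read-modify-write")
	}
}

// prune is a placeholder for removing expired material from a bundle.
func prune(b []byte) []byte { return b }
```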
Database Migration
SQL database migrations are unaffected in the first phase, as we would leave the existing structures in place.
A migration tool will be required to convert the existing tables to the new store tables in the second phase.
Security
The Watch operation allows SPIRE servers to listen for updates to the database, enabling real-time cache updates that maintain coherence in multi-server HA deployments, for those datastore plugins capable of supporting it. This also addresses scenarios where a network connection between SPIRE servers is not available.
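As a sketch of how a server might consume those updates with the etcd client, assuming the hypothetical item key prefix and cache interface below:

```go
package cachewatch

import (
	"context"

	clientv3 "go.etcd.io/etcd/client/v3"
)

// Cache is a hypothetical stand-in for the server's in-memory cache.
type Cache interface {
	Set(key string, value []byte)
	Remove(key string)
}

// watchItems applies backend changes to the cache as they arrive.
// The "e:" item prefix is a placeholder, not a decided value.
func watchItems(ctx context.Context, cli *clientv3.Client, c Cache) {
	for resp := range cli.Watch(ctx, "e:", clientv3.WithPrefix()) {
		for _, ev := range resp.Events {
			switch ev.Type {
			case clientv3.EventTypePut:
				c.Set(string(ev.Kv.Key), ev.Kv.Value)
			case clientv3.EventTypeDelete:
				c.Remove(string(ev.Kv.Key))
			}
		}
	}
}
```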
SPIRE servers also perform full cache reloads at configurable intervals to protect against cache drift. This operation will alert on any discovered differences for analysis and potential debugging. Over time, the default refresh interval can be increased if no drift is detected.
An updated security review should be conducted during or after this refactor.
Failure Scenarios
How does the new system handle:
- failure or unavailability of one or more backend store nodes
- failure or unavailability of one or more HA SPIRE servers
- network segmentation of SPIRE servers and/or store servers
- (more scenarios needed)