Switch to Kueue service backend
Summary
This release switches to running Twined services on Kubernetes + Kueue. This brings the following features:
- Queue questions so they're not just dropped if the service backend is overwhelmed
- Run questions of any duration (in particular, runs longer than 1 hour)
- Request arbitrary compute resources per question (CPU, memory, storage etc.)
- Stop extraneous question reruns by allowing us to control when we acknowledge question events
- Monitor running questions individually
- Make it easier to run questions on providers other than Google in the future (i.e. on any Kubernetes cluster)
Contents (#723)
IMPORTANT: There are breaking changes.
New features
- #709 (see PR for list of breaking changes)
- Authenticate requests to service registries
Enhancements
- Add `allow_not_found` option to `ServiceConfiguration.from_file`
- Add default event store ID to `get_events`
- Increase default maximum heartbeat interval to 360s
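
These notes don't describe how `allow_not_found` behaves. As a rough illustration only, a flag with this name usually lets a loader return a fallback instead of raising when the file is missing. The sketch below uses a hypothetical `load_configuration` function and a JSON file; it is not the `ServiceConfiguration.from_file` implementation, and the empty-dict fallback is an assumption.

```python
import json
import tempfile
from pathlib import Path


def load_configuration(path, allow_not_found=False):
    """Hypothetical stand-in for a loader with an `allow_not_found` flag.

    This is NOT the octue implementation; it only illustrates one plausible
    meaning of such a flag.
    """
    path = Path(path)

    if not path.exists():
        if allow_not_found:
            # Assumed behaviour: fall back to an empty configuration.
            return {}
        raise FileNotFoundError(f"No configuration file found at {path}")

    with path.open() as f:
        return json.load(f)


# Loading a missing file with the flag set returns the fallback instead of raising.
with tempfile.TemporaryDirectory() as directory:
    missing = Path(directory) / "config.json"
    fallback = load_configuration(missing, allow_not_found=True)
```

Without the flag, the same call raises `FileNotFoundError`, so callers that require a configuration file keep the strict behaviour by default.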
Refactoring
Dependencies
- Remove `gunicorn` and `Flask` dependencies
Operations
- Replace Terraform configuration with new `terraform-octue-twined-core` module
Testing
- Move deployment test to `octue/example-service-kueue` repository