Description
About Me
This RFC is posted on behalf of NRK
Use Case
Given Sofie's role of producing broadcasts, downtime during the show is not acceptable.
When running in Docker in a VM, downtime can usually be limited to maintenance windows. The occasional VM live migration can happen (perhaps going unnoticed?), and the occasional restart or crash of a Docker container is usually perceived as system slowness, typically taking a couple of seconds to recover.
Sofie's current capabilities are generally adequate here; this is the scenario the current system design was targeting.
When running in Kubernetes, the system wants to be able to restart running containers freely. This does not work well with Sofie, as a container can often take 30s to come back up after a restart, which causes shows to suffer.
Disallowing this is possible, but can have side effects such as blocking the cluster from performing maintenance on the Kubernetes nodes.
Some system design work is needed to make Sofie more Kubernetes-friendly.
Proposal
This is not a concrete proposal, but is intended as a place to collect ideas. There are no plans for when, or whether, we will tackle any of this, but we hope this RFC will let the community formulate a plan for how it should be approached, so that it can be tackled gradually by anyone who needs it.
The needed solution is to be able to run replicas of every portion of Sofie:
- sofie-core
- job-worker
- playout gateway
- ingest gateway (optional?)
- input gateway
- package manager?
- live status gateway
Whether this should be done as primary+spare, load-balanced, or something else still needs to be figured out.
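To make the primary+spare option more concrete, here is a minimal sketch of lease-based failover between two replicas. Nothing in it is existing Sofie code: `InMemoryLeaseStore`, `Replica`, and the TTL value are hypothetical names for illustration, and the in-memory store stands in for whatever shared store would provide an atomic compare-and-set. Time is passed in explicitly so the behaviour is easy to follow.

```typescript
// Hypothetical lease record; a real deployment would keep this in a shared store.
interface Lease {
	holderId: string
	expiresAt: number // epoch milliseconds
}

// Stand-in for a shared store with an atomic "take or renew" operation (assumption).
class InMemoryLeaseStore {
	private lease: Lease | undefined

	// Take the lease if it is free, expired, or already held by this instance.
	tryAcquire(instanceId: string, now: number, ttlMs: number): boolean {
		const current = this.lease
		if (current === undefined || current.expiresAt <= now || current.holderId === instanceId) {
			this.lease = { holderId: instanceId, expiresAt: now + ttlMs }
			return true
		}
		return false
	}

	// Give the lease up early, e.g. on graceful shutdown, so the spare can take over at once.
	release(instanceId: string): void {
		if (this.lease !== undefined && this.lease.holderId === instanceId) {
			this.lease = undefined
		}
	}
}

class Replica {
	private id: string
	private store: InMemoryLeaseStore
	private ttlMs: number
	private leaseUntil = 0

	constructor(id: string, store: InMemoryLeaseStore, ttlMs: number) {
		this.id = id
		this.store = store
		this.ttlMs = ttlMs
	}

	// Called periodically, at an interval well below ttlMs.
	tick(now: number): void {
		this.leaseUntil = this.store.tryAcquire(this.id, now, this.ttlMs) ? now + this.ttlMs : 0
	}

	// A replica only counts as primary while its own lease is still valid.
	isPrimary(now: number): boolean {
		return now < this.leaseUntil
	}

	shutdown(): void {
		this.store.release(this.id)
		this.leaseUntil = 0
	}
}
```

In this sketch the failover time after a crash is bounded by the lease TTL, while a graceful shutdown hands over immediately. A shorter TTL gives faster takeover at the cost of more renewal traffic.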
Another of the underlying problems is that within sofie-core there are a few debounces/caches kept in memory that would cause inconsistencies if there were multiple instances running. There has been work to minimise the number of these, but some remain.
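As an illustration of why in-memory debounces conflict with multiple instances, and one possible way around it, the sketch below moves the debounce deadline into a shared store so that whichever instance polls first after the deadline runs the work exactly once. This is not how sofie-core currently works: `SharedDebounceStore`, `CoreInstance`, the key `"rundown1"`, and the 100 ms delay are all hypothetical, and the in-memory map stands in for a shared store with an atomic claim operation.

```typescript
// Stand-in for a shared store; a real one would need claimIfDue to be atomic (assumption).
class SharedDebounceStore {
	private dueAt = new Map<string, number>()

	// (Re)schedule the work: each call pushes the deadline out, as a debounce does.
	touch(key: string, now: number, delayMs: number): void {
		this.dueAt.set(key, now + delayMs)
	}

	// Claim the work if its deadline has passed; only the first caller wins.
	claimIfDue(key: string, now: number): boolean {
		const due = this.dueAt.get(key)
		if (due === undefined || now < due) return false
		this.dueAt.delete(key)
		return true
	}
}

// Hypothetical instance of the application; any replica may receive a change or poll.
class CoreInstance {
	runs: string[] = []
	private store: SharedDebounceStore
	private delayMs: number

	constructor(store: SharedDebounceStore, delayMs: number) {
		this.store = store
		this.delayMs = delayMs
	}

	// A change arrives at this instance; the pending deadline lives in the shared store.
	onChange(key: string, now: number): void {
		this.store.touch(key, now, this.delayMs)
	}

	// Periodic poll: run the debounced work only if this instance wins the claim.
	poll(key: string, now: number): void {
		if (this.store.claimIfDue(key, now)) {
			this.runs.push(key)
		}
	}
}

// Two instances share one store; changes land on different instances.
const debounceStore = new SharedDebounceStore()
const instanceA = new CoreInstance(debounceStore, 100)
const instanceB = new CoreInstance(debounceStore, 100)
instanceA.onChange("rundown1", 0)
instanceB.onChange("rundown1", 50) // pushes the deadline to t=150
instanceA.poll("rundown1", 100) // not yet due
instanceA.poll("rundown1", 150) // due: instance A runs the work
instanceB.poll("rundown1", 150) // already claimed: instance B does nothing
```

With purely in-memory debounces, each instance would have kept its own timer and the work could run twice, or run on stale state. Whether polling, a change stream, or routing all such work through a single primary is the better fit is one of the open questions for this RFC.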
Process
The Sofie Team will evaluate this RFC and open up a discussion about it, usually within a week.
- RFC created
- Sofie Team has evaluated the RFC
- A workshop has been planned
- RFC has been discussed in a workshop
- A conclusion has been reached, see comments in thread