RFC: Sofie high availability #1582

@Julusian

About Me

This RFC is posted on behalf of NRK

Use Case

Given Sofie's role of producing broadcasts, downtime during the show is not acceptable.

When running in Docker in a VM, downtime can usually be limited to maintenance windows. The occasional VM live migration can happen (perhaps going unnoticed?), and the occasional restart or crash of a Docker container is usually perceived as system slowness, typically taking a couple of seconds to recover.
Sofie's current abilities are generally sufficient here, as this is what the current system design was targeting.

When running in Kubernetes, the cluster wants to be able to restart running containers freely. This does not work well with Sofie, as a restarted container can often take 30s to come back up, which will cause shows to suffer.
Disallowing these restarts is possible, but can have side effects such as blocking the cluster from performing maintenance on the Kubernetes nodes.
Some system design work is needed to make Sofie more Kubernetes friendly.

Proposal

This is not a concrete proposal, but is intended as a place to collect ideas. There are no plans for when or if we will tackle any of this, but we hope that this RFC will allow the community to formulate a plan for how it should be approached, and allow the work to be done gradually by anyone who needs it.

The needed solution is to be able to run replicas of every portion of Sofie:

  • sofie-core
  • job-worker
  • playout gateway
  • ingest gateway (optional?)
  • input gateway
  • package manager?
  • live status gateway

Whether this should be done as primary+spare, load balanced, or something else still needs to be figured out.
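To make the primary+spare option more concrete, here is a minimal sketch of lease-based election between two replicas. It is an illustration only, not how Sofie works today: `InMemoryLeaseStore`, `Replica` and `heartbeat` are hypothetical names, and the in-memory store stands in for whatever shared database would provide an atomic compare-and-set in a real deployment.

```typescript
// Hypothetical sketch: lease-based primary/spare election.
// None of these names exist in Sofie; they are for illustration only.

interface Lease {
	holder: string
	expiresAt: number // timestamp in ms
}

// Stand-in for a shared store. A real deployment would need an atomic
// compare-and-set in a database that all replicas can reach.
class InMemoryLeaseStore {
	private lease: Lease | null = null

	// Acquire or renew the lease if it is free, expired, or already ours.
	tryAcquire(candidate: string, now: number, ttlMs: number): boolean {
		const current = this.lease
		if (current === null || current.expiresAt <= now || current.holder === candidate) {
			this.lease = { holder: candidate, expiresAt: now + ttlMs }
			return true
		}
		return false
	}

	// Returns the replica that holds a still-valid lease, if any.
	currentHolder(now: number): string | null {
		if (this.lease !== null && this.lease.expiresAt > now) return this.lease.holder
		return null
	}
}

class Replica {
	readonly id: string
	private store: InMemoryLeaseStore
	private ttlMs: number

	constructor(id: string, store: InMemoryLeaseStore, ttlMs: number) {
		this.id = id
		this.store = store
		this.ttlMs = ttlMs
	}

	// Called periodically. Returns true if this replica should act as primary.
	heartbeat(now: number): boolean {
		return this.store.tryAcquire(this.id, now, this.ttlMs)
	}
}
```

With this shape, the failover time is bounded by the lease TTL plus the heartbeat interval, which is the trade-off to weigh against a load-balanced design where every replica is always active.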


Another of the underlying problems is that within sofie-core there are a few debounces/caches kept in memory that would cause inconsistencies if there were multiple instances running. There has been work to minimise the number of these, but some remain.
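One possible direction for the remaining debounces is to keep the pending deadline in shared storage rather than in each process, so that several instances can reschedule the same work but only one of them runs it. The sketch below is an assumption about how that could look, not existing sofie-core code: `SharedDebounceStore`, `schedule` and `claimDue` are hypothetical names, and the `Map` stands in for a shared database with an atomic claim operation.

```typescript
// Hypothetical sketch: a debounce whose state lives in a shared store
// instead of in the memory of a single instance.

class SharedDebounceStore {
	// key -> timestamp (ms) at which the debounced work becomes due
	private dueAt = new Map<string, number>()

	// Any instance may (re)schedule a key; a later call pushes the deadline back.
	schedule(key: string, now: number, delayMs: number): void {
		this.dueAt.set(key, now + delayMs)
	}

	// Claims and removes every key that is due, so that each piece of
	// debounced work is handed to exactly one caller.
	claimDue(now: number): string[] {
		const claimed: string[] = []
		this.dueAt.forEach((due, key) => {
			if (due <= now) claimed.push(key)
		})
		for (const key of claimed) {
			this.dueAt.delete(key)
		}
		return claimed
	}
}
```

Each instance would poll `claimDue` on a timer and run the work for whatever keys it receives. In a real shared database the read and delete inside `claimDue` would have to be a single atomic operation.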

Process

The Sofie Team will evaluate this RFC and open up a discussion about it, usually within a week.

  • RFC created
  • Sofie Team has evaluated the RFC
  • A workshop has been planned
  • RFC has been discussed in a workshop
  • A conclusion has been reached, see comments in thread
