restrict Clickhouse schema changes #8862

@davepacheco

Determinations from the 2025-08-05 update sync regarding Clickhouse schema updates:

  • For the time being, we will not automate Clickhouse schema update as part of self-service upgrade.
  • To prevent us from accidentally breaking the schema in the meantime, we will add a CI check that the schema doesn't change (like an expectorate test, but harder to override, and with a note explaining the situation).

This ticket covers adding the CI check.
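As a rough illustration of what such a check might look like, here is a minimal sketch that compares the current schema text against a pinned copy and fails with an explanatory note. The function name, the note wording, and the idea of passing both schemas in as strings are assumptions made for illustration, not the actual implementation. Unlike an expectorate-style test, the sketch has no environment variable that rewrites the pinned copy on mismatch.

```rust
/// Note shown when the check fails. The wording here is illustrative only.
const SCHEMA_CHANGE_NOTE: &str = "\
The Clickhouse schema has changed. Schema updates are not yet applied \
automatically during online upgrade. Before updating the pinned copy, \
analyze the impact on online upgrade (see issue #8862).";

/// Compare the current schema text against the pinned copy.
///
/// Hypothetical helper: returns `Ok(())` when the two are identical, and an
/// error carrying the explanatory note otherwise. There is deliberately no
/// "overwrite" mode; changing the pinned copy has to be a manual edit.
pub fn check_schema_unchanged(pinned: &str, current: &str) -> Result<(), String> {
    if pinned == current {
        return Ok(());
    }
    // Report the first line that differs. If every shared line matches, the
    // difference lies just past the end of the shorter input.
    let first_diff = pinned
        .lines()
        .zip(current.lines())
        .position(|(a, b)| a != b)
        .unwrap_or_else(|| pinned.lines().count().min(current.lines().count()));
    Err(format!(
        "{SCHEMA_CHANGE_NOTE}\nFirst difference at line {}.",
        first_diff + 1
    ))
}
```

In a real test, `pinned` and `current` would presumably be read from a checked-in file and from the live schema definitions, and the test would panic on `Err` with the message.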

Impact

  • Each time we want to change the Clickhouse schema, we will need to analyze the impact on online upgrade.
  • Some new dev work will be required next time we have to do a schema update (to get it actually applied during the upgrade).

We believe the most common reason to update the Clickhouse schema is to add new field types. That particular type of schema update will already work correctly across an online upgrade, but work is still needed to get it applied during the upgrade.

Rationale

Background:

  • Clickhouse schema updates have been pretty rare.
  • Schema updates are not required to add new metrics or new fields to existing metrics.
  • The most common reason is to add a new field type. This type of schema update should "just work" today.
  • It is hard to generalize ahead of time about the upgrade impact from future schema updates since we don't know what they will do.

Since it's hard to know automatically what to do without specific planned updates in mind, and we don't expect any anytime soon, we've opted to not handle this automatically and instead require that we reconsider this the next time we have a Clickhouse schema update. The CI check is intended to help us know when we need to consider this so we don't stumble into breakage.

Some other observations made during the discussion (recording for posterity -- not sure how useful they are):

  • We have an updater tool similar to the CockroachDB schema updater.
  • We already assume exactly one Oximeter.
  • The querying side of metrics is tightly coupled to the database.
  • Clickhouse is not strongly consistent. Schema changes must be applied to all nodes and different nodes can see different schemas at a given time.
  • Currently, it’s Clickhouse Admin that updates the schema (see Make the responsibility of initialising oximeter schema clearer #7488).
  • Clickhouse admin has an “init db” endpoint called by RSS (and Nexus? Updater CLI tool?).
  • Historically, these schema changes are pretty rare. We've had 10 all-time. The last one was Feb. 2025.
  • The most common reason for updating the schema is adding a new data type. This should already behave fine across an update.
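To make the consistency observation concrete: because different nodes can see different schemas at a given time, anything applying a schema change would likely need to confirm that every node has caught up before treating the update as done. The sketch below assumes each node reports an integer schema version. The function name, the node names, and the version representation are all hypothetical.

```rust
use std::collections::BTreeMap;

/// Given the schema version reported by each node (hypothetical inputs),
/// return the names of the nodes that are still behind `target`.
pub fn nodes_behind<'a>(reported: &'a BTreeMap<String, u64>, target: u64) -> Vec<&'a str> {
    reported
        .iter()
        // Keep only nodes whose reported version is older than the target.
        .filter(|(_, version)| **version < target)
        .map(|(name, _)| name.as_str())
        .collect()
}
```

An empty result would mean all nodes report at least the target version. A non-empty result lists the nodes to wait on or retry.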
