
[MongoDB Storage] Storage version #487

Draft
rkistner wants to merge 10 commits into main from storage-version

Conversation

rkistner (Contributor) commented Feb 4, 2026

This introduces a "storage_version" config, initially for MongoDB storage. If this works well, we can extend the same concept to Postgres storage.

The basic idea is to move away from migrations that need to be run upfront, to a storage version that is specific to a sync rules version. So when upgrading the service version, there is no need to run large migrations on existing data. Instead, when you deploy a new sync rules version, the new storage version is used for that.
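As a rough sketch of the idea (field names here are illustrative, loosely based on the snippets discussed in the review below), the storage version is recorded per deployed sync rules version rather than globally:

// Sketch only - names are illustrative, not the actual schema.
// Each deployed sync rules version records the storage version it was
// created with, so old and new versions can coexist during an upgrade.
interface SyncRulesDocumentSketch {
  _id: number;
  content: string;
  // Absent on documents written before this change; the legacy
  // storage version is assumed in that case.
  storage_version?: number;
}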

Pros:

  1. We can make significant changes to large collections such as bucket_data, without running any expensive migrations upfront.
  2. There is no conflict between service versions before/after the migration. Previously, if the migration ran before the new service version was deployed, the old version had to be compatible with the post-migration data, otherwise it would cause issues.
  3. Storage version downgrades are possible by first deploying sync rules with an older storage version (not implemented yet).

Cons:

  1. We need to support older storage versions for a significant period. We can eventually deprecate old storage versions in new major releases.

This initial implementation uses the storage version for two features:

  1. A guarantee that checksums are always stored as Long, which gives a small performance improvement for checksum calculations. See Fix checksum calculations in large buckets with > 4m rows #282 for context on the original issue.
  2. Auto-enabling versioned bucket names, instead of requiring an opt-in in the sync rules file.

These are fairly minor storage changes to start with. However, the plan is to use this for incremental reprocessing (#468), which may introduce much larger storage changes.
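As a sketch of how these could be modelled (the flag names are assumptions, loosely following the version constants discussed in the review below):

// Sketch: each storage version maps to a config describing the behaviour
// used for sync rules deployed with that version. Names are illustrative.
interface StorageConfigSketch {
  // Version 2: checksums are guaranteed to be stored as Long (see #282).
  checksumsStoredAsLong: boolean;
  // Version 2: versioned bucket names are enabled automatically,
  // without an opt-in in the sync rules file.
  versionedBucketNames: boolean;
}

const STORAGE_VERSION_CONFIGS: Record<number, StorageConfigSketch | undefined> = {
  1: { checksumsStoredAsLong: false, versionedBucketNames: false },
  2: { checksumsStoredAsLong: true, versionedBucketNames: true }
};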

Collections

There are no actual changes to collections here, but going forward we'd generally have "static" collections (unchanged across storage versions) and "versioned" collections, where we'd use a different collection per storage version when that collection is affected by the version (see the sketch after the lists below). This is not final - we can always introduce more changes based on storage_version - but it explains the general shape of the expected changes.

It's still TBD how we'll manage changes to static collections. For now we can keep using migrations, but we may eventually replace that mechanism with something that handles downgrades better.

Static collections:

  1. migrations
  2. instance
  3. sync_rules (the actual fields used may change based on storage version)
  4. locks
  5. op_id_sequence (some of this usage may change based on storage version)
  6. connection_report_events (TBD)

Versioned collections:

  1. bucket_data
  2. bucket_parameters
  3. bucket_state
  4. current_data
  5. checkpoint_events
  6. source_tables
  7. write_checkpoints
  8. custom_write_checkpoints
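As an illustration of the distinction (the naming scheme below is hypothetical and not part of this PR), a versioned collection would resolve to a different physical collection per storage version, while a static collection keeps a single name:

// Hypothetical sketch only - the actual naming scheme is not defined here.
function versionedCollectionName(base: string, storageVersion: number): string {
  // Keep the original name for the legacy version so existing data is untouched.
  return storageVersion <= 1 ? base : `${base}_v${storageVersion}`;
}

versionedCollectionName('bucket_data', 1); // 'bucket_data'
versionedCollectionName('bucket_data', 2); // 'bucket_data_v2' (hypothetical)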

Downgrades

A downgrade to a lower storage version would always need sync rules reprocessing. In theory, this can happen before or after downgrading the service version (to one that doesn't support the latest storage version). So you generally have these options:

1. Don't upgrade storage version

Upgrade service version, but don't upgrade storage version. Can downgrade without issues.

The caveat is that the storage version upgrade is currently automatic when you update the sync rules - you can't opt out of that yet. We can add support for opting out in the future.

2. Downgrade storage before downgrading service

  1. Upgrade service.
  2. Update sync rules, triggering storage version upgrade.
  3. Realize you want to downgrade.
  4. Update sync rules with a lower storage version. Not supported yet, but we can support this in the future.
  5. Downgrade service, with no downtime.

3. Downgrade storage after downgrading service

  1. Upgrade service.
  2. Update sync rules, triggering storage version upgrade.
  3. Downgrade service.
  4. Storage version is unsupported - replication and API processes fail, causing downtime.
  5. Re-process sync rules, resulting in the lower storage version. We could consider triggering this automatically, but we'd have to think through the implications.
  6. Once reprocessing has completed, users can sync again.

4. Downgrading to lower migration version

Just for reference

Currently this blocks the migration process. Theoretically it's possible to down-migrate first using the newer service version, but I'm not sure that process actually works. It's not supported at all on the cloud service at the moment - you have to stop and start the instance, which re-creates the storage from scratch and causes downtime.

Depending on the actual migrations, there may be consistency issues in the process.

General comments

There is remaining work to make downgrading possible without downtime. We'd have to think about how exactly we expose this - for example config options, or an API. Either approach would likely require documenting storage version compatibility.

However, the current state of storage is no worse than what we have with the migration system.

changeset-bot (bot) commented Feb 4, 2026

⚠️ No Changeset found

Latest commit: f771dcd

Merging this PR will not cause a version bump for any packages. If these changes should not result in a new version, you're good to go. If these changes should result in a version bump, you need to add a changeset.


rkistner (Contributor, Author) commented Feb 4, 2026

@simolus3 @stevensJourney It's not urgent to get this out, but I'd like to get your input on this approach for managing storage versions, in preparation for changes we'd need for incremental reprocessing.

stevensJourney (Collaborator) left a comment

Overall I think this approach makes sense and looks good to me.

expires_at: Date;
} | null;

storage_version?: number;
stevensJourney (Collaborator):

I initially thought this might not need to be optional, due to it being set in the migrations. But I assume we can't really guarantee that all migrations have actually been executed in some circumstances, like self-hosted environments. Or is there another reason for declaring it as optional?

rkistner (Contributor, Author):

Initially this was from before we had a migration. Now it's mostly that we're not guaranteed the migration has run, and having the fallback is simple to implement.
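A minimal sketch of that fallback (doc here stands for the loaded sync rules document; the constant is from the snippet below):

// Sketch: fall back to the legacy storage version when the field was never
// set, e.g. because the migration hasn't run yet.
const storageVersion = doc.storage_version ?? LEGACY_STORAGE_VERSION;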

export const LEGACY_STORAGE_VERSION = 1;
export const CURRENT_STORAGE_VERSION = 2;

export const STORAGE_VERSIONS: Record<number, StorageConfig | undefined> = {
stevensJourney (Collaborator):

nit: It looks like this is mapping the storage version to the corresponding StorageConfig. Maybe we could call this STORAGE_VERSION_CONFIGS.

const storageConfig = STORAGE_VERSIONS[this.storage_version];
if (storageConfig == null) {
  throw new ServiceError(
    ErrorCode.PSYNC_S1403,
stevensJourney (Collaborator):

What would the flow be if a user did downgrade the service and this has been reached?

Would we always recommend performing a sync rules change when downgrading the service? Otherwise, it seems like a downgrade would essentially take down the instance for both the replication and API services (if I understand this correctly)?

rkistner (Contributor, Author):

Correct - a downgrade would take down the instance. I feel that's better than attempting to continue, which could result in obscure errors or even silent consistency issues.

I added a section in the PR description on the available downgrade options.

simolus3 (Contributor) left a comment

My main question is whether we perhaps want to define "common" storage versions in a shared package to reduce duplication once we implement this for Postgres.

Some aspects of storage versions (like storing checksums as long values in this case) are specific to the bucket storage implementation, but others (versioned bucket ids) are essentially a "version of the sync service used in this deployment" field that could be shared.

Let's say we had something like this in service-core:

/**
 * Changes that can be enabled when deploying new sync configurations, but must be preserved for existing deployments.
 */
export interface CommonStorageConfig {
  /**
   * Whether versioned bucket names are automatically enabled.
   *
   * If this is false, bucket names may still be versioned depending on the sync config.
   */
  versionedBuckets: boolean;
}

export const COMMON_STORAGE_CONFIG_LEGACY: CommonStorageConfig = Object.freeze({versionedBuckets: false});
export const COMMON_STORAGE_CONFIG_V1: CommonStorageConfig = Object.freeze({versionedBuckets: true});

In module-mongodb-storage, each StorageConfig would then have a common: CommonStorageConfig field pointing to the corresponding constant defined in service-core.

This would allow us to make CommonStorageConfig a field on the PersistedSyncRulesContent interface (Postgres would unconditionally use COMMON_STORAGE_CONFIG_LEGACY for now). I'm mainly suggesting this in anticipation of #498, but since we will end up having something like the versionedBuckets option across all storage implementations anyway, it feels right to put that into a shared package.
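A rough sketch of what that could look like on the MongoDB side (the MongoDB-specific field is an assumption, following the checksum change in this PR):

// Sketch: module-mongodb-storage's per-version config references the shared
// constants from service-core. Field names are illustrative.
interface StorageConfig {
  common: CommonStorageConfig;
  // MongoDB-specific: checksums are guaranteed to be stored as Long.
  checksumsStoredAsLong: boolean;
}

const STORAGE_VERSIONS: Record<number, StorageConfig | undefined> = {
  1: { common: COMMON_STORAGE_CONFIG_LEGACY, checksumsStoredAsLong: false },
  2: { common: COMMON_STORAGE_CONFIG_V1, checksumsStoredAsLong: true }
};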

* Hydrate the sync rule definitions with persisted state into runnable sync rules.
*
* @param params.hydrationState Transforms bucket ids based on persisted state. May omit for tests.
* @param params.hydrationState Transforms bucket ids based on persisted state.
simolus3 (Contributor):

Maybe also mention that the compatibility option is not checked in this method.
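For example, the doc comment could read roughly like this (a sketch of the suggested wording only):

/**
 * Hydrate the sync rule definitions with persisted state into runnable sync rules.
 *
 * Note that the compatibility option is not checked in this method; the
 * bucket id format is determined by the persisted state instead.
 *
 * @param params.hydrationState Transforms bucket ids based on persisted state.
 */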

import { DateTimeValue, SqlSyncRules, TimeValuePrecision, toSyncRulesValue } from '../../src/index.js';

import { versionedHydrationState } from '../../src/HydrationState.js';
import { DEFAULT_HYDRATION_STATE, versionedHydrationState } from '../../src/HydrationState.js';
simolus3 (Contributor):

We should remove versioned_bucket_ids: false from the "streams can disable new format" test and remove the "can use versioned bucket ids" test; these tests are no longer relevant now that the compatibility option is ignored.

Maybe we could replace them with tests asserting that the option is marked as enabled via expect(rules.compatibility.isEnabled(CompatibilityOption.versionedBucketIds)) depending on the YAML, since we're using that property when loading sync rules from storage.
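A sketch of what such a replacement test could look like (the test framework, parse entry point, options shape, and import paths are assumptions):

// Sketch only: assert that the option is reported as enabled based on the
// YAML, since that property is what's used when loading sync rules from storage.
import { expect, test } from 'vitest'; // assumed test framework
import { CompatibilityOption, SqlSyncRules } from '../../src/index.js'; // assumed exports

const yaml = `
bucket_definitions:
  global:
    data:
      - SELECT id, name FROM users
`;

test('reports versioned bucket ids as enabled', () => {
  // fromYaml and its options are assumed here; adjust to the actual parse API.
  const rules = SqlSyncRules.fromYaml(yaml, { defaultSchema: 'public' });
  expect(rules.compatibility.isEnabled(CompatibilityOption.versionedBucketIds)).toBe(true);
});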

rkistner (Contributor, Author) commented

> My main question is whether we perhaps want to define "common" storage versions in a shared package to reduce duplication once we implement this for Postgres.
>
> Some aspects of storage versions (like storing checksums as long values in this case) are specific to the bucket storage implementation, but others (versioned bucket ids) are essentially a "version of the sync service used in this deployment" field that could be shared.

I'm starting to think a common storage version sequence can help. We'll likely expose the storage version to the developer, since they need to be aware of that for certain upgrades or downgrades. And it would make documentation a lot simpler if we can refer to "storage version 5" instead of "storage version 5 for mongodb, 3 for postgres". The same could apply to the cloud dashboard, where the developer should not have to care whether it's a mongodb or postgres storage version that they're seeing/specifying.

I don't think it matters that much that certain storage version features are only applicable to one of the implementations. We can always bump the storage version for both, even if it only really affects one of them.

One implication in practice is that if we do something like the "versioned bucket names" change only for MongoDB, we'd either have Postgres not support that storage version yet, or we'd need to be more proactive in adding support for Postgres as well.
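As a sketch of the idea (purely illustrative - names and numbers are assumptions):

// Sketch: one shared storage version sequence across implementations, so docs
// and the dashboard can refer to "storage version N" regardless of backend.
// Each backend declares which common versions it supports.
const MONGODB_SUPPORTED_STORAGE_VERSIONS = new Set([1, 2]);
// In this sketch Postgres doesn't implement the version 2 changes yet,
// so it would either reject that version or treat it as a no-op.
const POSTGRES_SUPPORTED_STORAGE_VERSIONS = new Set([1]);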

I'll see if I can update the PR to use common storage versions, and use it for Postgres as well.
