
Feature Request: Support Tiered Storage for Upsert Tables with MetadataTTL Constraint #17686

@johnsolomonj

Problem Statement

Upsert tables currently cannot use tiered storage, even when segments beyond the upsert window are effectively immutable and could safely be moved to cold storage.

// Current validation in TableConfigUtils.java:779-780
Preconditions.checkState(tableConfig.getTierConfigsList() == null || tableConfig.getTierConfigsList().isEmpty(),
    "Tiered storage is not supported for Upsert/Dedup tables");

This blanket prohibition prevents cost optimization for upsert tables with time-bounded upsert windows (configured via metadataTTL).

Proposed Solution

Allow tiered storage for upsert tables when metadataTTL < minSegmentAge, ensuring segments only move to cold tier after their validDocIds bitmaps have been frozen.

Example Configuration:

  • metadataTTL: 3 days (upsert window)
  • minSegmentAge: 7 days (tier boundary)
  • Result: Segments 7+ days old can safely move to cold tier, since they have been outside the upsert window for at least 4 days (7 - 3)
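
The arithmetic behind this example can be sketched in a few lines of Java. This is only an illustration: `TierWindowExample` and `quietPeriod` are hypothetical names, not part of Pinot, and both settings are assumed to be expressed as plain durations.

```java
import java.time.Duration;

public class TierWindowExample {

    /**
     * Returns how long a segment has already been outside the upsert window
     * at the moment it becomes eligible for the cold tier.
     * Hypothetical helper, not Pinot code.
     */
    static Duration quietPeriod(Duration metadataTtl, Duration minSegmentAge) {
        // The proposal requires metadataTTL < minSegmentAge.
        if (metadataTtl.compareTo(minSegmentAge) >= 0) {
            throw new IllegalStateException(
                "metadataTTL must be smaller than minSegmentAge");
        }
        return minSegmentAge.minus(metadataTtl);
    }

    public static void main(String[] args) {
        Duration quiet = quietPeriod(Duration.ofDays(3), Duration.ofDays(7));
        // Prints: Quiet period before tier move: 4 days
        System.out.println("Quiet period before tier move: " + quiet.toDays() + " days");
    }
}
```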

Why This Is Safe

  1. Bitmaps freeze after TTL: Once a segment passes metadataTTL, the metadata manager stops tracking it and its validDocIds bitmap never receives updates again (see isOutOfMetadataTTL() in BasePartitionUpsertMetadataManager)

  2. No write conflicts: Expired segments are removed from _trackedSegments and _primaryKeyToRecordLocationMap, so no upsert operations will attempt to modify their bitmaps

  3. Bitmap storage model supports this: Bitmaps are persisted with segments as validdocids.bitmap.snapshot files and move with the segment to cold tier. Queries only need read access.

  4. Proven pattern: Dedup tables already support this exact approach (added in PR #17154, "Make dedup table use StrictRealtimeSegmentAssignment with support of multi tiers", commit ccc41ea8e5):

    // From validateTTLAndTierConfigsForDedupTable()
    Preconditions.checkState(ttlInMs < minSegmentAgeInMs,
        "MetadataTTL: %s(ms) must be smaller than the minimum segmentAge: %s(ms)",
        ttlInMs, minSegmentAgeInMs);
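
To make points 1 and 2 concrete, here is a toy model of the freeze behavior. It is a simplified sketch, not Pinot's implementation: the class `TtlFreezeModel`, its fields, and its methods are hypothetical, and the expiry rule (a segment is out of TTL once its largest comparison value falls more than the TTL behind the largest value seen so far) is an assumption about how `isOutOfMetadataTTL()` behaves. The actual logic in `BasePartitionUpsertMetadataManager` may differ.

```java
import java.util.HashMap;
import java.util.Map;

public class TtlFreezeModel {
    private final long metadataTtl;
    // Largest comparison value seen so far (assumed watermark).
    private long watermark = Long.MIN_VALUE;
    // Segment name -> largest comparison value in that segment.
    private final Map<String, Long> trackedSegments = new HashMap<>();

    TtlFreezeModel(long metadataTtl) {
        this.metadataTtl = metadataTtl;
    }

    void addSegment(String name, long maxComparisonValue) {
        trackedSegments.put(name, maxComparisonValue);
        watermark = Math.max(watermark, maxComparisonValue);
    }

    /** Assumed rule: expired once more than metadataTtl behind the watermark. */
    boolean isOutOfMetadataTtl(long maxComparisonValue) {
        if (metadataTtl <= 0 || watermark == Long.MIN_VALUE) {
            return false; // TTL disabled, or nothing seen yet
        }
        return maxComparisonValue < watermark - metadataTtl;
    }

    /** Stops tracking expired segments; their bitmaps are never touched again. */
    void expireOldSegments() {
        trackedSegments.values().removeIf(this::isOutOfMetadataTtl);
    }

    /** Only tracked segments can have their validDocIds bitmap modified. */
    boolean canReceiveUpserts(String name) {
        return trackedSegments.containsKey(name);
    }

    public static void main(String[] args) {
        // Comparison values are in days here purely for readability.
        TtlFreezeModel model = new TtlFreezeModel(3);
        model.addSegment("seg_day0", 0);
        model.addSegment("seg_day5", 5);
        model.expireOldSegments();
        System.out.println("seg_day0 mutable: " + model.canReceiveUpserts("seg_day0"));
        System.out.println("seg_day5 mutable: " + model.canReceiveUpserts("seg_day5"));
    }
}
```

In this model `seg_day0` is untracked after expiry, so nothing can modify its bitmap, which is the property the tier move relies on.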

Implementation Approach

  1. Update validation in TableConfigUtils.validateUpsertAndDedupConfig() to allow tier configs when the TTL constraint (metadataTTL < minSegmentAge) is met
  2. Add validateTTLAndTierConfigsForUpsertTable() method similar to existing dedup validation
  3. Update segment assignment policy to use multi-tier assignment when constraint is satisfied
  4. Require metadataTTL > 0 when tiered storage is configured for upsert tables
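
A minimal sketch of what steps 2 and 4 might look like, modeled on the dedup check quoted above. It is not tied to Pinot's real classes: `Tier` is a hypothetical stand-in for a tier config, both values are assumed to be already converted to milliseconds, and taking the minimum segment age across all tiers is an assumption about how multiple tiers would be handled.

```java
import java.util.List;

public class UpsertTierValidationSketch {

    /** Hypothetical stand-in for a tier config; holds only what this check needs. */
    static final class Tier {
        final String name;
        final long minSegmentAgeMs;

        Tier(String name, long minSegmentAgeMs) {
            this.name = name;
            this.minSegmentAgeMs = minSegmentAgeMs;
        }
    }

    static void validateTTLAndTierConfigsForUpsertTable(long metadataTtlMs, List<Tier> tiers) {
        // No tier configs: existing upsert tables are unaffected.
        if (tiers == null || tiers.isEmpty()) {
            return;
        }
        // Step 4: a TTL is mandatory once tiers are configured.
        if (metadataTtlMs <= 0) {
            throw new IllegalStateException(
                "metadataTTL must be > 0 when tiered storage is configured for upsert tables");
        }
        // Assumption: the earliest tier boundary is the one that matters.
        long minSegmentAgeMs = Long.MAX_VALUE;
        for (Tier tier : tiers) {
            minSegmentAgeMs = Math.min(minSegmentAgeMs, tier.minSegmentAgeMs);
        }
        // Step 2: same constraint and message shape as the dedup validation.
        if (metadataTtlMs >= minSegmentAgeMs) {
            throw new IllegalStateException(String.format(
                "MetadataTTL: %s(ms) must be smaller than the minimum segmentAge: %s(ms)",
                metadataTtlMs, minSegmentAgeMs));
        }
    }

    public static void main(String[] args) {
        long dayMs = 24L * 60 * 60 * 1000;
        validateTTLAndTierConfigsForUpsertTable(3 * dayMs, List.of(new Tier("coldTier", 7 * dayMs)));
        System.out.println("3d TTL with 7d tier boundary: accepted");
        try {
            validateTTLAndTierConfigsForUpsertTable(10 * dayMs, List.of(new Tier("coldTier", 7 * dayMs)));
        } catch (IllegalStateException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

Returning early when no tiers are configured keeps the check a no-op for existing tables, which matches the backward-compatibility goal.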

Benefits

  • Storage cost reduction: Move cold segments to cheaper storage tiers
  • No correctness impact: Segments move only after their validDocIds bitmaps are frozen, so query results are unchanged
  • Consistent with dedup: Uses same validation pattern already proven in production
  • Backward compatible: Existing upsert tables without tiered configs are unaffected
