Description
Problem Statement
Upsert tables currently cannot use tiered storage, even when segments beyond the upsert window are effectively immutable and could safely be moved to cold storage.
// Current validation in TableConfigUtils.java:779-780
Preconditions.checkState(tableConfig.getTierConfigsList() == null || tableConfig.getTierConfigsList().isEmpty(),
    "Tiered storage is not supported for Upsert/Dedup tables");

This blanket prohibition prevents cost optimization for upsert tables with time-bounded upsert windows (configured via metadataTTL).
Proposed Solution
Allow tiered storage for upsert tables when metadataTTL < minSegmentAge, ensuring segments only move to cold tier after their validDocIds bitmaps have been frozen.
Example Configuration:
- metadataTTL: 3 days (upsert window)
- minSegmentAge: 7 days (tier boundary)
- Result: Segments 7+ days old can safely move to cold tier since they haven't received upserts for 4 days
Why This Is Safe
- Bitmaps freeze after TTL: Once a segment passes metadataTTL, the metadata manager stops tracking it and its validDocIds bitmap never receives updates again (see isOutOfMetadataTTL() in BasePartitionUpsertMetadataManager)
- No write conflicts: Expired segments are removed from _trackedSegments and _primaryKeyToRecordLocationMap, so no upsert operations will attempt to modify their bitmaps
- Bitmap storage model supports this: Bitmaps are persisted with segments as validdocids.bitmap.snapshot files and move with the segment to cold tier. Queries only need read access.
- Proven pattern: Dedup tables already support this exact approach (added in PR #17154, "Make dedup table use StrictRealtimeSegmentAssignment with support of multi tiers", commit ccc41ea8e5):

  // From validateTTLAndTierConfigsForDedupTable()
  Preconditions.checkState(ttlInMs < minSegmentAgeInMs,
      "MetadataTTL: %s(ms) must be smaller than the minimum segmentAge: %s(ms)", ttlInMs, minSegmentAgeInMs);
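
To make the freezing argument concrete, here is a toy model of TTL-based tracking. It is a simplified sketch, not the actual BasePartitionUpsertMetadataManager code: the class name, the watermark field, and the exact comparison are assumptions chosen for illustration. The idea it shows is that once a segment's newest record falls behind the largest seen comparison value by more than the TTL, the segment is dropped from tracking and nothing can touch its bitmap afterwards.

```java
import java.util.HashSet;
import java.util.Set;

/** Toy model of TTL-based segment tracking (hypothetical; not Pinot's real class). */
final class TtlTrackingSketch {
  // Assumed to be in the same unit as the comparison values (e.g. days).
  private final double metadataTtl;
  private double largestSeenComparisonValue = Double.NEGATIVE_INFINITY;
  private final Set<String> trackedSegments = new HashSet<>();

  TtlTrackingSketch(double metadataTtl) {
    this.metadataTtl = metadataTtl;
  }

  /** Starts tracking a segment and advances the watermark if needed. */
  void addSegment(String name, double maxComparisonValue) {
    largestSeenComparisonValue = Math.max(largestSeenComparisonValue, maxComparisonValue);
    trackedSegments.add(name);
  }

  /** True when the segment's newest record is older than the TTL window. */
  boolean isOutOfMetadataTtl(double segmentMaxComparisonValue) {
    return metadataTtl > 0
        && segmentMaxComparisonValue < largestSeenComparisonValue - metadataTtl;
  }

  /** Stops tracking an expired segment; its bitmap is read-only from then on. */
  boolean expireIfOutOfTtl(String name, double segmentMaxComparisonValue) {
    if (isOutOfMetadataTtl(segmentMaxComparisonValue)) {
      trackedSegments.remove(name);
      return true;
    }
    return false;
  }

  boolean isTracked(String name) {
    return trackedSegments.contains(name);
  }
}
```

In this model, a segment that is no longer tracked can never be the target of an upsert, which is the property the tier move relies on.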
Implementation Approach
- Update validation in TableConfigUtils.validateUpsertAndDedupConfig() to allow tiered configs when TTL constraint is met
- Add validateTTLAndTierConfigsForUpsertTable() method similar to existing dedup validation
- Update segment assignment policy to use multi-tier assignment when constraint is satisfied
- Require metadataTTL > 0 when tiered storage is configured for upsert tables
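
A minimal, dependency-free sketch of what the proposed check could look like is shown below. It is not the final validateTTLAndTierConfigsForUpsertTable() implementation: the class name, the method signature, and the use of java.time.Duration are assumptions, and it presumes that metadataTTL and every tier's minSegmentAge have already been converted to a common time unit.

```java
import java.time.Duration;
import java.util.List;

/** Hypothetical stand-in for the proposed upsert tier validation. */
final class UpsertTierValidationSketch {

  /**
   * Throws IllegalStateException unless metadataTTL is positive and strictly
   * smaller than the minimum segment age of every configured tier.
   */
  static void validateTtlAndTiers(Duration metadataTtl, List<Duration> tierMinSegmentAges) {
    if (tierMinSegmentAges == null || tierMinSegmentAges.isEmpty()) {
      return; // No tiers configured: existing upsert tables are unaffected.
    }
    if (metadataTtl == null || metadataTtl.isZero() || metadataTtl.isNegative()) {
      throw new IllegalStateException(
          "metadataTTL must be > 0 when tiered storage is configured for upsert tables");
    }
    for (Duration minSegmentAge : tierMinSegmentAges) {
      if (metadataTtl.compareTo(minSegmentAge) >= 0) {
        throw new IllegalStateException(String.format(
            "MetadataTTL: %s(ms) must be smaller than the minimum segmentAge: %s(ms)",
            metadataTtl.toMillis(), minSegmentAge.toMillis()));
      }
    }
  }
}
```

With the example configuration above, validateTtlAndTiers(Duration.ofDays(3), List.of(Duration.ofDays(7))) passes, while a TTL of 7 days against the same tier is rejected because the comparison is strict.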
Benefits
- Storage cost reduction: Move cold segments to cheaper storage tiers
- No correctness impact: Segments move only after their validDocIds bitmaps are frozen, so query results are unchanged
- Consistent with dedup: Uses same validation pattern already proven in production
- Backward compatible: Existing upsert tables without tiered configs are unaffected
References
- Dedup multi-tier support: PR #17154, "Make dedup table use StrictRealtimeSegmentAssignment with support of multi tiers" (commit ccc41ea8e5)
- Validation logic: pinot-segment-local/src/main/java/org/apache/pinot/segment/local/utils/TableConfigUtils.java
- Bitmap freezing: BasePartitionUpsertMetadataManager.isOutOfMetadataTTL()