@sadanand48
Contributor

What changes were proposed in this pull request?

Add metrics to track the disk space used by snapshot RocksDB directories and their SST file counts.

What is the link to the Apache JIRA?

https://issues.apache.org/jira/browse/HDDS-14041

How was this patch tested?

Tested manually on a Docker cluster. Sample metrics output:

{
  "beans" : [ {
    "name" : "Hadoop:service=OzoneManager,name=OMSnapshotDirectoryMetrics",
    "modelerType" : "OMSnapshotDirectoryMetrics",
    "tag.Context" : "Snapshot Directory Metrics",
    "tag.Hostname" : "b8f9c29c90ba",
    "DbSnapshotsDirSize" : 1162516,
    "TotalSstFilesCount" : 36,
    "NumSnapshots" : 3,
    "tag.Context.1" : "Per-Checkpoint Directory Metrics",
    "tag.CheckpointDirName.1" : "om.db-094618c2-8b30-4220-8351-d7d9d9b18cbd",
    "tag.Hostname.1" : "b8f9c29c90ba",
    "CheckpointDirSize.1" : 374766,
    "CheckpointSstFilesCount.1" : 4,
    "tag.Context.2" : "Per-Checkpoint Directory Metrics",
    "tag.CheckpointDirName.2" : "om.db-d7be41db-47e3-4db6-b556-1c72e5a88825",
    "tag.Hostname.2" : "b8f9c29c90ba",
    "CheckpointDirSize.2" : 398392,
    "CheckpointSstFilesCount.2" : 20,
    "tag.Context.3" : "Per-Checkpoint Directory Metrics",
    "tag.CheckpointDirName.3" : "om.db-4ad9186b-bea9-4ed5-b052-1a301e539f3e",
    "tag.Hostname.3" : "b8f9c29c90ba",
    "CheckpointDirSize.3" : 387538,
    "CheckpointSstFilesCount.3" : 12
  } ]
}

@sadanand48 sadanand48 added the snapshot https://issues.apache.org/jira/browse/HDDS-6517 label Dec 1, 2025
Copilot finished reviewing on behalf of jojochuang December 1, 2025 17:17
Contributor

Copilot AI left a comment

Pull request overview

This PR adds comprehensive metrics tracking for Ozone Manager's snapshot RocksDB directories, monitoring disk space usage and SST file counts. The implementation introduces a new metrics class that periodically collects statistics both at the aggregate level (total snapshots directory size and SST file count) and per-checkpoint directory level, with configurable update intervals.

Key Changes

  • Introduces OMSnapshotDirectoryMetrics class with asynchronous metrics collection using a Timer-based scheduler
  • Adds configuration property ozone.om.snapshot.directory.metrics.update.interval (default: 5 minutes) to control update frequency
  • Integrates metrics lifecycle into OzoneManager's start/restart/stop methods
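The collection approach described above can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: the class `SnapshotDirStatsSketch`, its methods, and the `DirStats` holder are hypothetical names, and the sketch assumes that each immediate subdirectory of the snapshots directory is one checkpoint and that SST files can be recognized by the `.sst` suffix. It uses only `java.nio.file` and `java.util.Timer` from the JDK.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Timer;
import java.util.TimerTask;
import java.util.stream.Stream;

/** Hypothetical sketch; not the PR's OMSnapshotDirectoryMetrics class. */
final class SnapshotDirStatsSketch {

  /** Total size in bytes and SST file count for one directory tree. */
  static final class DirStats {
    final long sizeBytes;
    final int sstCount;

    DirStats(long sizeBytes, int sstCount) {
      this.sizeBytes = sizeBytes;
      this.sstCount = sstCount;
    }
  }

  /** Walks dir recursively, summing regular-file sizes and counting *.sst files. */
  static DirStats scan(Path dir) throws IOException {
    long size = 0L;
    int sst = 0;
    try (Stream<Path> files = Files.walk(dir)) {
      for (Path p : (Iterable<Path>) files::iterator) {
        if (Files.isRegularFile(p)) {
          size += Files.size(p);
          if (p.getFileName().toString().endsWith(".sst")) {
            sst++;
          }
        }
      }
    }
    return new DirStats(size, sst);
  }

  /** Assumption: every immediate subdirectory is one checkpoint directory. */
  static Map<String, DirStats> scanCheckpoints(Path snapshotsDir) throws IOException {
    Map<String, DirStats> result = new LinkedHashMap<>();
    try (Stream<Path> children = Files.list(snapshotsDir)) {
      for (Path child : (Iterable<Path>) children::iterator) {
        if (Files.isDirectory(child)) {
          result.put(child.getFileName().toString(), scan(child));
        }
      }
    }
    return result;
  }

  /** Runs a scan immediately and then every intervalMillis on a daemon Timer thread. */
  static Timer schedule(Path snapshotsDir, long intervalMillis) {
    Timer timer = new Timer("snapshot-dir-stats", true);
    timer.schedule(new TimerTask() {
      @Override
      public void run() {
        try {
          scanCheckpoints(snapshotsDir).forEach((name, s) ->
              System.out.println(name + " size=" + s.sizeBytes + " sst=" + s.sstCount));
        } catch (IOException | UncheckedIOException e) {
          // A failed scan should not kill the timer; report and retry on the next tick.
          System.err.println("Snapshot directory scan failed: " + e.getMessage());
        }
      }
    }, 0L, intervalMillis);
    return timer;
  }

  public static void main(String[] args) throws Exception {
    Timer timer = schedule(Paths.get(args[0]), 5 * 60 * 1000L); // 5 minutes, as in the PR default
    Thread.sleep(1000L); // let the first scan run, then stop
    timer.cancel();
  }
}
```

Running the scan off the request path, on a timer, keeps a potentially slow directory walk from blocking metric reads; the trade-off is that reported values can be stale by up to one interval.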

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 15 comments.

| File | Description |
| --- | --- |
| OMSnapshotDirectoryMetrics.java | New metrics class implementing periodic collection of snapshot directory size and SST file statistics with both aggregate and per-checkpoint metrics |
| OzoneManager.java | Integrates snapshot directory metrics into OM lifecycle by starting metrics collection on start/restart and stopping on shutdown |
| OMMetrics.java | Adds snapshot directory metrics management methods and field to hold the metrics instance |
| OMConfigKeys.java | Defines configuration key and default value for metrics update interval |
| ozone-default.xml | Adds configuration property documentation for the metrics update interval setting |
