
Conversation

@subkanthi (Collaborator)

No description provided.

@subkanthi (Collaborator Author)

[Screenshot: 2025-12-11 at 11:15:21 AM]

@subkanthi (Collaborator Author) commented Dec 16, 2025

There are some metrics that ice is not currently tracking:

CommitMetrics
totalFilesSizeInBytes
iceberg_commit_added_delete_files_total

@subkanthi marked this pull request as ready for review December 17, 2025 13:54
@shyiko (Collaborator) left a comment

Looks good overall. Thank you!

<!-- hide "Unable to load metrics class: 'org.apache.iceberg.hadoop.HadoopMetricsContext', falling back to null metrics" -->
<logger name="org.apache.iceberg.aws.s3.S3FileIO" level="ERROR"/>
<!-- hide "Unclosed input stream" warnings from Iceberg's S3InputStream finalizer -->
<logger name="org.apache.iceberg.aws.s3.S3InputStream" level="ERROR"/>
Collaborator

isn't this an indicator that we're not doing a good job at closing input streams and need to fix that?

Collaborator Author

This was coming from the Iceberg API and it was flooding the logs in a loop; I will get the exception.
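
For context, a minimal sketch of the stream-handling pattern the finalizer warning points at, assuming io is an org.apache.iceberg.io.FileIO and path is a placeholder variable:

// Closing the stream explicitly prevents Iceberg's S3InputStream finalizer
// from ever emitting the "Unclosed input stream" warning.
try (SeekableInputStream in = io.newInputFile(path).newStream()) {
  // ... read from the stream ...
}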


// ==========================================================================
// Public methods
// ==========================================================================
Collaborator

nit: comments like these should be avoided

private final Counter messageParseErrorsTotal;

/** Returns the singleton instance of the metrics reporter. */
public static InsertWatchMetrics getInstance() {
Collaborator

nit: why not use IoDH (initialization-on-demand holder) for lazy instance loading (and avoid the lock)?
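
A minimal sketch of the initialization-on-demand holder idiom the comment refers to; the JVM's class-initialization guarantees make it thread-safe without a lock or volatile field:

public final class InsertWatchMetrics {

  private InsertWatchMetrics() {
    // counters are registered here
  }

  // Holder is only loaded (and INSTANCE only constructed) on the first
  // getInstance() call; class loading provides the synchronization.
  private static final class Holder {
    private static final InsertWatchMetrics INSTANCE = new InsertWatchMetrics();
  }

  /** Returns the singleton instance of the metrics reporter. */
  public static InsertWatchMetrics getInstance() {
    return Holder.INSTANCE;
  }
}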

"Total number of retry attempts due to failures";

// SQS errors
private static final String SQS_RECEIVE_ERRORS_TOTAL_NAME = "ice_watch_sqs_receive_errors_total";
Collaborator

should we perhaps drop sqs from the metric names here and below, for consistency with the metrics above and in case we decide to add support for other message queues?
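
A hedged sketch of the rename (constant and metric names below are illustrative, not taken from the PR):

// Transport-agnostic naming: keep the ice_watch_ prefix, drop the sqs segment so the
// same metric can also cover a future non-SQS queue backend.
private static final String RECEIVE_ERRORS_TOTAL_NAME = "ice_watch_receive_errors_total";
private static final String DELETE_ERRORS_TOTAL_NAME = "ice_watch_delete_errors_total";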



# delete partition
ice delete nyc.taxis_p_by_day --partition '[{"name": "tpep_pickup_datetime", "values": ["2024-12-31T23:51:20"]}]' --dry-run=false
Collaborator

\ (line continuation) like the rest

// Record deletion metrics
metrics.recordOrphanFilesDeleted(tableName, deletedCount.get());
for (int i = 0; i < failedCount.get(); i++) {
  metrics.recordOrphanDeleteFailure(tableName);
Collaborator

any reason it's not metrics.recordOrphanDeleteFailure(tableName, failedCount.get())?
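
A hedged sketch of a counted overload, assuming orphanDeleteFailuresTotal is an io.prometheus.client.Counter with a table label (names are illustrative):

/** Records failed orphan-file deletions in one call instead of one increment per failure. */
public void recordOrphanDeleteFailure(String tableName, long count) {
  if (count > 0) {
    orphanDeleteFailuresTotal.labels(tableName).inc(count);
  }
}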

}

// Initialize Iceberg metrics reporter for Prometheus (singleton)
PrometheusMetricsReporter metricsReporter = PrometheusMetricsReporter.getInstance();
Collaborator

some Metrics objects are passed around (like here), others are not (like MaintenanceMetrics.getInstance()). Let's settle on one approach
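
For illustration, the two styles being mixed, with OrphanFileCleaner as a hypothetical consumer (not a class from the PR):

// Style 1: the reporter is passed to whoever needs it.
PrometheusMetricsReporter metricsReporter = PrometheusMetricsReporter.getInstance();
OrphanFileCleaner cleaner = new OrphanFileCleaner(metricsReporter);

// Style 2: each class pulls its singleton directly.
MaintenanceMetrics metrics = MaintenanceMetrics.getInstance();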

builder.endpointOverride(endpoint);
}
} catch (Exception e) {
logger.warn("Failed to parse SQS queue URL for endpoint extraction: {}", e.getMessage());
Collaborator

throw instead of ignore-logging, perhaps?
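
A minimal sketch of failing fast instead, with extractEndpoint as a hypothetical helper (not a method from the PR):

private static URI extractEndpoint(String queueUrl) {
  try {
    URI uri = URI.create(queueUrl);
    return new URI(uri.getScheme(), null, uri.getHost(), uri.getPort(), null, null, null);
  } catch (URISyntaxException e) {
    // Surface the problem instead of logging and continuing with a possibly wrong endpoint.
    throw new IllegalArgumentException("Invalid SQS queue URL: " + queueUrl, e);
  }
}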

if (host != null && !host.endsWith(".amazonaws.com")) {
URI endpoint = new URI(uri.getScheme(), null, host, uri.getPort(), null, null, null);
logger.info("Using custom SQS endpoint: {}", endpoint);
builder.endpointOverride(endpoint);
Collaborator

I'd suggest introducing a CLI option (e.g. --watch-endpoint) that allows specifying a value different from --watch, as the proposed approach falls apart when using LocalStack, etc. This way things can also continue to use the AWS_ENDPOINT_URL_SQS env var when needed.
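
A hedged sketch of what such an option could look like, assuming a picocli-style command (option name, field, and wiring are illustrative, not part of the PR):

// Requires picocli and the AWS SDK v2 SQS client on the classpath.
@CommandLine.Option(
    names = "--watch-endpoint",
    description = "SQS endpoint override, e.g. http://localhost:4566 for LocalStack")
private URI watchEndpoint;

// When building the client, prefer the explicit option; otherwise fall back to the
// SDK defaults (which already honor AWS_ENDPOINT_URL_SQS).
SqsClientBuilder builder = SqsClient.builder();
if (watchEndpoint != null) {
  builder.endpointOverride(watchEndpoint);
}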

*/
package com.altinity.ice.rest.catalog.internal.metrics;

import static com.altinity.ice.rest.catalog.internal.metrics.IcebergMetricNames.*;