Skip to content

Add OpenTelemetry instrumentation to IPFS retrieval#7

Open
0xSwego wants to merge 4 commits into
mainfrom
codex/opentelemetry-ipfs-retrieval
Open

Add OpenTelemetry instrumentation to IPFS retrieval#7
0xSwego wants to merge 4 commits into
mainfrom
codex/opentelemetry-ipfs-retrieval

Conversation

@0xSwego

@0xSwego 0xSwego commented Jun 8, 2026

Copy link
Copy Markdown
Collaborator

Summary by CodeRabbit

  • New Features

    • Added OpenTelemetry-based observability for IPFS Zarr dataset retrieval: tracing, metrics, counters, and duration histograms for dataset and store opens.
  • Bug Fixes

    • Improved CID/input validation and clearer mapping of connection-like failures vs other errors; spans record error details and statuses.
  • Tests

    • Added tests for connection-classification and failure-path behavior.
  • Chores

    • Added OpenTelemetry API dependency.

@coderabbitai

coderabbitai Bot commented Jun 8, 2026

Copy link
Copy Markdown

Review Change Stack

📝 Walkthrough

Walkthrough

Adds OpenTelemetry tracing and metrics around IPFS Zarr dataset loading: OTEL setup, metric/span helpers, CID validation, a span-wrapped loader that records per-store and dataset metrics and maps connection-like failures, plus tests for error classification and timeout behavior.

Changes

OTEL Instrumentation for IPFS Dataset Retrieval

Layer / File(s) Summary
OTEL initialization and instrumentation helpers
dclimate_client_py/ipfs_retrieval.py, pyproject.toml
Added opentelemetry-api>=1.30.0. Initialized tracer/meter and dataset/store counters and duration histograms. Implemented OTEL-safe attribute sanitization, gateway attribute extraction from KuboCAS, dataset/store metric recorders, span error recording, _is_connection_error classifier, and a no-op ipfs_instrumentation fallback.
Core dataset loader with OTEL tracing and metrics
dclimate_client_py/ipfs_retrieval.py
Rewrote _load_dataset_from_ipfs_cid to validate CID input, run inside a dataset-level OTEL span, attempt ShardedZarrStore.open then fall back to HAMT.build + ZarrHAMTStore on non-connection failures, record per-store and dataset counters/histograms and span attributes, set ds.attrs["_ipfs_store_type"], and map/raise connection-like errors as IpfsConnectionError.
Tests and minor test-file cleanups
tests/test_ipfs_retrieval.py, tests/test_list_datasets_parity.py, tests/test_stac_server_listing.py
Added tests for connection-error classification and sharded-store timeout behavior with a minimal DummyKuboCAS and fixture; reformatted minor assertions and removed an unused import in existing test modules.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Poem

🐰 OpenTelemetry now shines,
Tracing IPFS through the lines,
Counters hum and spans hold tight,
Timeouts flagged in morning light,
a carrot for each metric 🥕

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 40.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.
Title check ✅ Passed The title accurately and concisely describes the main change: adding OpenTelemetry instrumentation to IPFS retrieval, which is the primary focus of the changeset.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch

Comment @coderabbitai help to get the list of available commands and usage tips.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@dclimate_client_py/ipfs_retrieval.py`:
- Around line 225-304: The sharded-store except block (the catch "except
Exception as sharded_err" around ShardedZarrStore) must classify connection
failures the same way you do later and raise IpfsConnectionError before
attempting the HAMT fallback; inspect sharded_err for the same
substrings/conditions used to create IpfsConnectionError (e.g., "Connection
refused", "Max retries exceeded", "Timeout") and if matched call
_record_span_error(dataset_span, connection_error) and raise that
connection_error instead of falling back to HAMT; otherwise proceed with the
existing HAMT.build/ZarrHAMTStore fallback, recording metrics via
_record_store_open and setting ds.attrs["_ipfs_store_type"], dataset_status and
dataset_store_type as currently done.
- Around line 183-206: The spans started with _TRACER.start_as_current_span(...)
(e.g., the ShardedZarrStore/HAMT/dataset spans that call _record_span_error in
their except blocks) are currently being double-reported because you both
manually call _record_span_error(span, err) and let the context manager
auto-record exceptions; fix by disabling automatic exception recording on those
spans—call start_as_current_span with record_exception=False and
set_status_on_exception=False for the spans used around ShardedZarrStore.open,
the HAMT open, and the dataset span (where _record_span_error is invoked), so
the manual _record_span_error remains the sole reporter of the exception and
status.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 61dfdaea-2424-4dcc-95ce-9ea11e5b23c2

📥 Commits

Reviewing files that changed from the base of the PR and between 6e72a5f and a7d54c1.

⛔ Files ignored due to path filters (1)
  • uv.lock is excluded by !**/*.lock
📒 Files selected for processing (2)
  • dclimate_client_py/ipfs_retrieval.py
  • pyproject.toml

Comment thread dclimate_client_py/ipfs_retrieval.py
Comment thread dclimate_client_py/ipfs_retrieval.py
@0xSwego 0xSwego changed the title Add instrumentation Add OpenTelemetry instrumentation to IPFS retrieval Jun 8, 2026

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (1)
tests/test_ipfs_retrieval.py (1)

14-16: 💤 Low value

Consider clarifying or removing the no-op fixture.

The check_ipfs_connection fixture name suggests it validates IPFS availability, but it currently does nothing. Since the tests in this file use mocks and don't require a real IPFS connection, you might:

  • Remove the fixture if it's not needed, or
  • Add a docstring explaining it's a placeholder for future IPFS-dependent tests, or
  • Implement actual IPFS availability checking if tests will later need it
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/test_ipfs_retrieval.py` around lines 14 - 16, The fixture
check_ipfs_connection is a no-op that misleads readers; either remove the
fixture entirely from tests/test_ipfs_retrieval.py if no real IPFS connectivity
is required, or replace it with a clear placeholder docstring explaining it's
intentionally a stub for future IPFS-dependent tests, or implement an actual
availability check (e.g., attempt a quick ping/HTTP request to the IPFS API and
skip tests on failure) so that the fixture name matches behavior; locate the
check_ipfs_connection fixture in the file and apply one of these three options.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Nitpick comments:
In `@tests/test_ipfs_retrieval.py`:
- Around line 14-16: The fixture check_ipfs_connection is a no-op that misleads
readers; either remove the fixture entirely from tests/test_ipfs_retrieval.py if
no real IPFS connectivity is required, or replace it with a clear placeholder
docstring explaining it's intentionally a stub for future IPFS-dependent tests,
or implement an actual availability check (e.g., attempt a quick ping/HTTP
request to the IPFS API and skip tests on failure) so that the fixture name
matches behavior; locate the check_ipfs_connection fixture in the file and apply
one of these three options.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 272c060f-723a-4e4d-9cf6-8f9b3b39368d

📥 Commits

Reviewing files that changed from the base of the PR and between a7d54c1 and f1de268.

📒 Files selected for processing (4)
  • dclimate_client_py/ipfs_retrieval.py
  • tests/test_ipfs_retrieval.py
  • tests/test_list_datasets_parity.py
  • tests/test_stac_server_listing.py
✅ Files skipped from review due to trivial changes (2)
  • tests/test_list_datasets_parity.py
  • tests/test_stac_server_listing.py
🚧 Files skipped from review as they are similar to previous changes (1)
  • dclimate_client_py/ipfs_retrieval.py

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant