Skip to content

Conversation

@eddyashton
Copy link
Member

@eddyashton eddyashton commented Sep 25, 2025

Previously we'd compare versions of snapshots we read from disk, and only take one from the peer if it was more recent. But this still exposes us to file system sync issues with snapshots.

This changes to always take a snapshot from the peer, if one is available.

Since that might mean we already have an identically-named snapshot, we also now overwrite any existing snapshot (with a corresponding log line).

@eddyashton eddyashton marked this pull request as ready for review September 26, 2025 08:45
@eddyashton eddyashton requested a review from a team as a code owner September 26, 2025 08:45
@achamayou achamayou added auto-backport Automatically backport this PR to LTS branch 6.x-todo PRs which should be backported to 6.x labels Sep 29, 2025
@achamayou achamayou enabled auto-merge September 29, 2025 14:37
@achamayou achamayou disabled auto-merge September 29, 2025 15:35
@achamayou achamayou added this pull request to the merge queue Sep 29, 2025
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to a conflict with the base branch Sep 29, 2025
@cjen1-msft
Copy link
Contributor

So this PR is either hitting or exacerbating the CI instability regarding snapshotting.
I'm currently trying to run many CI runs on a ACI instance, and hopefully find the set of expected failures.
It looks to be currently:

  • Connections test
  • snapshot related things (at the very least the forced snapshot proposals has a bug in our test infra, and there might be others)

@eddyashton eddyashton marked this pull request as draft October 10, 2025 10:09
@eddyashton eddyashton marked this pull request as ready for review October 31, 2025 15:54
Copilot AI review requested due to automatic review settings October 31, 2025 15:54
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR modifies the snapshot fetching mechanism for joining nodes to always prefer the primary's snapshot over local snapshots, regardless of version. The change simplifies the snapshot selection logic by removing the "since" query parameter that previously filtered snapshots based on local state.

Key changes:

  • Snapshot fetching no longer considers local snapshot versions when requesting from peers
  • Backups redirect to the primary's snapshot when available, falling back to their own only when no primary exists
  • Test coverage expanded with a new snapshot selection test

Reviewed Changes

Copilot reviewed 8 out of 8 changed files in this pull request and generated 4 comments.

Show a summary per file
File Description
tests/e2e_operations.py Moved force_become_primary to module level, added test_snapshot_selection, updated test calls with new parameters
src/snapshots/fetch.h Removed latest_local_snapshot parameter and "since" query logic from snapshot fetching functions
src/node/rpc/node_frontend.h Added redirect-to-primary logic for non-primary nodes, added snapshot name header to responses
src/host/run.cpp Removed local snapshot version check, changed error to warning when overwriting existing snapshots
include/ccf/http_consts.h Added new HTTP header constant for snapshot name
doc/operations/ledger_snapshot.rst Updated documentation to reflect new snapshot fetching behavior
doc/host_config_schema/cchost_config.json Updated description to remove "newer" qualifier from fetch behavior
CHANGELOG.md Added changelog entry for the behavior change

eddyashton and others added 2 commits November 3, 2025 16:37
Co-authored-by: Amaury Chamayou <amaury@xargs.fr>
@achamayou achamayou enabled auto-merge (squash) November 3, 2025 17:27
@achamayou achamayou merged commit 516fcaa into microsoft:main Nov 3, 2025
18 checks passed
eddyashton added a commit to eddyashton/CCF that referenced this pull request Dec 3, 2025
(cherry picked from commit 516fcaa)

# Conflicts:
#	CHANGELOG.md
#	src/host/run.cpp
#	src/node/rpc/node_frontend.h
#	tests/e2e_operations.py
@eddyashton eddyashton added the backported This PR was successfully backported to LTS branch label Dec 3, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

6.x-todo PRs which should be backported to 6.x auto-backport Automatically backport this PR to LTS branch backported This PR was successfully backported to LTS branch

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants