Description
I'm trying an upgrade on a4x2, and tuf_artifact_replication seems either not to be fully working or to be running slowly. I was still able to set the target release and proceed with the upgrade. The system behaves fairly well -- it just gets stuck executing a blueprint, with output like this:
note: using Nexus URL http://[fd00:1122:3344:101::6]:12221
task: "blueprint_executor"
  configured period: every 1m
  currently executing: iter 83, triggered by a periodic timer firing
    started at 2025-06-11T20:19:47.485Z, running for 26067ms
  last completed activation: iter 82, triggered by a periodic timer firing
    started at 2025-06-11T20:18:47.489Z (86s ago) and ran for 51024ms
    target blueprint: 7ff85197-6be4-482c-8ded-c8b114ca07eb
    execution: enabled
    status: completed (14 steps)
    warning: at: Deploy sled configs: Failed to put OmicronSledConfig {
        generation: Generation(
            9,
        ),
        ...
    } to sled 2d190199-1a3a-419c-8f07-13e00352306e: Error Response: status: 400 Bad Request; headers: {"content-type": "application/json", "x-request-id": "c62efedb-0d2f-47f0-90ef-0edb51c90944", "content-length": "210", "date": "Wed, 11 Jun 2025 20:18:51 GMT"}; value: Error { error_code: None, message: "sled config failed artifact store existence checks: Artifact be6aab2e39fcf5882e94e749ddd394eae45a322deac0528edae8143f9d53fed5 not found", request_id: "c62efedb-0d2f-47f0-90ef-0edb51c90944" }
    error: (none)
In at least some cases so far, the artifact does eventually show up, at which point execution succeeds and the upgrade continues. So the system is handling it about as well as it can, but I imagine we want to prevent you from starting an upgrade when the artifacts aren't replicated everywhere.
This is admittedly tricky -- you could add a sled in the middle of an upgrade and that shouldn't stop it. And we'd probably need to be able to override this check if we've got some busted sled. But even if we only check this at the point where you set the target release, that might be useful.
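
To make the proposal concrete, here is a rough sketch of what a check at set-target-release time could look like. Everything in it is hypothetical: `SledId`, `ArtifactHash`, `ReplicationCheck`, and `check_replication` are illustrative stand-ins, not existing Omicron types or APIs, and it assumes the caller already has a per-sled view of which artifacts have been replicated. The `force` flag models the override for a busted sled. Because this would only run when the target release is set, a sled added mid-upgrade would never be consulted and so could not block an upgrade already in progress.

```rust
use std::collections::{BTreeMap, BTreeSet};

// Hypothetical stand-ins for illustration only; not Omicron's real types.
type SledId = &'static str;
type ArtifactHash = &'static str;
type Missing = BTreeMap<SledId, BTreeSet<ArtifactHash>>;

/// Outcome of the (hypothetical) pre-upgrade replication check.
#[derive(Debug, PartialEq)]
enum ReplicationCheck {
    /// Every sled has every artifact in the target release.
    Ready,
    /// Some artifacts are missing, but the operator chose to proceed anyway.
    Overridden { missing: Missing },
    /// Some artifacts are missing; setting the target release is refused.
    NotReady { missing: Missing },
}

/// Compare the artifacts required by the target release against what each
/// sled reports as present, and decide whether the upgrade may start.
fn check_replication(
    required: &BTreeSet<ArtifactHash>,
    sled_artifacts: &BTreeMap<SledId, BTreeSet<ArtifactHash>>,
    force: bool,
) -> ReplicationCheck {
    let mut missing = Missing::new();
    for (sled, present) in sled_artifacts {
        // Artifacts the release needs that this sled does not have yet.
        let absent: BTreeSet<ArtifactHash> =
            required.difference(present).copied().collect();
        if !absent.is_empty() {
            missing.insert(*sled, absent);
        }
    }

    if missing.is_empty() {
        ReplicationCheck::Ready
    } else if force {
        ReplicationCheck::Overridden { missing }
    } else {
        ReplicationCheck::NotReady { missing }
    }
}
```

Returning the per-sled set of missing artifacts, rather than a bare yes/no, would let the error message say which sleds are behind, which is the information an operator needs to decide whether to wait or to override.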