Provision destination volume for snapshot blocks #1752

Merged: 8 commits merged into oxidecomputer:main from fail_snapshot_if_no_space on Oct 4, 2022

Conversation

@jmpesp (Contributor) commented Sep 28, 2022

During snapshot creation, provision a destination volume as an eventual landing place of the snapshot blocks. This will cause snapshots to fail if there isn't space to store all the blocks in regions of their own.

This commit adds an optional destination volume ID to the snapshot model. Once a snapshot is created, some task will have to copy blocks from the source volume to this destination volume, then swap the entries accordingly.

This commit also moves common Crucible agent interaction functions into the saga mod.rs, and generalizes region_allocate's arguments so that it can be called from more than just the disk creation saga.

Closes #1642
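
As a rough sketch of the model change described above (field and type names here are illustrative assumptions, not the actual Nexus snapshot model from this PR), the snapshot record gains an optional ID for the freshly provisioned destination volume:

// Illustrative only: names and types are assumptions, not the real model.
use uuid::Uuid;

pub struct Snapshot {
    pub id: Uuid,
    // The volume whose blocks are being snapshotted.
    pub volume_id: Uuid,
    // The newly provisioned volume that will eventually receive the
    // snapshot's blocks once a later task copies them over and swaps
    // the volume entries.
    pub destination_volume_id: Option<Uuid>,
}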

@jmpesp requested a review from andrewjstone and then removed the request on September 29, 2022.
@jmpesp (Contributor, Author) commented Sep 29, 2022

(That was a misclick, sorry @andrewjstone!)

@jmpesp requested a review from davepacheco on September 30, 2022.
self.volume_delete(db_snapshot.volume_id).await?;
if let Some(volume_id) = db_snapshot.destination_volume_id {
Collaborator:
It seems like a couple of these steps maybe need to happen transactionally? If we crash either after L511 or L514, it seems like we'd wind up not having cleaned stuff up.

I can understand punting on this but it seems like we should track this somewhere (even if just a TODO).

jmpesp (Contributor, Author):

This is a problem, yeah. Nexus::project_delete_snapshot is directly called from an HTTP endpoint, so it's not a case where a saga node would be replayed during a crash. I'm not sure how to solve this, and I feel like this may be a more general problem. I'll give it some thought.

Collaborator:

This should be addressed by #2090
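
To make the crash-window concern above concrete, here is a minimal, self-contained sketch (all type and method names are hypothetical stand-ins, not the actual Nexus code) of the ordering being discussed: two sequential deletes issued directly from an HTTP endpoint, with nothing that would replay the second one after a crash.

use uuid::Uuid;

struct DbSnapshot {
    volume_id: Uuid,
    destination_volume_id: Option<Uuid>,
}

struct Nexus;

impl Nexus {
    async fn volume_delete(&self, _volume_id: Uuid) -> Result<(), String> {
        // Stand-in for the real volume deletion path.
        Ok(())
    }

    async fn delete_snapshot_volumes(
        &self,
        db_snapshot: &DbSnapshot,
    ) -> Result<(), String> {
        self.volume_delete(db_snapshot.volume_id).await?;
        // If the process crashes here, the destination volume below is
        // never cleaned up, and nothing retries it: this runs from an
        // HTTP handler rather than a saga node that would be replayed
        // on restart.
        if let Some(volume_id) = db_snapshot.destination_volume_id {
            self.volume_delete(volume_id).await?;
        }
        Ok(())
    }
}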

.organization_name(organization_name)
.lookup_for(authz::Action::ListChildren)
.await?;
let (authz_silo, _) = LookupPath::new(opctx, &self.db_datastore)
Collaborator:

Sorry I missed this earlier and I guess this problem was in the old code too, but this code looks up the same organization by name twice, which is another race. The new code adds a third lookup by organization name and a second lookup by project name. You should be able to do a bunch of these simultaneously to avoid the race, with something like this:

let (authz_silo, authz_org, authz_project, authz_disk) = LookupPath::new(opctx, &self.db_datastore)
    .organization_name(organization_name)
    .project_name(project_name)
    .disk_name(disk_name)
    .lookup_for(...)
    .await?;

@smklein mentioned this pull request on Oct 3, 2022.
@jmpesp merged commit f47bbfd into oxidecomputer:main on Oct 4, 2022.
@jmpesp deleted the fail_snapshot_if_no_space branch on October 4, 2022.
smklein added a commit that referenced this pull request on Jan 18, 2023:
Part of #1734, specifically [this bit](#1734 (comment)).

This PR adds a table called `resource_usage`, which exists for silos,
organizations, and projects. Currently, it only contains information
about each collection's disk usage.

- [x] API exposure
- [x] Emit this information to Clickhouse (metrics are passed to the
producer on every modification)
- [x] Add a metrics-based API for querying such historical info (done,
under `/system/metrics/resource-utilization`. Happy to update this API
as it's useful, but I went with something minimal for expediency).
- [x] Correctness
  - [x] Add CTE to update all collections up to the root
  - [x] Ensure each query avoids full-table scans
- [x] Ensure that each update of `resource_usage` is atomic (part of a
transaction, saga, or CTE)
- [x] Ensure that the disk usage accounting is accurate. Currently, we
only consider region allocations / deallocations; accurately accounting
for snapshots will require incorporating
#1752.
  - [x] Add integration tests

After merging, I'd like to do the following:
- [ ] Emit some amount of "total capacity" info, to contextualize the
currently used amount. This makes much more sense at a physical view
(sled, rack, fleet) than user view.
- [x] Expand the "collections" to include a "fleet" object. This will be
particularly useful for operators.
- [ ] Make the accounting of "utilization" more accurate. It currently does not account for metadata (e.g., Crucible's SQLite DBs) or system usage (CRDB, Clickhouse, the OS itself, etc.).
- [x] Expand the usage information to account for CPU usage, RAM, and
other globally-shared resources.
- [x] Ensuring idempotency has been punted to #2094, though this *should* work with the new CTEs.
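
As a rough illustration of the usage tracking described in the commit message above (table and column names are guesses inferred from the text, not the actual schema, and the real change expresses this as a CTE built in Rust), the accounting boils down to bumping one row each for the project, its organization, and its silo in a single atomic statement whenever regions are allocated or deallocated:

// Hypothetical SQL and names; shown as a plain multi-row UPDATE rather
// than the CTE the checklist refers to.
const ADJUST_DISK_USAGE_SQL: &str = "
    UPDATE resource_usage
    SET disk_bytes_used = disk_bytes_used + $1
    WHERE id IN ($2, $3, $4) -- the project, its organization, its silo
";

fn main() {
    // A 10 GiB region allocation would bind delta = +10 GiB along with the
    // three collection IDs; a deallocation binds a negative delta.
    let delta_bytes: i64 = 10 * 1024 * 1024 * 1024;
    println!("would run: {} with delta {}", ADJUST_DISK_USAGE_SQL, delta_bytes);
}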
@david-crespo mentioned this pull request on Sep 7, 2023.

Successfully merging this pull request may close these issues:
Snapshot creation should fail if there isn't room in the dataset (#1642)