Provision destination volume for snapshot blocks #1752

Merged: 8 commits merged into oxidecomputer:main from fail_snapshot_if_no_space on Oct 4, 2022

Conversation

@jmpesp (Contributor) commented Sep 28, 2022

During snapshot creation, provision a destination volume as an eventual landing place of the snapshot blocks. This will cause snapshots to fail if there isn't space to store all the blocks in regions of their own.

This commit adds an optional destination volume ID to the snapshot model. Once a snapshot is created, some task will have to copy blocks from the source volume to this destination volume, then swap the entries accordingly.

This commit also moves common Crucible agent interaction functions into the saga mod.rs, and generalizes region_allocate's arguments so that it can be called from more than just the disk creation saga.

Closes #1642
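
As a rough sketch of the model change described above (field and type names here are illustrative assumptions, not the actual Nexus snapshot model from this PR), the snapshot record gains an optional ID for the freshly provisioned destination volume:

// Illustrative only: names and types are assumptions, not the real model.
use uuid::Uuid;

pub struct Snapshot {
    pub id: Uuid,
    // The volume whose blocks are being snapshotted.
    pub volume_id: Uuid,
    // The newly provisioned volume that will eventually receive the
    // snapshot's blocks once a later task copies them over and swaps
    // the volume entries.
    pub destination_volume_id: Option<Uuid>,
}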

@jmpesp requested a review from andrewjstone and then removed the request on September 29, 2022.
@jmpesp (Contributor, Author) commented Sep 29, 2022

(That was a misclick, sorry @andrewjstone!)

@jmpesp requested a review from davepacheco on September 30, 2022.
self.volume_delete(db_snapshot.volume_id).await?;
if let Some(volume_id) = db_snapshot.destination_volume_id {
Collaborator:
It seems like a couple of these steps maybe need to happen transactionally? If we crash either after L511 or L514, it seems like we'd wind up not having cleaned stuff up.

I can understand punting on this but it seems like we should track this somewhere (even if just a TODO).

jmpesp (Contributor, Author):

This is a problem, yeah. Nexus::project_delete_snapshot is directly called from an HTTP endpoint, so it's not a case where a saga node would be replayed during a crash. I'm not sure how to solve this, and I feel like this may be a more general problem. I'll give it some thought.

Collaborator:

This should be addressed by #2090
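
To make the crash-window concern above concrete, here is a minimal, self-contained sketch (all type and method names are hypothetical stand-ins, not the actual Nexus code) of the ordering being discussed: two sequential deletes issued directly from an HTTP endpoint, with nothing that would replay the second one after a crash.

use uuid::Uuid;

struct DbSnapshot {
    volume_id: Uuid,
    destination_volume_id: Option<Uuid>,
}

struct Nexus;

impl Nexus {
    async fn volume_delete(&self, _volume_id: Uuid) -> Result<(), String> {
        // Stand-in for the real volume deletion path.
        Ok(())
    }

    async fn delete_snapshot_volumes(
        &self,
        db_snapshot: &DbSnapshot,
    ) -> Result<(), String> {
        self.volume_delete(db_snapshot.volume_id).await?;
        // If the process crashes here, the destination volume below is
        // never cleaned up, and nothing retries it: this runs from an
        // HTTP handler rather than a saga node that would be replayed
        // on restart.
        if let Some(volume_id) = db_snapshot.destination_volume_id {
            self.volume_delete(volume_id).await?;
        }
        Ok(())
    }
}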

.organization_name(organization_name)
.lookup_for(authz::Action::ListChildren)
.await?;
let (authz_silo, _) = LookupPath::new(opctx, &self.db_datastore)
Collaborator:

Sorry I missed this earlier and I guess this problem was in the old code too, but this code looks up the same organization by name twice, which is another race. The new code adds a third lookup by organization name and a second lookup by project name. You should be able to do a bunch of these simultaneously to avoid the race, with something like this:

let (authz_silo, authz_org, authz_project, authz_disk) = LookupPath::new(opctx, &self.db_datastore)
    .organization_name(organization_name)
    .project_name(project_name)
    .disk_name(disk_name)
    .lookup_for(...)
    .await?;

@smklein mentioned this pull request on Oct 3, 2022.
@jmpesp merged commit f47bbfd into oxidecomputer:main on Oct 4, 2022.
@jmpesp deleted the fail_snapshot_if_no_space branch on October 4, 2022.
smklein added a commit that referenced this pull request on Jan 18, 2023:
Part of #1734, specifically [this bit](#1734 (comment)).

This PR adds a table called `resource_usage`, which exists for silos,
organizations, and projects. Currently, it only contains information
about each collection's disk usage.

- [x] API exposure
- [x] Emit this information to Clickhouse (metrics are passed to the
producer on every modification)
- [x] Add a metrics-based API for querying such historical info (done,
under `/system/metrics/resource-utilization`. Happy to update this API
as it's useful, but I went with something minimal for expediency).
- [x] Correctness
  - [x] Add CTE to update all collections up to the root
  - [x] Ensure each query avoids full-table scans
- [x] Ensure that each update of `resource_usage` is atomic (part of a
transaction, saga, or CTE)
- [x] Ensure that the disk usage accounting is accurate. Currently, we
only consider region allocations / deallocations; accurately accounting
for snapshots will require incorporating
#1752.
  - [x] Add integration tests

After merging, I'd like to do the following:
- [ ] Emit some amount of "total capacity" info, to contextualize the
currently used amount. This makes much more sense at a physical view
(sled, rack, fleet) than user view.
- [x] Expand the "collections" to include a "fleet" object. This will be
particularly useful for operators.
- [ ] Make the accounting of "utilization" more accurate. It currently does not account for metadata (e.g., Crucible's SQLite DBs) or system usage (CRDB, Clickhouse, the OS itself, etc.).
- [x] Expand the usage information to account for CPU usage, RAM, and
other globally-shared resources.
- [x] Ensuring idempotency has been punted to #2094, though this *should* work with the new CTEs.
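
As a rough illustration of the usage tracking described in the commit message above (table and column names are guesses inferred from the text, not the actual schema, and the real change expresses this as a CTE built in Rust), the accounting boils down to bumping one row each for the project, its organization, and its silo in a single atomic statement whenever regions are allocated or deallocated:

// Hypothetical SQL and names; shown as a plain multi-row UPDATE rather
// than the CTE the checklist refers to.
const ADJUST_DISK_USAGE_SQL: &str = "
    UPDATE resource_usage
    SET disk_bytes_used = disk_bytes_used + $1
    WHERE id IN ($2, $3, $4) -- the project, its organization, its silo
";

fn main() {
    // A 10 GiB region allocation would bind delta = +10 GiB along with the
    // three collection IDs; a deallocation binds a negative delta.
    let delta_bytes: i64 = 10 * 1024 * 1024 * 1024;
    println!("would run: {} with delta {}", ADJUST_DISK_USAGE_SQL, delta_bytes);
}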
@david-crespo mentioned this pull request on Sep 7, 2023.

Successfully merging this pull request may close these issues:
Snapshot creation should fail if there isn't room in the dataset (#1642)