Skip to content

Prevent Nexus from initializing until CRDB is populated #712

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 1 commit into from

Conversation

smklein
Copy link
Collaborator

@smklein smklein commented Mar 1, 2022

Stops Nexus from booting until CRDB appears initialized, at least once.

This change allows other bootstrapping aspects of the control plane to initialize Nexus and CRDB concurrently, as Nexus will await CRDB formatting before launching itself.

@smklein smklein requested a review from davepacheco March 1, 2022 23:06
Copy link
Collaborator

@davepacheco davepacheco left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(Not sure if you're still looking for a review here. I think we discussed offline having RSS not start Nexus until database populate was done. I think that's a lot cleaner.)

To record some thoughts that we talked about here:

  • This is probably fine
  • It sort of feels consistent with the general practice of having things come online in a degraded mode until their deps are available, rather than, say, aborting on startup and requiring human intervention.
  • I think this case is different from that general case because: a dependency going offline is an anticipated condition that can always happen and so has to be handled at runtime. Having Nexus come up with an unpopulated database can be avoided altogether.
  • Makes me a little uneasy: this change makes some assumptions load-bearing that were previously sort of accidental: that "dbinit.sql" will always set the "version" metadata and that that will always be the last thing that it does. It's not easy to enforce this.
  • I think the property we want to check is "the schema matches something that we're capable of operating on". That seems like a much harder problem.
  • But as I said above, these are vague fears and if this makes things a lot simpler, it's probably fine. But if we can instead guarantee that the database is populated before we start Nexus, that feels cleaner and more robust.

value STRING(1023) NOT NULL
value STRING(1023) NOT NULL,

PRIMARY KEY(name, value)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we keep this: I think we only want the "name" in the primary key.

@smklein
Copy link
Collaborator Author

smklein commented Mar 3, 2022

  • I think the property we want to check is "the schema matches something that we're capable of operating on". That seems like a much harder problem.

I think this is potentially another check we should consider - I actually think Nexus validating that the tables-in-CRDB can be parsed might not be a bad idea to do explicitly. But this can be punted when we have more experience with upgrade / schema migrations.

  • But as I said above, these are vague fears and if this makes things a lot simpler, it's probably fine. But if we can instead guarantee that the database is populated before we start Nexus, that feels cleaner and more robust.

Yeah, the workaround I'm using in the package-zone branch is to "block on initialize storage before starting services". This is a pattern we'll need to carry forward to the RSS, but should resolve the problem

@smklein smklein closed this Mar 3, 2022
@smklein smklein deleted the nexus-dep-on-crdb branch December 27, 2022 19:41
gjcolombo added a commit that referenced this pull request Jun 18, 2024
Update to Propolis commit 50cb28f5 to obtain the following updates:

```
50cb28f server: separate statuses of inbound and outbound migrations (#661)
61e1481 Don't return None from BlockData::block_for_req when attachment is paused (#711)
79e243b Build and clippy fixes for Rust 1.79 (#712)
3c0f998 Modernize fw_cfg emulation
7a65be9 Update and prune dependencies
```

50cb28f5 changes the Propolis instance status and migration status APIs
to report both inbound and outbound migration status. Update sled agent
accordingly.

Tests: `cargo nextest`, smoke test on a dev cluster.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants