Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Unify diesel error conversions under 'public_error_from_diesel_pool' #647

Merged
merged 6 commits into from
Jan 29, 2022

Conversation

smklein
Copy link
Collaborator

@smklein smklein commented Jan 27, 2022

This follows-up a conversation from #644 , and exposes only a single function for performing the error conversion. An enum named OpKind, indicating which operation was issued against the DB, allows callers to supply additional context to their errors.

@smklein
Copy link
Collaborator Author

smklein commented Jan 27, 2022

The call-sites are slightly wordier, but I personally find this easier to understand - if I wanna know which conversion to use, I can look at the single OpKind enum and know what to use. Also, extending that enum to provide more informative errors would be pretty straightforward, and immediately align with the other ops.

Copy link
Collaborator

@davepacheco davepacheco left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for doing this! I haven't looked at the DataStore changes yet. I want to take a close look at those because I know from my own experience how tricky those can be (though hopefully the changes were pretty mechanical).

All of my suggestions here are around naming and documentation. I don't want to be overly nitty but the point of the change is to guide people towards the right facilities so I think it's important that these be pretty clear.

nexus/src/authz/mod.rs Outdated Show resolved Hide resolved
nexus/src/db/error.rs Outdated Show resolved Hide resolved
nexus/src/db/error.rs Outdated Show resolved Hide resolved
nexus/src/db/error.rs Outdated Show resolved Hide resolved
nexus/src/db/error.rs Show resolved Hide resolved
nexus/src/db/error.rs Show resolved Hide resolved
/// The operation was attempting to create a resource with a name.
Create(ResourceType, &'a str),
/// All other operation types.
Other,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should make this more specific. I'm still not sure what to call it, but if you see "Other", you might think that if none of the other variants meets your needs, you should pick this one. That's not correct. This variant specifically means "nothing under the user's control can cause this query to fail, so there will be no 400-level errors from this. Any unknown query problem is a bug, and so a 500".

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess I could call this OpKind::Infallible, but that still seems misleading - it can fail, in the same way the public_error_from_diesel_pool_shouldnt_fail could. I think part of the weirdness comes from the other variants being phrased in terms of the input operation kind (lookup, create) and this one being phrased in terms of the expected error types. If we still think of this case in terms of input operation, it really is a catch-all for "something that isn't all the other cases", and if we don't want that to return exclusively 500 errors, a different variant should be added.

I'll at least update the comment indicating the expected usage.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think part of the weirdness comes from the other variants being phrased in terms of the input operation kind (lookup, create) and this one being phrased in terms of the expected error types.

In that case, maybe Other should be called UserInputValidated?

Ooh, what about this: change them all to be named based on the error type rather than the operation or input. That makes more sense next to the data that each variant contains, too. Then it's not really an OpKind either...maybe it's:

enum UserErrorsExpected {
    NotFoundByResource(&dyn authz::ApiResourceError),
    NotFoundByLookup(ResourceType, LookupType),
    Conflict(ResourceType, &str),
    None,
}

If we still think of this case in terms of input operation, it really is a catch-all for "something that isn't all the other cases"

I think that inference is incorrect and that's exactly what I'm worried about people thinking by calling it Other. Today, Other happens to correspond to all the cases that aren't covered by the other three. But you could say that about any of the other three, too. I'd argue it's not a catch-all, in that it would be wrong to assume that it's always a "safe" choice or that it would be correct for some new query that you're adding. To be more concrete, if a developer adds some new query to the system, I'm afraid they're going to pick Other because their case isn't one of the first three. What I want is for the developer to stop and ask whether some other 400-level error is possible and whether a new variant should be created.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I kinda like that re-framing. I have thought that it's been odd we say a "Lookup Expecting 404" is OpKind::Lookup, but a "lookup not expecting a 404" would be OpKind::Other instead of `OpKind::Lookup, which doesn't seem obvious - it's still a lookup!

I'm tweaking those names slightly:

  • I'm using ErrorHandler instead of UserErrorsExpected - the variant can distinguish whether the extra being handled is a 400 error, or something else.
  • The None case is now just ErrorHandler::Server, for "handle all errors as server errors".

As a heads up, I know you described one of the variants as Conflict - I'm okay calling it that, but FYI, I think this variant actually translates to a 400 BAD REQUEST rather than a 409 CONFLICT.

Copy link
Collaborator

@davepacheco davepacheco left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I really like where this is going!

nexus/src/db/error.rs Outdated Show resolved Hide resolved
nexus/src/db/error.rs Outdated Show resolved Hide resolved
@davepacheco
Copy link
Collaborator

I just went through datastore.rs too and that looks good (modulo the other comments). I'll be interested to hear what you think of changing the enum from OpKind to something describing the errors expected. That feels quite a lot clearer to me. The caller isn't trying to tell this function what it's doing -- they're telling the function what specific errors to explicitly go look for in the returned database error and providing the extra information needed to translate those errors into the public error.

@david-crespo
Copy link
Contributor

david-crespo commented Jan 27, 2022

I like the changes. Looks kind of like what I was trying to do in #357, where I tried to add a layer of abstraction representing datastore errors before they become http errors, on the idea that a given datastore error might need to map to different HTTP errors based on the calling context. Effectively the arguments to public_error_from_diesel_pool represent that layer here.

@smklein
Copy link
Collaborator Author

smklein commented Jan 28, 2022

I just went through datastore.rs too and that looks good (modulo the other comments). I'll be interested to hear what you think of changing the enum from OpKind to something describing the errors expected. That feels quite a lot clearer to me. The caller isn't trying to tell this function what it's doing -- they're telling the function what specific errors to explicitly go look for in the returned database error and providing the extra information needed to translate those errors into the public error.

Updated to use this pattern. I like re-framing it as "error handlers", because that's exactly the additional context being provided. "Please handle these cases for me, and here's how".

Copy link
Collaborator

@davepacheco davepacheco left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I love it! This is a great improvement.

nexus/src/db/error.rs Outdated Show resolved Hide resolved
@smklein smklein enabled auto-merge (squash) January 28, 2022 23:34
@smklein smklein merged commit c4e76cb into main Jan 29, 2022
@smklein smklein deleted the diesel-errors branch January 29, 2022 00:04
leftwo pushed a commit that referenced this pull request Mar 4, 2024
Crucible changes:
Per client, queue-based backpressure (#1186)
A builder for the Downstairs Downstairs struct. (#1152)
Update Rust to v1.76.0 (#1153)
Deactivate the read only parent after a scrub (#1180)
Start byte-based backpressure earlier (#1179)
Tweak CI scripts to fix warnings (#1178)
Make `gw_ds_complete` less public (#1175)
Verify extent under repair is valid after copying files (#1159)
Remove individual panic setup, use global panic settings (#1174)
[smf] Use new zone network config service (#1096)
Move a few methods into downstairs (#1160)
Remove extra clone in upstairs read (#1163)
Make `crucible-downstairs` not depend on upstairs (#1165)
Update Rust crate rusqlite to 0.31 (#1171)
Update Rust crate reedline to 0.29.0 (#1170)
Update Rust crate clap to 4.5 (#1169)
Update Rust crate indicatif to 0.17.8 (#1168)
Update progenitor to bc0bb4b (#1164)
Do not 500 on snapshot delete for deleted region (#1162)
Drop jobs from Offline downstairs. (#1157)
`Mutex<Work>` → `Work` (#1156)
Added a contributing.md (#1158)
Remove ExtentFlushClose::source_downstairs (#1154)
Remove unnecessary mutexes from Downstairs (#1132)

Propolis changes:
PHD: improve Windows reliability (#651)
Update progenitor and omicron deps
Clean up VMM resource on server shutdown
Remove Inventory mechanism
Update vergen dependency
Properly handle pre/post illumos#16183 fixups
PHD: add `pfexec` to xtask phd-runner invocation (#647)
PHD: add Windows Server 2016 adapter & improve WS2016/2019 reliability (#646)
PHD: use `clap` for more `cargo xtask phd` args (#645)
PHD: several `cargo xtask phd` CLI fixes (#642)
PHD: Use ZFS clones for file-backed disks (#640)
PHD: improve ctrl-c handling (#634)
leftwo added a commit that referenced this pull request Mar 4, 2024
Crucible changes:
Per client, queue-based backpressure (#1186)
A builder for the Downstairs Downstairs struct. (#1152) Update Rust to
v1.76.0 (#1153)
Deactivate the read only parent after a scrub (#1180) Start byte-based
backpressure earlier (#1179)
Tweak CI scripts to fix warnings (#1178)
Make `gw_ds_complete` less public (#1175)
Verify extent under repair is valid after copying files (#1159) Remove
individual panic setup, use global panic settings (#1174) [smf] Use new
zone network config service (#1096)
Move a few methods into downstairs (#1160)
Remove extra clone in upstairs read (#1163)
Make `crucible-downstairs` not depend on upstairs (#1165) Update Rust
crate rusqlite to 0.31 (#1171)
Update Rust crate reedline to 0.29.0 (#1170)
Update Rust crate clap to 4.5 (#1169)
Update Rust crate indicatif to 0.17.8 (#1168)
Update progenitor to bc0bb4b (#1164)
Do not 500 on snapshot delete for deleted region (#1162) Drop jobs from
Offline downstairs. (#1157)
`Mutex<Work>` → `Work` (#1156)
Added a contributing.md (#1158)
Remove ExtentFlushClose::source_downstairs (#1154) Remove unnecessary
mutexes from Downstairs (#1132)

Propolis changes:
PHD: improve Windows reliability (#651)
Update progenitor and omicron deps
Clean up VMM resource on server shutdown
Remove Inventory mechanism
Update vergen dependency
Properly handle pre/post illumos#16183 fixups
PHD: add `pfexec` to xtask phd-runner invocation (#647) PHD: add Windows
Server 2016 adapter & improve WS2016/2019 reliability (#646) PHD: use
`clap` for more `cargo xtask phd` args (#645) PHD: several `cargo xtask
phd` CLI fixes (#642)
PHD: Use ZFS clones for file-backed disks (#640)
PHD: improve ctrl-c handling (#634)

Co-authored-by: Alan Hanson <alan@oxide.computer>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants