-
Notifications
You must be signed in to change notification settings - Fork 3
Improve Error Reporting #202
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
d3208f4 to
1edf0b2
Compare
2ae5338 to
cd3079c
Compare
|
Going on the PR comment alone to start with: please make separate commits for
|
40c396b to
b5f0a7d
Compare
spjmurray
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some fairly high level musings to consider.
| type: string | ||
| enum: | ||
| # Defined by OAuth 2.02 | ||
| # Common error codes |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Common across what? It's in the oauth2.1 spec... why not leave as is. The other problem here is reordering things unnecessarily which makes it incredibly hard to work out if anything has been added or removed. What's the justification?
|
|
||
| // Prefixed adds the given Prefix to the underlying Cause of an error. | ||
| // It should only be called at the end of the error construction chain. | ||
| func (e *Error) Prefixed() *Error { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Eww :D Can we not make the errors themselves self referential, so we can maintain some semblance of structure e.g.
error: foo
error_description: bar
cause:
error: foo
error_description: bar
cause:
error: foo
error_description: barThen you can communicate error wrapping across APIs, rather than having a string and this naturally lends itself to "was the root cause blah?" and you can potentially do more graceful error handling.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's not self-referential. It just calls fmt.Errorf() to wrap the root cause with an error prefix.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
package main
import (
"errors"
"fmt"
"io"
errorsv2 "github.com/unikorn-cloud/core/pkg/server/v2/errors"
)
func main() {
var err error
err = errorsv2.NewInternalError().
WithCause(io.EOF).
Prefixed()
fmt.Println(err) // internal: EOF
err = errorsv2.NewConflictError().
WithCausef("something went wrong with another layer: %w", err).
Prefixed()
fmt.Println(err) // conflict: something went wrong with another layer: internal: EOF
fmt.Println(errors.Is(err, io.EOF)) // true
}There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, but the point is if identity errors, then it's passed to region, then it's passed back to compute, then you are left with a pretty useless string with no real programmatic parental relationships. I'm more interested about crossing service boundaries.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't errors be localised within the service? If a specific error needs to be handled programmatically, the service can provide a unique error code that the caller can use to respond correspondingly.
b5f0a7d to
9ae7151
Compare
Codecov Report❌ Patch coverage is Additional details and impacted files@@ Coverage Diff @@
## main #202 +/- ##
==========================================
- Coverage 27.76% 25.42% -2.35%
==========================================
Files 63 68 +5
Lines 4337 3808 -529
==========================================
- Hits 1204 968 -236
+ Misses 3044 2749 -295
- Partials 89 91 +2 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think there are some good ideas in here, especially the idea of a backtrace, but there are some concerns, feel free to chime in too @squaremo.
API errors
These have two purposes:
- Tell the user something went wrong, and what remedial action can be taken
- This is why I like the backtraces, service A can say here's the fix, and service B can augment it with more context if necessary or just relay it verbatim back to the user
- Service B can actually use the information from the backtrace to potentially do some graceful handling as it's got some form of type.
- Tell the user they can't do anything and to contact support with a w3c trace ID.
Security
Error messages are deliberately vague. This is by design. The last thing we want is to say "you tried to access resource A but that failed, it's owned by acme.com, contact owner@acme.com". Likewise we shouldn't be relaying things like, "I had an error accessing service.cloud.com". Both of these examples are exactly what a hacker would use to probe the system for users, cloud services, resources in general and use them for granting access.
Debugging
Back to the point on w3c trace IDs. Where we need to see more detail to debug something, we have OpenTelemetry that joins everything up into a trace across services.
Next Steps
I got a little concerned when I saw errors v2. Presumably that means you are going to migrate everything over piecemeal, remove v1 and rename everything back to just plain errors again.
So what I would like to see is:
- A well thought out specification that takes into account everything I have discussed above.
- Why the current version doesn't fully implement the specification.
- What simple steps can be done to get from A to B that don't require everything to be torn up 😄
9090e0c to
5767558
Compare
Add the new errorv2 package, including the unified Error type and associated helper functions for working with structured error data.
Update all code paths to use the errorv2 package instead of the original errors package to ensure consistent structured error handling.
d19fff3 to
aebf26c
Compare
7dbae85 to
70be6a4
Compare
Perform general refactoring to improve consistency, readability, and maintainability across the codebase.
70be6a4 to
3eabecf
Compare
|
Right @alansy-nscale we're ready! So, what's in |
Description
This PR improves error reporting and introduces general code enhancements across the codebase. Error responses now include additional metadata in both the JSON body and HTTP headers, giving clients more context. The error codes have been completely redesigned, including new codes such as
token_expiredandquota_exhausted, to support automatic token refresh and clearer handling of resource limits. A helper method has also been added to extract the extended metadata into theErrorstruct, making it easier for clients to work with errors programmatically. In addition, general refactoring has been applied to improve code consistency across the codebase.