[Access] Improve api error handling #3988

peterargue · 2023-03-03T01:49:30Z

Closes: #3987

github-actions · 2023-03-03T01:53:43Z

FVM Benchstat comparison

This branch was compared with the base branch onflow:master commit 03af504

The command (for i in {1..7}; do go test ./fvm ./engine/execution/computation --bench . --tags relic -shuffle=on --benchmem --run ^$; done) was used.

Collapsed results for better readability

codecov-commenter · 2023-03-03T01:56:48Z

Codecov Report

Merging #3988 (891ee55) into master (031ed1a) will decrease coverage by 6.20%.
The diff coverage is 33.09%.

@@            Coverage Diff             @@
##           master    #3988      +/-   ##
==========================================
- Coverage   59.47%   53.28%   -6.20%     
==========================================
  Files         228      825     +597     
  Lines       21499    77632   +56133     
==========================================
+ Hits        12787    41366   +28579     
- Misses       7752    32941   +25189     
- Partials      960     3325    +2365

| Flag | Coverage Δ | |
|---|---|---|
| unittests | `53.28% <33.09%> (-6.20%)` | ⬇️ |

Flags with carried forward coverage won't be shown.

| Impacted Files | Coverage Δ | |
|---|---|---|
| engine/access/rpc/backend/backend.go | `75.47% <0.00%> (ø)` | |
| engine/access/rpc/backend/backend_block_headers.go | `24.61% <0.00%> (-6.76%)` | ⬇️ |
| engine/access/rpc/backend/backend_network.go | `42.02% <0.00%> (-2.59%)` | ⬇️ |
| engine/access/rpc/backend/errors.go | `0.00% <0.00%> (ø)` | |
| engine/collection/rpc/engine.go | `11.68% <0.00%> (-0.48%)` | ⬇️ |
| engine/execution/rpc/engine.go | `51.08% <0.00%> (-0.42%)` | ⬇️ |
| engine/access/rpc/backend/backend_block_details.go | `21.42% <4.16%> (-5.36%)` | ⬇️ |
| engine/access/rpc/backend/backend_accounts.go | `54.83% <20.00%> (+3.83%)` | ⬆️ |
| engine/access/rpc/backend/backend_transactions.go | `47.46% <24.24%> (-2.45%)` | ⬇️ |
| engine/access/rpc/backend/backend_events.go | `68.45% <70.58%> (+1.78%)` | ⬆️ |
| ... and 2 more | | |

... and 617 files with indirect coverage changes

engine/access/rpc/backend/backend_accounts.go

jordanschalm · 2023-03-08T17:39:13Z

engine/access/rpc/backend/backend_block_details.go

-		err = rpc.ConvertStorageError(err)
-		return nil, flow.BlockStatusUnknown, err
+		// node should always have the latest block
+		return nil, flow.BlockStatusUnknown, status.Errorf(codes.Internal, "could not get latest block: %v", err)


This indicates a pretty serious inconsistency in our local database - should we irrecoverable.Throw instead?

peterargue · 2023-03-09

I'm reluctant to cause the node to crash as a result of an API request. This is generally something to avoid due to the DoS potential. In this case, it's unclear if any API response is valid if the blocks index is inconsistent, so I think we should at least halt the APIs.

The rpc engine currently isn't a component, so there's some additional work to do before we could throw. I'll open an issue to convert it and add some halting logic.

In practice, there are many other components on ANs that also check and throw given this condition, so the node won't silently fail.

jordanschalm · 2023-03-10

> In practice, there are many other components on ANs that also check and throw given this condition, so the node won't silently fail.

This is a fair point. We could add a comment along these lines to document the reasoning:

// In the RPC engine, if we encounter an error from the protocol state indicating state corruption,
// we should halt processing requests, but do not throw an exception which might cause a crash:
// - It is unsafe to process requests if we have an internally bad state. TODO issue
// - We would like to avoid throwing an exception as a result of an Access API request by policy
//   because this can cause DOS potential
// - Since the protocol state is widely shared, we assume that in practice another component will
//   observe the protocol state error and throw an exception.

engine/access/rpc/backend/backend_block_details.go

engine/access/rpc/backend/backend_transactions.go

jordanschalm · 2023-03-08T18:54:37Z

engine/common/rpc/errors.go

+// ConvertError converts a generic error into a grpc status error
+func ConvertError(err error, msg string, defaultCode codes.Code) error {


I think this kind of centralized error handling has the tendency to be misused when it addresses anything other than errors which are universally benign.

For example, a context.Canceled error can pretty much always be considered a benign network failure. However, ErrNotFound could be either benign or critical, depending on the context.

If someone requests a block height which we have not seen yet, then ErrNotFound is benign and completely expected. We should return a status code and do nothing else.

If someone requests the latest finalized block, then ErrNotFound indicates a very serious problem with our internal state.

By moving the interpretation of errors from individual handlers into a central handling function, it becomes more likely that context-dependent errors like ErrNotFound will be interpreted as benign in cases where they should be raising alarm bells.

I noticed that in several places you used status.Errorf rather than this function, when an ErrNotFound indicated something seriously wrong. That's good, but I think the structure of the centralized error handling function makes it easy for future changes to skip this step and simply call ConvertError to handle any error without additional inspection.

I think the most correct approach would be to remove any error types which can't universally be considered benign in the context of the RPC Backend from the switch statement (i.e. ErrNotFound) - see suggestion below. Minimally, we should clearly document the responsibility of the caller to handle context-dependent error types.

Suggestion

- Restrict the use of ConvertError to error types which can universally be considered benign.
- Add a note to the godoc about usage requirements

Suggested change

-// ConvertError converts a generic error into a grpc status error
-func ConvertError(err error, msg string, defaultCode codes.Code) error {
+// ConvertError converts a generic networking error into a grpc status error. The input
+// must either be a status.Error already, or be a supported error type which can be
+// universally interpreted as benign (see switch statement). The caller must first check
+// for any other unsupported errors and handle them accordingly.
+// Error returns:
+//  - status.Error for any status.Error or supported networking error inputs
+//  - generic error in case an unsupported error type is passed in
+func ConvertError(err error, msg string, defaultCode codes.Code) error {

peterargue · 2023-03-09

This is a really good point about storage errors, and I'll make that change. Not all storage errors should return NotFound, and codifying that will create bugs in the future.

The intention with this method is to simplify converting common error conditions into their corresponding grpc status codes immediately before returning them to the client. Ultimately, this helps with monitoring and alerting for nodes, since Internal and Unknown errors should only occur in genuine failure cases. It's not intended to make any assertions about the error condition. If it's not a direct mapping, the defaultCode should be returned (Internal in all cases in this PR).

engine/access/rpc/backend/backend_block_headers.go

Co-authored-by: Jordan Schalm <jordan@dapperlabs.com>

peterargue · 2023-03-10T21:58:59Z

bors merge

bors · 2023-03-10T22:21:49Z

Build succeeded:

3988: [Access] Improve api error handling r=peterargue a=peterargue Closes: #3987 Co-authored-by: Peter Argue <89119817+peterargue@users.noreply.github.com>

[Access] Improve api error handling

6febd13

peterargue requested review from vishalchangrani, ramtinms and jordanschalm as code owners March 3, 2023 01:49

peterargue added 3 commits March 2, 2023 17:56

fix error encoding

c81fe4d

Fix lint error

45c9305

add back final missing result check

65e9616

peterargue requested a review from koko1123 March 7, 2023 20:03

jordanschalm reviewed Mar 8, 2023

peterargue and others added 3 commits March 8, 2023 11:59

Apply suggestions from code review

08aa0e0

Co-authored-by: Jordan Schalm <jordan@dapperlabs.com>

Address review feedback

4ca2008

fix lint

9c436a3

peterargue requested a review from jordanschalm March 10, 2023 00:02

jordanschalm approved these changes Mar 10, 2023

fix unittest to mock ByHeight

80c220b

koko1123 approved these changes Mar 10, 2023

fix more unittests and add requested comments

891ee55

bors bot merged commit ba99bea into master Mar 10, 2023

bors bot deleted the petera/access-api-error-cleanup branch March 10, 2023 22:21

peterargue mentioned this pull request Apr 13, 2023

[Access] Internal error returns for invalid input to GetAccount #3500

Closed

peterargue added a commit that referenced this pull request May 19, 2023

Merge #3988

3ac6033

3988: [Access] Improve api error handling r=peterargue a=peterargue Closes: #3987 Co-authored-by: Peter Argue <89119817+peterargue@users.noreply.github.com>

peterargue mentioned this pull request May 19, 2023

Backport event streaming to v0.29 #4370

Closed


