Skip to content

Conversation

@ginglis13
Copy link
Contributor

@ginglis13 ginglis13 commented Dec 9, 2025

Description of changes:

Wrap all AWS SDK errors in DisplayErrorContext to provide the full call chain. Some of the existing errors, such as the ssm module's GetParametersSnafu, will provide the source error as "ServiceError" with no additional details or unwrapping of the error chain. See https://docs.aws.amazon.com/sdk-for-rust/latest/dg/error-handling.html#displayErrorContext

Testing done:

make build

Runtime test:

Here's a test with the new change:

./target/release/pubsys --infra-config-path /tmp/test-infra.toml validate-ssm --expected-parameters-path /tmp/expected-params.json
19:22:29 [INFO] Parsing Infra.toml file
19:22:29 [INFO] Found infra config at path: /tmp/test-infra.toml
19:22:29 [INFO] Parsing expected parameters file
19:22:29 [INFO] Parsed expected parameters file
19:22:29 [INFO] Retrieving SSM parameters
19:22:29 [INFO] Retrieving SSM parameters in us-west-2
19:22:29 [ERROR] Failed to retrieve images in region us-west-2: Failed to fetch SSM parameters by path  in us-west-2: ValidationException: 1 validation error detected: Value '' at 'path' failed to satisfy constraint: Member must have length greater than or equal to 1
19:22:29 [INFO] Validating SSM parameters
+-----------+---------+-----------+---------+------------+-------------+
| String    | correct | incorrect | missing | unexpected | unreachable |
+-----------+---------+-----------+---------+------------+-------------+
| us-west-2 | 0       | 0         | 0       | 0          | 1           |
+-----------+---------+-----------+---------+------------+-------------+

That same test with 0.14.0 pubsys:

 ./target/release/pubsys --infra-config-path /tmp/test-infra.toml validate-ssm --expected-parameters-path /tmp/expected-params.json
19:23:38 [INFO] Parsing Infra.toml file
19:23:38 [INFO] Found infra config at path: /tmp/test-infra.toml
19:23:38 [INFO] Parsing expected parameters file
19:23:38 [INFO] Parsed expected parameters file
19:23:38 [INFO] Retrieving SSM parameters
19:23:38 [INFO] Retrieving SSM parameters in us-west-2
19:23:39 [ERROR] Failed to retrieve images in region us-west-2: Failed to fetch SSM parameters by path  in us-west-2: service error
19:23:39 [INFO] Validating SSM parameters
+-----------+---------+-----------+---------+------------+-------------+
| String    | correct | incorrect | missing | unexpected | unreachable |
+-----------+---------+-----------+---------+------------+-------------+
| us-west-2 | 0       | 0         | 0       | 0          | 1           |
+-----------+---------+-----------+---------+------------+-------------+

where I have as these test input files:

# /tmp/expected-params.json
{
  "us-west-2": {
    "/aws/service/bottlerocket/test-param": "expected-value"
  }
}

and

# /tmp/test-infra.toml
[aws]
regions = ["us-west-2"]

Terms of contribution:

By submitting this pull request, I agree that this contribution is dual-licensed under the terms of both the Apache License, version 2.0, and the MIT license.

@ginglis13 ginglis13 force-pushed the pubsys-aws-sdk-errors branch from d30840d to 7e29256 Compare December 9, 2025 21:36
@ginglis13
Copy link
Contributor Author

^ force push take a less verbose approach: provide either the code + message from the SDK error, fall back to the entire DisplayErrorContext() wrapped error.

@ginglis13 ginglis13 changed the title pubsys: wrap AWS SDK errors in DisplayErrorContext pubsys: include AWS SDK code and message in errors Dec 9, 2025
@ginglis13 ginglis13 requested a review from qianxjcraig December 9, 2025 21:40
Copy link
Contributor

@cbgbt cbgbt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice fix! I wonder if there's some way for us to categorically resolve this. How can we never mess it up?

One idea I have is to set up a clippy lint forbidding us from using SdkError directly, and to instead force us to use a new type which impls Display using this method. I've used this to make clippy scan for "forbidden functions" -- you can actually see it in Twoliter's clippy config.

Mind trying that out and seeing if you can get it to work?

use aws_sdk_ssm::error::{DisplayErrorContext, ProvideErrorMetadata, SdkError};
use std::error::Error as StdError;

/// Helper function to format AWS SDK errors as "code: message" or fall back to full context
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you think you could give an example in this docstring? I don't know what to expect.

@ginglis13 ginglis13 force-pushed the pubsys-aws-sdk-errors branch from 7e29256 to ae38fc0 Compare December 9, 2025 22:45
@ginglis13
Copy link
Contributor Author

^ force push fixes the unit test added

@ginglis13
Copy link
Contributor Author

ginglis13 commented Dec 10, 2025

@cbgbt I have some changes prepared that do the following to implement the clippy lint approach to enforce a new AwsSdkError wrapper of the SdkError:

  1. Create a tools/error-utils crate with AwsSdkError<E> wrapper that implements Display, Debug, Error, and From<SdkError> and includes the format_aws_sdk_error helper

  2. Add clippy lint configuration to forbid direct SdkError usage

  3. Migrate all AWS SDK error handling across pubsys and testsys:

    • Replace SdkError types with AwsSdkError wrapper
    • Convert snafu .context() calls to .map_err() -> this might be the part I'm least excited about. Instead of using Snafu's .context(), we have to directly map_err to the new wrapper. If we wanted to retain the .context() syntax, we'd have to directly reference the source AWS SDK error in the snafu, which would not appease the new lint. It's a tradeoff - we lose the ergonomic .context() method for AWS SDK errors, but we gain lint compliance and the wrapper error with the consistent error message formatting.
    • Use the wrapper's Display implementation directly in display strings

It's a fair amount of increase in the scope of this PR - working to get these changes segmented into logical commits, but this is the general approach

@ginglis13 ginglis13 force-pushed the pubsys-aws-sdk-errors branch from ae38fc0 to f7b5980 Compare December 16, 2025 22:17
Add new workspace crate that provides a standardized wrapper for
AWS SDK errors to capture both error code and message from AWS
service responses. The new type wraps the SSM crate's SdkError<E>
as this itself is a type alias for SdkError<E, HttpResponse> from the
raw smithy runtime. The type is still generic enough to work with any
AWS service error type.

Signed-off-by: Gavin Inglis <giinglis@amazon.com>
Replace direct AWS SDK error handling with the new AwsSdkError wrapper
throughout for consistent error formatting and better debugging
information for AWS service failures.

Signed-off-by: Gavin Inglis <giinglis@amazon.com>
Replace direct AWS SDK error handling with the new AwsSdkError wrapper
to provides consistent error formatting and better debugging
information for AWS service failures.

Signed-off-by: Gavin Inglis <giinglis@amazon.com>
Signed-off-by: Gavin Inglis <giinglis@amazon.com>
@ginglis13 ginglis13 force-pushed the pubsys-aws-sdk-errors branch from f7b5980 to 2b6a5da Compare December 16, 2025 22:17
@ginglis13
Copy link
Contributor Author

^ force push adds 4 commits, based on comment in #613 (comment)

  1. Add a new wrapper type for AWS SDK errors with enhanced error message formatting
  2. Use this type in pubsys
  3. Use this type in testsys
  4. Add a new clippy lint for enforcing that we use the error_utils type rather than AWS SDK error types

@ginglis13 ginglis13 changed the title pubsys: include AWS SDK code and message in errors tools: include AWS SDK code and message in errors Dec 18, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants