
UTF-8 Invalid sequences for component model strings #224

Closed
@rajsite

Forgive me if I'm missing it, but is there a discussion of how the UTF-8 exploits described in Unicode UTR 36 are addressed by component model strings?

From what I can tell from the CanonicalABI, the string lift operation is responsible for validation and for trapping on "Unicode Errors".

I'm wondering what guarantees I have as a component author that lowered strings are valid UTF-8 from the security perspective of that report. For example, overlong encodings are specifically described, both in the cited UTR 36 document and in the "Invalid sequences and error handling" section of the UTF-8 Wikipedia article, as a cause of security issues in web services (a relevant use case for WASM components) and as something that decoders (here, WASM component authors) may overlook.
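
To make the concern concrete, here is a minimal sketch in Python of what I mean by an overlong encoding being rejected. The helper name `is_valid_utf8` is made up for illustration; it relies only on Python's built-in strict UTF-8 decoder and is not taken from the component model spec or any runtime.

```python
# The bytes C0 AF are an overlong two-byte encoding of "/" (U+002F).
# A lenient decoder that accepts them can let "/" slip past path filters.
OVERLONG_SLASH = b"\xc0\xaf"


def is_valid_utf8(data: bytes) -> bool:
    """Return True only if `data` decodes as well-formed UTF-8.

    Hypothetical helper for illustration: it delegates to Python's
    strict decoder, which rejects overlong forms and encoded surrogates.
    """
    try:
        data.decode("utf-8", errors="strict")
    except UnicodeDecodeError:
        return False
    return True


if __name__ == "__main__":
    print(is_valid_utf8(b"/"))             # shortest-form encoding of "/"
    print(is_valid_utf8(OVERLONG_SLASH))   # overlong two-byte form
    print(is_valid_utf8(b"\xe0\x80\xaf"))  # overlong three-byte form
```

The question is whether a component receiving a lowered string can rely on the equivalent of this check having already been performed on its behalf.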

Some concrete questions:

  1. Are there guarantees that can be documented for component authors about strings lowered into a component, and about the errors raised for malformed sequences in strings being lifted?

  2. Are there compliance tests for tooling / host runtimes around expected UTF-8 validation? The security issues there are particularly relevant to server applications of WASM components.

  3. The canon_lower topic has a discussion point on efficient trampolines:

    Since any cross-component call necessarily transits through a statically-known canon_lower+canon_lift call pair, an AOT compiler can fuse canon_lift and canon_lower into a single, efficient trampoline.

    Is there a discussion of the validation expectations of such efficient trampoline optimizations? I'd assume you would still need to run the validation passes associated with a lift on a UTF-8 string to prevent issues like overlong encoding being overlooked.
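
As a rough illustration of the assumption in question 3, the sketch below models a fused trampoline for a UTF-8 to UTF-8 string copy that keeps the validation step. Everything here is hypothetical: `Trap`, `fused_utf8_string_copy`, and the use of `bytearray` for linear memory are stand-ins of my own, and this is not how any particular runtime or the spec implements fusion.

```python
class Trap(Exception):
    """Stand-in for a WebAssembly trap in this sketch."""


def fused_utf8_string_copy(src_memory: bytearray, ptr: int, byte_length: int,
                           dst_memory: bytearray, dst_ptr: int) -> int:
    """Copy a UTF-8 string between two linear memories, validating first.

    Hypothetical fused lift+lower: the bytes are never materialized as a
    host string, but well-formedness is still checked before the copy.
    Returns the number of bytes written.
    """
    # Bounds check on the caller's memory (the "lift" side).
    if ptr < 0 or byte_length < 0 or ptr + byte_length > len(src_memory):
        raise Trap("source string out of bounds")
    raw = bytes(src_memory[ptr:ptr + byte_length])

    # Validation that a naive memcpy-style trampoline would skip.
    try:
        raw.decode("utf-8", errors="strict")
    except UnicodeDecodeError as err:
        raise Trap("invalid UTF-8 in lifted string") from err

    # Bounds check on the callee's memory (the "lower" side), then copy.
    if dst_ptr < 0 or dst_ptr + byte_length > len(dst_memory):
        raise Trap("destination out of bounds")
    dst_memory[dst_ptr:dst_ptr + byte_length] = raw
    return byte_length
```

If a fused trampoline were allowed to drop the middle step, the callee would receive whatever bytes the caller supplied, which is exactly the case I would need to defend against myself.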

My goal in the end is to make sure I'm not doing that work twice. If there are clearly documented guarantees about what validation is done on strings, I can skip that work; if there are none, I can make sure it is done.
