ShellStream.Expect does not remove all matched data when multi-byte character encoding is used #383

Closed
@JonathanFortier-eaton

When we receive multi-byte characters from an SSH server (for example, 'é', which UTF-8 encodes as the two bytes [0xC3, 0xA9]), the ShellStream.Expect methods remove fewer bytes from the _incoming byte queue than were actually matched. As a result, the next read returns some of the data that Expect should already have removed.

The bug seems to be in how these methods call "_incoming.Dequeue()": they assume that the number of bytes to remove equals the number of characters matched, which is incorrect for multi-byte encodings.
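
To make the mismatch concrete, here is a minimal sketch of the faulty pattern. It is written in Python for illustration only, not taken from the SSH.NET source: `expect_buggy` and the `incoming` deque are hypothetical stand-ins for ShellStream.Expect and the _incoming queue, and the matching is simplified to a prefix check.

```python
from collections import deque


def expect_buggy(incoming: deque, expected: str, encoding: str = "utf-8") -> str:
    """Hypothetical stand-in for Expect: match at the start of the buffer,
    then dequeue one byte per matched character."""
    text = bytes(incoming).decode(encoding)
    if not text.startswith(expected):
        return ""
    for _ in range(len(expected)):  # bug: counts characters, not bytes
        incoming.popleft()
    return expected


# 'é' takes two bytes in UTF-8, so "é>" is 2 characters but 3 bytes.
incoming = deque("é> ls".encode("utf-8"))  # bytes: C3 A9 3E 20 6C 73
expect_buggy(incoming, "é>")
leftover = bytes(incoming)  # b'> ls': the '>' byte was matched but never removed
```

Only two bytes are dequeued for the two matched characters, so the '>' that was part of the match is still in the queue for the next read.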

A simple fix could be to calculate the number of bytes used by the matched string with _encoding.GetByteCount(...), as ShellStream.ReadLine does. However, this can still return a value different from the number of bytes that were actually read (see https://stackoverflow.com/q/9740553).
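
The idea of re-encoding the matched string to count its bytes, and one way that count can drift, can be sketched as follows (Python again, for illustration; `byte_count` is a hypothetical helper playing the role of GetByteCount). The invalid-byte case shown here is an assumption about how the counts can diverge and is not necessarily the scenario the linked question covers.

```python
def byte_count(matched: str, encoding: str = "utf-8") -> int:
    """Hypothetical analogue of GetByteCount: bytes needed to encode the string."""
    return len(matched.encode(encoding))


# For well-formed input the re-encoded size equals what was received.
good_count = byte_count("é>")  # 3 bytes: C3 A9 3E

# Assumed failure mode: an invalid byte is decoded to U+FFFD, which
# re-encodes to three bytes (EF BF BD), so the count no longer matches.
raw = b"\xff>"  # 0xFF is never valid in UTF-8
decoded = raw.decode("utf-8", errors="replace")  # '\ufffd>'
recount = byte_count(decoded)  # 4, although only 2 bytes were received
```

Dequeuing `recount` bytes here would remove more than the match consumed, which is the opposite failure from the original bug.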

A more complete fix could be to use "_encoding.GetDecoder().Convert(...)" (see https://msdn.microsoft.com/en-us/library/h6w985hz(v=vs.110).aspx) to calculate the number of bytes that correspond to the expected number of characters, working from the received bytes rather than from a re-encoded string.
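
A rough sketch of that decoder-driven approach, using Python's incremental decoder as an analogue of the .NET Decoder (the helper name `bytes_for_chars` is hypothetical, and this is not a proposed patch): feed the buffered bytes through a stateful decoder and stop once the expected number of characters has been produced.

```python
import codecs


def bytes_for_chars(data: bytes, char_count: int, encoding: str = "utf-8") -> int:
    """Return how many leading bytes of `data` had to be fed to the decoder
    before it had produced at least `char_count` characters."""
    if char_count == 0:
        return 0
    decoder = codecs.getincrementaldecoder(encoding)(errors="replace")
    decoded = 0
    for consumed, byte in enumerate(data, start=1):
        # The decoder keeps partial multi-byte sequences in its own state
        # and returns an empty string until a character is complete.
        decoded += len(decoder.decode(bytes([byte])))
        if decoded >= char_count:
            return consumed
    raise ValueError("buffer holds fewer characters than requested")


well_formed = bytes_for_chars("é> ls".encode("utf-8"), 2)  # 3 bytes for "é>"
with_invalid = bytes_for_chars(b"\xff> ls", 2)  # 2 bytes, unlike re-encoding
```

Because the count comes from the bytes that were actually received, it stays correct for the invalid-byte input where the re-encoding approach overcounts.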

What do you think?
