Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix a few issues with ShellStream #1322

Merged
merged 5 commits into from
Feb 16, 2024

Conversation

Rob-Hague
Copy link
Collaborator

@Rob-Hague Rob-Hague commented Feb 13, 2024

I have not yet rebased this on the expectSize changes merged today, pending further discussion. So there are merge conflicts from the start. Just wanted to get it out there.


The main change is to replace the Queue with a byte[] and a couple of variables which index the start and end of the data. The remainder is mainly slightly more careful locking semantics.

One possibly contentious point: in fixing the Write behaviour (#301) I chose to remove the outgoing buffer and immediately send the data across the channel. Write(string) and WriteLine(string) were already doing this, and I felt it was better to change Write(byte[]) to match rather than changing the string methods. This is the reason for the number of deleted test classes for Write.

It also makes the definition of infinite timeout to consistently be -1 milliseconds instead of 0, as is standard in other .NET methods with a wait period. This was not documented previously.

The rest of the changes are hopefully fairly agreeable. Happy to clarify anything.

Here are the (allocation) numbers on 3bfac50, which this is based on:

Method Mean Error StdDev Gen0 Gen1 Allocated
ShellStreamReadLine 865.1 ms 7.89 ms 6.99 ms - - 507.98 KB
ShellStreamExpect 864.8 ms 5.95 ms 5.57 ms 6000.0000 6000.0000 12565.52 KB

And here is after:

Method Mean Error StdDev Allocated
ShellStreamReadLine 866.6 ms 9.37 ms 8.77 ms 389.62 KB
ShellStreamExpect 851.1 ms 8.27 ms 7.74 ms 408.65 KB

(for reference, current develop d07827b)

Method Mean Error StdDev Gen0 Gen1 Allocated
ShellStreamReadLine 866.3 ms 10.52 ms 9.32 ms - - 510.59 KB
ShellStreamExpect 864.7 ms 6.10 ms 5.70 ms 1000.0000 1000.0000 3321.11 KB

closes #63
closes #453
closes #672
closes #917
closes #180
closes #301
closes #365 (the previous implementation would read for the full timeout on each update, rather than just how long we have left)
closes #714
closes #748
closes #302
closes #1320
closes #923
closes #1024

The main change is to replace the Queue<byte> with a byte[] and a couple of
variables which index the start and end of the data. The remainder is mainly
slightly more careful locking semantics.

It also implements Expect(string) separately so that it can work on the bytes
and skip a lot of encoding work (this is where I wish ShellStream derived from
StreamReader).

One possibly contentious point: in fixing the Write behaviour I chose to
remove the "outgoing" buffer and immediately send the data across the channel.
Write(string) and WriteLine(string) were already doing this, and I felt it
was better to change Write(byte[]) to match rather than changing the string
methods.

var match = regex.Match(result);
var bufferText = _encoding.GetString(_buffer, _head, _tail - _head);
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To integrate the expectSize, it would probably just be a case of restricting the search space here, e.g.

var expectHead = Math.Max(_head, _tail - _expectSize);
var bufferText = _encoding.GetString(_buffer, expectHead, _tail - expectHead);

The question is whether to put it as a parameter on the method or the constructor (I would probably prefer method)

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I also prefer a method parameter with the default value it will work for everyone without any breaking change

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I went with the parameter and named it windowSize, which felt to me a little more intuitive in meaning.

@jscarle, regretfully these changes (as they stand) reduce most of your recent contributions to be only in spirit rather than in code. I hope you consider the end result to be a net positive nonetheless. These recent changes wouldn't have happened without your motivating persistence.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

also on this line: unfortunately I think we still need to materialise the entire buffer as a string for Expect(Regex) in order to correctly deal with multi-byte encoding. (Expect(string) does not suffer from this because it works directly on the bytes). We can at least still use a reduced window for the regex match itself.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the end, what's important is that collaboratively we'll have managed to drastically improve Expect and fix bugs with ShellStream. These are issues that date back years now and made the previous implementation difficult to use, and in some cases impossible to use, in long running scenarios.

As for windowSize, I'd recommend against using the word window in this context. As we've been putting our hands in the plumbing, it may be obvious to us, however to someone consuming the API, it may not. I feel that it is too confusing and may lead people to misinterpret that as having some sort of relationship with the terminal window size.

In terms of performance, I'm glad that although my contribution has literally been obliterated by this PR, at least I was on par with your version in terms of execution time. My allocation improved, but your version is obviously much more efficient.

It would be a good idea to performance test Expect with a very large incoming buffer (in the several MB range) to ensure that there's no regression of the original performance issues that brought me to run a parallel queue to begin with.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, windowSize is probably not a great name here. And I'd already been a bit confused by expectSize. I'll change it later. So far my best attempt is rollingWindowSize

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could use one of these:

  • match
  • search
  • inspect / inspection
  • observation
  • peek
  • frame
  • analysis
  • lookahead
  • lookbehind

And/or a combination of above with Max, Size, Length, Range, or Bytes. ie: matchLength, peekSize.

Personally, I like matchLength, searchSize, inspect, and inspectSize. Though my most preferred is simply inspect. It's short, consice, and unambiguous.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the suggestions. I went with lookback

@WojciechNagorski
Copy link
Collaborator

It looks great! I'm patiently waiting for the merge with the develop branch

@WojciechNagorski
Copy link
Collaborator

Is it ready for review?

@Rob-Hague
Copy link
Collaborator Author

Yes, thanks

@WojciechNagorski
Copy link
Collaborator

Changing the behavior of the Write method is controversial, but we can try.

@Rob-Hague
Copy link
Collaborator Author

Rob-Hague commented Feb 16, 2024

Yeah, I think either WriteLine and Write(string) needed to be buffered (which would require users to add a call to Flush), or Write(byte[]) needed to not buffer. I think changing Write(byte[]) to not buffer is the safest thing there.

edit: it could be flagged in the release notes as a behavioural breaking change

@jscarle
Copy link
Contributor

jscarle commented Feb 16, 2024

I propose to do it the other way around. WriteLine could call Flush internally. And Write could buffer.

@jscarle
Copy link
Contributor

jscarle commented Feb 16, 2024

Basically, my reasoning is that if I'm consuming ShellStream in my application, when I call WriteLine, I'm done. That's what I wanted to send. While using Write, maybe I have several different Writes to do before I'm done.

@Rob-Hague
Copy link
Collaborator Author

Yep that makes sense, but note that Write(string) will also have to flush internally to meet the same behaviour as before. It's only Write(byte[]) that wouldn't flush.

@jscarle
Copy link
Contributor

jscarle commented Feb 16, 2024

Yep that makes sense, but note that Write(string) will also have to flush internally to meet the same behaviour as before. It's only Write(byte[]) that wouldn't flush.

I think that continues to make sense. If you're sending a full string, you probably are done as well. I suspect that people who are calling Write repeatedly will do so with byte[] and not strings.

@WojciechNagorski WojciechNagorski merged commit 06af2ec into sshnet:develop Feb 16, 2024
1 check passed
@Rob-Hague
Copy link
Collaborator Author

Thanks. I will restore the write buffer shortly

@WojciechNagorski
Copy link
Collaborator

Okej so then we will release the new version

@jscarle
Copy link
Contributor

jscarle commented Feb 16, 2024

@WojciechNagorski I'd like to submit my PR before you release if you don't mind.

@WojciechNagorski
Copy link
Collaborator

@jscarle Ok, no problem, just let me know which one.

@jscarle
Copy link
Contributor

jscarle commented Feb 16, 2024

@jscarle Ok, no problem, just let me know which one.

#1321

@Rob-Hague Rob-Hague deleted the shellstream branch February 17, 2024 09:00
@WojciechNagorski WojciechNagorski added this to the 2024.0.0 milestone Feb 22, 2024
Rob-Hague added a commit to Rob-Hague/SSH.NET that referenced this pull request May 15, 2024
Apply similar treatment to PipeStream as sshnet#1322 did to ShellStream

PipeStream will now behave much more Stream-like. In particular, it performs
partial reads (instead of blocking until a certain amount of data is available),
blocks until data is available (instead of returning 0 prematurely) and removes
the Stream-unlike properties `BlockLastReadBuffer` and `MaxBufferLength`.

It should also be a fair bit faster, but sadly I gave up trying to make a benchmark
compatible with all the quirks of the previous implementation.
Rob-Hague added a commit to Rob-Hague/SSH.NET that referenced this pull request May 15, 2024
Apply similar treatment to PipeStream as sshnet#1322 did to ShellStream

PipeStream now behaves much more Stream-like. In particular, it performs partial
reads (instead of blocking until a certain amount of data is available), blocks
until data is available (instead of returning 0 prematurely) and removes the
Stream-unlike properties `BlockLastReadBuffer` and `MaxBufferLength`.

It should also be a fair bit faster, but sadly I gave up trying to make a benchmark
compatible with all the quirks of the previous implementation.

Some cleanup of its usage in the library followed.
Rob-Hague added a commit to Rob-Hague/SSH.NET that referenced this pull request May 16, 2024
Apply similar treatment to PipeStream as sshnet#1322 did to ShellStream

PipeStream now behaves much more Stream-like. In particular, it performs partial
reads (instead of blocking until a certain amount of data is available), blocks
until data is available (instead of returning 0 prematurely) and removes the
Stream-unlike properties `BlockLastReadBuffer` and `MaxBufferLength`.

It should also be a fair bit faster, but sadly I gave up trying to make a benchmark
compatible with all the quirks of the previous implementation.

Some cleanup of its usage in the library followed.
Rob-Hague added a commit to Rob-Hague/SSH.NET that referenced this pull request May 16, 2024
Apply similar treatment to PipeStream as sshnet#1322 did to ShellStream

PipeStream now behaves much more Stream-like. In particular, it performs partial
reads (instead of blocking until a certain amount of data is available), blocks
until data is available (instead of returning 0 prematurely) and removes the
Stream-unlike properties `BlockLastReadBuffer` and `MaxBufferLength`.

It should also be a fair bit faster, but sadly I gave up trying to make a benchmark
compatible with all the quirks of the previous implementation.

Some cleanup of its usage in the library followed.
Rob-Hague added a commit to Rob-Hague/SSH.NET that referenced this pull request May 16, 2024
Apply similar treatment to PipeStream as sshnet#1322 did to ShellStream

PipeStream now behaves much more Stream-like. In particular, it performs partial
reads (instead of blocking until a certain amount of data is available), blocks
until data is available (instead of returning 0 prematurely) and removes the
Stream-unlike properties `BlockLastReadBuffer` and `MaxBufferLength`.

Sadly I gave up trying to make a benchmark compatible with all the quirks of the
previous implementation, but a dumb throughput test (reading and writing simultaneously)
shows about 5.2GB/s with this implementation compared to 140MB/s previously.

Some cleanup of its usage in the library followed.
WojciechNagorski added a commit that referenced this pull request May 23, 2024
Apply similar treatment to PipeStream as #1322 did to ShellStream

PipeStream now behaves much more Stream-like. In particular, it performs partial
reads (instead of blocking until a certain amount of data is available), blocks
until data is available (instead of returning 0 prematurely) and removes the
Stream-unlike properties `BlockLastReadBuffer` and `MaxBufferLength`.

Sadly I gave up trying to make a benchmark compatible with all the quirks of the
previous implementation, but a dumb throughput test (reading and writing simultaneously)
shows about 5.2GB/s with this implementation compared to 140MB/s previously.

Some cleanup of its usage in the library followed.

Co-authored-by: Wojciech Nagórski <wojtpl2@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment