Description
I was looking through the code around an error message I was receiving ("Server response does not contian SSH protocol identification"), and noticed that you parse the protocol identification out of the server's initial banner, using a Regex which looks to be based on section 4.2 of RFC4253. However, I think that section has been slightly misinterpreted.
Specifically, the RFC says:
4.2. Protocol Version Exchange
When the connection has been established, both sides MUST send an
identification string. This identification string MUST beSSH-protoversion-softwareversion SP comments CR LF
Since the protocol being defined in this set of documents is version
2.0, the 'protoversion' MUST be "2.0". The 'comments' string is
OPTIONAL. If the 'comments' string is included, a 'space' character
(denoted above as SP, ASCII 32) MUST separate the 'softwareversion'
and 'comments' strings. The identification MUST be terminated by a
single Carriage Return (CR) and a single Line Feed (LF) character
(ASCII 13 and 10, respectively).
...and the Regex at
SSH.NET/src/Renci.SshNet/Session.cs
Line 86 in 5942469
^SSH-(?<protoversion>[^-]+)-(?<softwareversion>.+)( SP.+)?$
The Regex expression looks for the literal characters SP
(i.e. spaceSP) to separate the software version from the subsequent comments, while the RFC says that the SP
in its syntax is actually just the literal space character (in the same way that CR LF
is the literal carriage-return and newline, so the expression should probably read:
^SSH-(?<protoversion>[^-]+)-(?<softwareversion>.+)( .+)?$
I don't think it technically affects anyone, since I don't think you do anything with the third group of that Regex, anyway, but on the off-chance that someone does eventually intend to parse comments out of the banner info, you may want to clean it up (and might want to review if you are interpreting that SP
syntax literally, anywhere else in your codebase).
In fact, given you don't actually use the third capture group in that Regex (and just need it to demarcate the end of the software version), you should be able to forego capturing it entirely (i.e. using (?:...)
instead of (...)
).
You probably do want to extract software version non-greedily, though (.+?
instead of .+
). As it stands right now, a banner returning SSH-2.0-alpha some string info here
would parse with software version alpha some string info
, with a comment of here
, where it probably should parse as software version alpha
and comment of some string info here
.
As such, I might be inclined to go with:
^SSH-(?<protoversion>[^-]+)-(?<softwareversion>.+?)(?: .+)?$
As I say, I don't think it is the cause of my "Server response does not contian SSH protocol identification" issue -- that's more likely to be a disconnection from the server -- but I thought I'd flag it because I was in the neighbourhood and, you know, duty!