Skip to content

Conversation

@ryanuber
Copy link
Member

Sets the BatchMode=yes SSH option to prevent SSH from doing interactive prompts during git clone.

This also refactors some of the SSH env var handling to match what we were already doing prior to the ?sshkey feature in #43, and fixes a bug where we didn't set up the SSH key during submodule fetches.

@cbednarski
Copy link

cbednarski commented Jan 25, 2017

@ryanuber The man page doesn't specify the behavior around known hosts, but I imagine it's something like:

  • With batch mode an unknown host will be automatically accepted
  • With batch mode a known host will automatically reject if a signature does not match

This behavior is probably fine, but this is just a guess as to what the behavior is. My points of concern are:

  1. Will this automatically accept a host signature for new hosts or automatically reject them if they are not known?
  2. If they are automatically accepted, what is the mechanism for this in Nomad? Does this get written into a chroot? Is it only present for a single job run? Does it go away when the job is rescheduled elsewhere?
  3. If there is a conflict (i.e. host is changed) what is the error mode?
    • do we hard-fail?
    • do we have to wipe the entire host / VM?
    • do we have to restart the job so it is placed in a new chroot?

@ryanuber
Copy link
Member Author

@cbednarski everything would depend on the SSH client configuration. You could have it either fail or ignore on mismatch. You could have it /dev/null the known hosts file. I think what we would want to do is automatically accept unknown hosts, fail hard on mismatches, because we run every task in an isolated root.

One thing I didn't think about was the Terraform CLI, where an unknown host today would result in a prompt, and users might be counting on that in some places, though I'm not sure. I think we can just convert this into a bugfix PR, and remove BatchMode, and just deferring all options to the SSH client config. That way there is no risk of regression in that sense.

@cbednarski
Copy link

cbednarski commented Jan 26, 2017

I think we can just convert this into a bugfix PR, and remove BatchMode, and just deferring all options to the SSH client config.

Ah that's an interesting idea. I guess the question I have is whether we want the user or operator to make this decision. On your laptop you are both so this is moot. However on a remote build host this becomes a bit more difficult to manage. Either:

  1. The operator has to add authorized hosts to known_hosts in advance (auto-accept or deny, etc. controlled by ssh_config).
  2. The user has to specify the behavior about whether to trust an unknown host (controlled by..?).

I think 1 is obviously the secure option but it's difficult to manage this because there may be multiple hosts, or the host signature may change, and that requires manual intervention on the build machine. In my case I am a remote operator and so there is actually a third party who is running the machine but is not really responsible for configuring it (and thus does not have a direct means to edit the known_hosts file). There are obviously ways around this and we should probably not try to solve for them here, but this is on my mind.

In the case of using HTTPS the host validation happens by way of a certificate authority and a root cert that is pre-installed on the machine. Via SSH we run into the age-old problem of host validation (did you really validate that host or did you just accept the fingerprint) so the BatchMode method is probably not a secure default anyway.

So in this circumstance, it sounds like your proposal is to use option 1 to provide a secure default and punt on this code change. This makes sense to me, and it seems like the secure default with whitelisting or even auto-approval is already supported. The question I have is whether it makes sense to provide users with an escape hatch via e.g. ?nossslverify to override the host settings here. I think there are reasonable arguments against this, but I'm also a pragmatist. I think setting it in the URL makes it explicit vs. BatchMode (as you indicated) potentially hiding this behavior from the user.

I would also add that if the user does opt to disable SSL verification we probably should not poison the host by propagating that to other runs. For example if I want to automatically accept a cert for one run this should not be sticky to other runs that want to opt-out of auto-accept. I think this BatchMode change has other stateful side effects in this respect.

@ryanuber
Copy link
Member Author

I think we could explore an insecure flag, but the only weird thing is that you run into an options merge problem: Which settings win, the host's config or the client's preference? It gets harder to handle that nicely or predictably. I'd say let's just punt on this for now, fix the issues at hand by merging this PR, and go from there.

@ryanuber
Copy link
Member Author

Oops, missed your comment update. As for side effects, this is completely up to the service owner who uses go-getter. In our case, we throw away temp environments on each run, so they don't affect anything else. We do this in Nomad batch jobs (isolated by cgroups/namespaces) and during TF/Packer runs (isolated by a VM), so I don't think we would bump into any weird collisions there, and it is completely manageable from the service owner's perspective.

@cbednarski
Copy link

As for side effects, this is completely up to the service owner who uses go-getter.

Yeah, just curious what the path of least surprise is here if I pull down a terraform module, for example.

This change LGTM, though. We can dial in the behavior we want with extra config but I feel better about the default behavior now.

Copy link

@cbednarski cbednarski left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

@ryanuber ryanuber merged commit cc80f38 into master Jan 26, 2017
@ryanuber ryanuber deleted the f-ssh-batch branch January 26, 2017 00:13
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants