DLPX-66267 SSH service stops listening to external sources after reboot #146

Merged: 1 commit into delphix:master on Oct 8, 2019

@sebroy (Contributor) commented Oct 8, 2019

The root cause of this bug is that the ssh systemd service doesn't have
a dependency on network interface configuration.

By default, when using DHCP, sshd listens on the unspecified
address (0.0.0.0). When the system is configured with static IP
addresses, however, each address is written individually as a
"ListenAddress" directive in /etc/ssh/sshd_config, so sshd binds to and
listens on each address individually. If, at startup, the listed
addresses are not yet configured on an interface, sshd fails to bind to
them and does not listen for connections on those addresses. When that
happens, sshd logs errors to the ssh service journal:

```
 -- Reboot --
Oct 01 22:39:37 localhost sshd[604]: Server listening on 127.0.0.1 port 22.
Oct 01 22:39:37 localhost sshd[604]: error: Bind to port 22 on 10.43.42.64 faile
```
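The failure mode above can be reproduced without sshd. The sketch below, a hypothetical helper rather than anything from OpenSSH or this PR, reads ListenAddress values from sshd_config-style text and tries to bind a TCP socket to each one. On Linux, binding to an address that is not assigned to any local interface fails with `EADDRNOTAVAIL`, which is the condition sshd reports as a bind failure in the journal. It assumes IPv4 addresses only and binds to port 0 instead of 22, so it needs no root privileges and does not collide with a running sshd.

```python
import errno
import socket


def listen_addresses(config_text: str) -> list[str]:
    """Collect IPv4 ListenAddress values from sshd_config-style text.

    Hypothetical helper: ignores Match blocks and IPv6 forms, and drops an
    optional ":port" suffix.
    """
    addrs = []
    for raw in config_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments
        parts = line.split()
        # sshd_config keywords are case-insensitive.
        if len(parts) >= 2 and parts[0].lower() == "listenaddress":
            addrs.append(parts[1].split(":", 1)[0])
    return addrs


def can_bind(addr: str) -> bool:
    """Return True if a TCP socket can be bound to addr right now.

    Port 0 lets the kernel pick a free port, so only the address is tested;
    no root privileges are needed and a running sshd is not disturbed.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            sock.bind((addr, 0))
        except OSError as exc:
            if exc.errno == errno.EADDRNOTAVAIL:
                return False  # address not assigned to any local interface
            raise
    return True
```

For example, if sshd_config contains `ListenAddress 10.43.42.64` and that address is not yet configured, `can_bind("10.43.42.64")` returns False, mirroring the journal error; once the interface is up it returns True.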

The fix is to have the ssh service depend on the network.target systemd unit.
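The unit-file change itself isn't shown in this conversation. One conventional way to express such a dependency is a systemd drop-in for ssh.service: `Wants=network.target` pulls the target into the boot transaction, and `After=network.target` orders sshd's start behind it. The sketch below writes such a drop-in; the file name `network-dependency.conf` and the `root` parameter (so it can be tried against a scratch directory) are hypothetical, and a real change would be followed by `systemctl daemon-reload`.

```python
from pathlib import Path

# Hypothetical drop-in contents: Wants= pulls network.target into the boot
# transaction, After= makes ssh.service start only once it has been reached.
DROP_IN = """\
[Unit]
Wants=network.target
After=network.target
"""


def write_ssh_drop_in(root: str = "/") -> Path:
    """Write an ssh.service drop-in under <root>/etc/systemd/system.

    The file name is an assumption; systemd reads every *.conf file in the
    ssh.service.d directory.
    """
    drop_in_dir = Path(root, "etc/systemd/system/ssh.service.d")
    drop_in_dir.mkdir(parents=True, exist_ok=True)
    path = drop_in_dir / "network-dependency.conf"
    path.write_text(DROP_IN)
    return path
```

A drop-in leaves the packaged ssh.service untouched, so package upgrades don't overwrite the dependency.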

With this fix, I've confirmed that sshd successfully binds to all static addresses. The critical chain for ssh startup at boot now looks like this:

```
delphix@localhost:~$ sudo systemd-analyze critical-chain ssh.service
The time after the unit is active or started is printed after the "@" character.
The time the unit takes to start is printed after the "+" character.

ssh.service +51ms
└─network.target @5.309s
  └─systemd-resolved.service @4.184s +1.123s
    └─systemd-networkd.service @2.048s +2.134s
      └─network-pre.target @2.047s
        └─cloud-init-local.service @956ms +1.090s
          └─open-vm-tools.service @948ms
            └─vgauth.service @946ms
              └─systemd-tmpfiles-setup.service @918ms +23ms
                └─systemd-journal-flush.service @689ms +226ms
                  └─var-log.mount @589ms +97ms
                    └─local-fs-pre.target @580ms
                      └─systemd-tmpfiles-setup-dev.service @548ms +24ms
                        └─kmod-static-nodes.service @416ms +86ms
                          └─systemd-journald.socket @415ms
                            └─system.slice @399ms
                              └─-.slice @322ms
```
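To check the same thing on other systems without reading the tree by eye, the output above can be parsed. The sketch below is a hypothetical helper, not part of systemd: it extracts each unit with its "@" activation time and "+" start duration, then checks whether network.target sits on ssh.service's critical chain. It assumes times are printed with `ms` or `s` suffixes as above; longer forms such as `1min 2.3s` are not handled.

```python
import re
from typing import NamedTuple, Optional

# Optional tree-drawing prefix, unit name, optional "@time", optional "+time".
_LINE = re.compile(r"^[\s└├│─]*(\S+)(?:\s+@(\S+))?(?:\s+\+(\S+))?\s*$")


class ChainEntry(NamedTuple):
    unit: str
    active_at: Optional[float]  # seconds after boot ("@"), None if omitted
    duration: Optional[float]   # seconds the unit took to start ("+")


def _seconds(value: Optional[str]) -> Optional[float]:
    """Convert '956ms' or '5.309s' to seconds; other formats are rejected."""
    if value is None:
        return None
    if value.endswith("ms"):
        return float(value[:-2]) / 1000
    if value.endswith("s"):
        return float(value[:-1])
    raise ValueError(f"unsupported time format: {value!r}")


def parse_critical_chain(text: str) -> list[ChainEntry]:
    """Return the chain entries, top unit first, skipping header lines."""
    entries = []
    for line in text.splitlines():
        match = _LINE.match(line)
        # Header and prompt lines contain spaces and never match; requiring
        # a '.' keeps only unit names such as ssh.service or -.slice.
        if match and "." in match.group(1):
            unit, at, plus = match.groups()
            entries.append(ChainEntry(unit, _seconds(at), _seconds(plus)))
    return entries


def on_critical_chain(entries: list[ChainEntry], unit: str) -> bool:
    """True if unit appears below the top unit in the chain."""
    return any(e.unit == unit for e in entries[1:])
```

Feeding the output above to `parse_critical_chain` and calling `on_critical_chain(entries, "network.target")` should return True after the fix.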

@pzakha (Contributor) commented Oct 8, 2019

Do you know what mechanism updates the ListenAddress directives inside /etc/ssh/sshd_config? Does the networking service dynamically edit sshd_config? That sounds like a somewhat weird thing to do.

@prakashsurya (Contributor) left a comment

LGTM

@sebroy (Contributor, Author) commented Oct 8, 2019

@pzakha this is a long-standing feature of the virtualization platform. When configuring a static address, you can set a property on that address that determines whether sshd should listen on it. The virtualization software then selectively adds that address to sshd_config based on that property.
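The mechanism described here can be sketched as a function that renders ListenAddress directives from the configured static addresses. The data model (a per-address `ssh_enabled` flag) and the function name are hypothetical; the platform's actual code is not part of this PR.

```python
from dataclasses import dataclass


@dataclass
class StaticAddress:
    """Hypothetical model of a statically configured address."""
    address: str
    ssh_enabled: bool  # the per-address property that selects sshd listening


def render_listen_addresses(addresses: list[StaticAddress]) -> str:
    """Render one ListenAddress directive per ssh-enabled address.

    If no address is selected, no directive is emitted and sshd keeps its
    default of listening on all local addresses.
    """
    lines = [f"ListenAddress {a.address}" for a in addresses if a.ssh_enabled]
    return "\n".join(lines) + ("\n" if lines else "")
```

Because sshd binds each rendered address individually, every one of them must already be configured when sshd starts, which is why the ordering against network.target matters.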

@pzakha (Contributor) left a comment

Thanks for the explanation @sebroy. This LGTM.

@sebroy (Contributor, Author) commented Oct 8, 2019

bors r+

bors bot added a commit that referenced this pull request Oct 8, 2019
146: DLPX-66267 SSH service stops listening to external sources after reboot r=sebroy a=sebroy

Co-authored-by: Sebastien Roy <seb@delphix.com>
bors bot (Contributor) commented Oct 8, 2019

Build succeeded

  • continuous-integration/travis-ci/push

bors bot merged commit 27d6603 into delphix:master on Oct 8, 2019