Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add the SshFabricTransport plugin #6154

Merged
merged 1 commit into from
Jul 23, 2024
Merged

Conversation

sphuber
Copy link
Contributor

@sphuber sphuber commented Oct 17, 2023

Fixes #6082

This transport plugin subclasses the SshTransport plugin but replaces the managing of the connection with the fabric library. Under the hood this library still uses paramiko, however, it is a lot clevered in automatically determining the correct connection configuration. For example, it will automatically search for SSH configurations on the machine. This makes it easier for the user to setup the transport to connect to a particular host if they can connect to it through a direct ssh invocation. Since fabric still uses paramiko under the hood, just like the SshTransport, most of the implementation can be reused.

@mbercx
Copy link
Member

mbercx commented Nov 6, 2023

Quick note to self when testing this: also try an SSH configuration that sets the following:

    UserKnownHostsFile=/dev/null
    StrictHostKeyChecking=no

This was the case for a certain user, and I'm curious to see if fabric can properly resolve and use these settings.

@mbercx
Copy link
Member

mbercx commented Nov 14, 2023

@sphuber finally got around to testing this. Setup/configuration went buttery smooth, really nice that users can just specify the Host shortname from the .ssh/config and then only have to specify if they want to use a login shell and the safe interval. 👌

However, the test for getting the number of jobs fails for some reason:

* Getting number of jobs from scheduler... [Failed]: TimeoutError:
  Full traceback:
  Traceback (most recent call last):
    File "/Users/mbercx/.aiida_venvs/sirius/lib/python3.10/site-packages/paramiko/channel.py", line 699, in recv
      out = self.in_buffer.read(nbytes, self.timeout)
    File "/Users/mbercx/.aiida_venvs/sirius/lib/python3.10/site-packages/paramiko/buffered_pipe.py", line 164, in read
      raise PipeTimeout()
  paramiko.buffered_pipe.PipeTimeout

  During handling of the above exception, another exception occurred:

  Traceback (most recent call last):
    File "/Users/mbercx/project/sirius/git/aiida-core/aiida/cmdline/commands/cmd_computer.py", line 559, in computer_test
      success, message = test(
    File "/Users/mbercx/project/sirius/git/aiida-core/aiida/cmdline/commands/cmd_computer.py", line 58, in _computer_test_get_jobs
      found_jobs = scheduler.get_jobs(as_dict=True)
    File "/Users/mbercx/project/sirius/git/aiida-core/aiida/schedulers/scheduler.py", line 361, in get_jobs
      retval, stdout, stderr = self.transport.exec_command_wait(self._get_joblist_command(jobs=jobs, user=user))
    File "/Users/mbercx/project/sirius/git/aiida-core/aiida/transports/transport.py", line 438, in exec_command_wait
      retval, stdout_bytes, stderr_bytes = self.exec_command_wait_bytes(command=command, stdin=stdin, **kwargs)
    File "/Users/mbercx/project/sirius/git/aiida-core/aiida/transports/plugins/ssh.py", line 1472, in exec_command_wait_bytes
      stdout_bytes.append(stdout.read())
    File "/Users/mbercx/.aiida_venvs/sirius/lib/python3.10/site-packages/paramiko/file.py", line 200, in read
      new_data = self._read(self._DEFAULT_BUFSIZE)
    File "/Users/mbercx/.aiida_venvs/sirius/lib/python3.10/site-packages/paramiko/channel.py", line 1361, in _read
      return self.channel.recv(size)
    File "/Users/mbercx/.aiida_venvs/sirius/lib/python3.10/site-packages/paramiko/channel.py", line 701, in recv
      raise socket.timeout()
  TimeoutError

I tried setting up the same computer with core.ssh and all the tests pass fine? Any ideas?

EDIT: Seems to work just fine on Piz Daint. Strange...

@mbercx
Copy link
Member

mbercx commented Nov 14, 2023

When trying to run on Piz Daint, I encountered the following error:

 | Traceback (most recent call last):
 |   File "/Users/mbercx/project/sirius/git/aiida-core/aiida/engine/utils.py", line 187, in exponential_backoff_retry
 |     result = await coro()
 |   File "/Users/mbercx/project/sirius/git/aiida-core/aiida/engine/processes/calcjobs/tasks.py", line 146, in do_submit
 |     return execmanager.submit_calculation(node, transport)
 |   File "/Users/mbercx/project/sirius/git/aiida-core/aiida/engine/daemon/execmanager.py", line 382, in submit_calculation
 |     result = scheduler.submit_from_script(workdir, submit_script_filename)
 |   File "/Users/mbercx/project/sirius/git/aiida-core/aiida/schedulers/scheduler.py", line 412, in submit_from_script
 |     return self._parse_submit_output(*result)
 |   File "/Users/mbercx/project/sirius/git/aiida-core/aiida/schedulers/plugins/slurm.py", line 433, in _parse_submit_output
 |     transport_string = f' for {self.transport}'
 |   File "/Users/mbercx/project/sirius/git/aiida-core/aiida/transports/plugins/ssh.py", line 594, in __str__
 |     conn_info = self._machine
 | AttributeError: 'SshFabricTransport' object has no attribute '_machine'

EDIT: After fixing this

 | Traceback (most recent call last):
 |   File "/Users/mbercx/project/sirius/git/aiida-core/aiida/engine/utils.py", line 187, in exponential_backoff_retry
 |     result = await coro()
 |   File "/Users/mbercx/project/sirius/git/aiida-core/aiida/engine/processes/calcjobs/tasks.py", line 146, in do_submit
 |     return execmanager.submit_calculation(node, transport)
 |   File "/Users/mbercx/project/sirius/git/aiida-core/aiida/engine/daemon/execmanager.py", line 382, in submit_calculation
 |     result = scheduler.submit_from_script(workdir, submit_script_filename)
 |   File "/Users/mbercx/project/sirius/git/aiida-core/aiida/schedulers/scheduler.py", line 412, in submit_from_script
 |     return self._parse_submit_output(*result)
 |   File "/Users/mbercx/project/sirius/git/aiida-core/aiida/schedulers/plugins/slurm.py", line 433, in _parse_submit_output
 |     transport_string = f' for {self.transport}'
 |   File "/Users/mbercx/project/sirius/git/aiida-core/aiida/transports/plugins/ssh.py", line 596, in __str__
 |     conn_info = f"{self._connect_args['username']}@{conn_info}"
 | AttributeError: 'SshFabricTransport' object has no attribute '_connect_args'

@sphuber sphuber force-pushed the feature/ssh-fabric branch 3 times, most recently from 15e31ca to aa31605 Compare November 17, 2023 08:29
@sphuber sphuber force-pushed the feature/ssh-fabric branch 3 times, most recently from 0b02b48 to 64106c0 Compare December 21, 2023 18:18
@mbercx mbercx added this to the Next coding day issues milestone Jan 10, 2024
@sphuber sphuber force-pushed the feature/ssh-fabric branch from 64106c0 to 55da99b Compare March 25, 2024 08:42
@khsrali
Copy link
Contributor

khsrali commented Jul 11, 2024

When trying to run on Piz Daint, I encountered the following error:

 |     transport_string = f' for {self.transport}'
 |   File "/Users/mbercx/project/sirius/git/aiida-core/aiida/transports/plugins/ssh.py", line 596, in __str__
 |     conn_info = f"{self._connect_args['username']}@{conn_info}"
 | AttributeError: 'SshFabricTransport' object has no attribute '_connect_args'

Update note:
The issues that @mbercx has listed above probably have been resolved in @sphuber updates. __str__ is being overridden.

@khsrali
Copy link
Contributor

khsrali commented Jul 11, 2024

>     File "/Users/mbercx/project/sirius/git/aiida-core/aiida/transports/plugins/ssh.py", line 1472, in exec_command_wait_bytes
>       stdout_bytes.append(stdout.read())
>     File "/Users/mbercx/.aiida_venvs/sirius/lib/python3.10/site-packages/paramiko/file.py", line 200, in read
>       new_data = self._read(self._DEFAULT_BUFSIZE)
>     File "/Users/mbercx/.aiida_venvs/sirius/lib/python3.10/site-packages/paramiko/channel.py", line 1361, in _read
>       return self.channel.recv(size)
>     File "/Users/mbercx/.aiida_venvs/sirius/lib/python3.10/site-packages/paramiko/channel.py", line 701, in recv
>       raise socket.timeout()
>   TimeoutError
> 

I tried setting up the same computer with core.ssh and all the tests pass fine? Any ideas?

EDIT: Seems to work just fine on Piz Daint. Strange...

I'm confused why this was happening,
self._client is a fabric instance, exec_command_wait_bytes() calls on _exec_command_internal() and that calls on the paramiko instance behind the scenes: channel = self.sshclient.get_transport().open_session(). Therefore, if it was working for SshTransport should also work for SshFabricTransport..
@mbercx were you able to reproduce this after the latest push? Unfortunately, I cannot see the commit history, to trace back

Copy link
Contributor

@khsrali khsrali left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Alright I went through this and tried it out, a few comments:

  • If hostname doesn't exist or not configured in ~/.ssh/config, no message is communicated to users during verdi -p <profile> computer configure. I would even add a click callback, to actually try with fabric if connection is possible.
  • If the ~/.ssh/config is deleted after configuring your computer, the connection always fails.

These two, makes core.ssh_fabric a "silent" plugin, basically works only if user can do $ ssh hostname.
We should be aware if that fails for whatever reason after configuration, aiida connectivity also fails.

This calls for adding more checks wherever relevant and raising to user.

Would be perfect to have this clearly stated at the time of configuring a computer.
Also useful to define def get_description(cls) to give these info.


def open(self):
"""Open the connection."""
if not self._connection.is_connected:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Catch paramiko.ssh_exception.AuthenticationException:

  • check if hostname cannot be found in ~/.ssh/config
  • make user aware of this somehow.
  • kill immediately, no reason to retry MAX_ATTEMPTS_OPTION times in this case.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would indeed be nice to get some more clear feedback, for example if I change my Host in the .ssh/config and then try to test the computer, I get:

verdi computer test todi --print-traceback
Report: Testing computer<todi> for user<lala@lolo>...
* Opening connection... [FAILED]: Error while trying to connect to the computer
  Full traceback:
  Traceback (most recent call last):
    File "/Users/mbercx/project/core/git/aiida-core/src/aiida/cmdline/commands/cmd_computer.py", line 541, in computer_test
      with transport:
    File "/Users/mbercx/project/core/git/aiida-core/src/aiida/transports/transport.py", line 128, in __enter__
      self.open()
    File "/Users/mbercx/project/core/git/aiida-core/src/aiida/transports/plugins/ssh_fabric.py", line 47, in open
      self._sftp = self._connection.sftp()  # This opens the fabric connection ``self._connection`` as well
    File "/Users/mbercx/.aiida_venvs/core/lib/python3.10/site-packages/decorator.py", line 232, in fun
      return caller(func, *(extras + args), **kw)
    File "/Users/mbercx/.aiida_venvs/core/lib/python3.10/site-packages/fabric/connection.py", line 22, in opens
      self.open()
    File "/Users/mbercx/.aiida_venvs/core/lib/python3.10/site-packages/fabric/connection.py", line 665, in open
      result = self.client.connect(**kwargs)
    File "/Users/mbercx/.aiida_venvs/core/lib/python3.10/site-packages/paramiko/client.py", line 349, in connect
      to_try = list(self._families_and_addresses(hostname, port))
    File "/Users/mbercx/.aiida_venvs/core/lib/python3.10/site-packages/paramiko/client.py", line 203, in _families_and_addresses
      addrinfos = socket.getaddrinfo(
    File "/opt/homebrew/Cellar/python@3.10/3.10.14/Frameworks/Python.framework/Versions/3.10/lib/python3.10/socket.py", line 955, in getaddrinfo
      for res in _socket.getaddrinfo(host, port, family, type, proto, flags):
  socket.gaierror: [Errno 8] nodename nor servname provided, or not known
Warning: 1 out of 0 tests failed

Same in case I change the name of ~/.ssh/config.

@mbercx
Copy link
Member

mbercx commented Jul 19, 2024

@mbercx were you able to reproduce this after the latest push? Unfortunately, I cannot see the commit history, to trace back

Just gave it another spin, and everything seems to be running smooth as butter on all machines I've tried.

@mbercx
Copy link
Member

mbercx commented Jul 19, 2024

If hostname doesn't exist or not configured in ~/.ssh/config, no message is communicated to users during verdi -p computer configure. I would even add a click callback, to actually try with fabric if connection is possible.

I like the idea of raising a warning in case the provided configuration fails, but would never dare to introduce such scope creep on a PR of @sphuber. 😇

That said, this should probably be done for all transport configure commands then. I'd open an issue for this and deal with it separately.

@sphuber
Copy link
Contributor Author

sphuber commented Jul 19, 2024

I agree that there not being any warning or error whatsoever is problematic. I am not sure I would want to add a check to verdi computer configure though. In any case, it shouldn't error out. At most it should provide a warning. It would be breaking if it started erroring out all of a sudden, because it is perfectly possible now to configure core.ssh when there currently isn't a connection available, even though later there would be. We could add it behind a flag --check-connection but if that is not the default, it wouldn't really help, as it would require a conscious action of the user, and in that case you might as well just use verdi computer test.

So really what I am thinking is that this is a problem specific with the core.ssh_fabric plugin. Need to see if there is a hook I can use to perform a test during verdi computer configure. But if not, we anyway need to add better validation during the opening of the connection if that now fails silently. Because even if the validation is done during configuration, the SSH config file can be updated at a later time and break the transport. So this validation anyway needs to go in the open method or something.

@sphuber
Copy link
Contributor Author

sphuber commented Jul 19, 2024

Seems that we won't be able to rely much on fabric warning us on the SSH config file not being as expected:

In [16]: from fabric import Config

In [17]: from fabric import Connection

In [18]: conn = Connection('nonexisting-domain')

In [19]: conn.open()
---------------------------------------------------------------------------
gaierror                                  Traceback (most recent call last)
Cell In[19], line 1
----> 1 conn.open()

File ~/.mambaforge/envs/aiida-py311/lib/python3.11/site-packages/fabric/connection.py:665, in Connection.open(self)
    659     kwargs["auth_strategy"] = auth_strategy_class(
    660         ssh_config=self.ssh_config,
    661         fabric_config=self.config,
    662         username=self.user,
    663     )
    664 # Actually connect!
--> 665 result = self.client.connect(**kwargs)
    666 self.transport = self.client.get_transport()
    667 return result

File ~/.mambaforge/envs/aiida-py311/lib/python3.11/site-packages/paramiko/client.py:349, in SSHClient.connect(self, hostname, port, username, password, pkey, key_filename, timeout, allow_agent, look_for_keys, compress, sock, gss_auth, gss_kex, gss_deleg_creds, gss_host, banner_timeout, auth_timeout, gss_trust_dns, passphrase, disabled_algorithms, transport_factory)
    347 errors = {}
    348 # Try multiple possible address families (e.g. IPv4 vs IPv6)
--> 349 to_try = list(self._families_and_addresses(hostname, port))
    350 for af, addr in to_try:
    351     try:

File ~/.mambaforge/envs/aiida-py311/lib/python3.11/site-packages/paramiko/client.py:203, in SSHClient._families_and_addresses(self, hostname, port)
    195 """
    196 Yield pairs of address families and addresses to try for connecting.
    197
   (...)
    200 :returns: Yields an iterable of ``(family, address)`` tuples
    201 """
    202 guess = True
--> 203 addrinfos = socket.getaddrinfo(
    204     hostname, port, socket.AF_UNSPEC, socket.SOCK_STREAM
    205 )
    206 for (family, socktype, proto, canonname, sockaddr) in addrinfos:
    207     if socktype == socket.SOCK_STREAM:

File ~/.mambaforge/envs/aiida-py311/lib/python3.11/socket.py:962, in getaddrinfo(host, port, family, type, proto, flags)
    959 # We override this function since we want to translate the numeric family
    960 # and socket type values to enum constants.
    961 addrlist = []
--> 962 for res in _socket.getaddrinfo(host, port, family, type, proto, flags):
    963     af, socktype, proto, canonname, sa = res
    964     addrlist.append((_intenum_converter(af, AddressFamily),
    965                      _intenum_converter(socktype, SocketKind),
    966                      proto, canonname, sa))

gaierror: [Errno -3] Temporary failure in name resolution

@mbercx
Copy link
Member

mbercx commented Jul 19, 2024

Some quick notes during our discussion:

  1. One use case I've encountered several times, which would be resolved by using fabric is when a cluster has multiple login nodes, and one of them is down. In that case using the default hostname (e.g. todi.cscs.ch) fails intermittently, and the suggestion is to set a specific login node (e.g. todi-ln002.cscs.ch). If I can just update my .ssh/config instead of having to change the computer, that's quite nice.
  2. Perhaps we should rename the plugin to ssh-config, to highlight that this transport always relies on the ~/.ssh/config?

@sphuber sphuber force-pushed the feature/ssh-fabric branch from 55da99b to f48a1f4 Compare July 19, 2024 14:35
@sphuber
Copy link
Contributor Author

sphuber commented Jul 19, 2024

@mbercx I have been reading the implementation of fabric and for how we would be using it, i.e. just instantiating fabric.Connection(hostname) it doesn't do anything amazingly clever besides just parsing the ~/.ssh/config file. On top of that, it simply uses the paramiko.SSHConfig for this. So on closer inspection, I feel like it provides very little benefit.

I have pushed a commit that adds the SshAutoTransport that simply subclasses the SshTransport and removes all the configuration options. It then automatically determines those from the SSH config. I believe the functionality should now be on par compared to using fabric and so I don't think we even need to add this dependency. Could you please have a go and test the SshAutoTransport with your clusters? I don't have any machines that can test proxy jumps or proxy commands.

@sphuber sphuber force-pushed the feature/ssh-fabric branch from 740e7ca to 22a6e69 Compare July 20, 2024 11:59
@mbercx
Copy link
Member

mbercx commented Jul 22, 2024

Thanks @sphuber! I'll give it a spin. :)

It then automatically determines those from the SSH config.

Quick first question here: Does this happen at configuration time, or connection time?

@sphuber
Copy link
Contributor Author

sphuber commented Jul 22, 2024

Quick first question here: Does this happen at configuration time, or connection time?

At connection time. It works exactly as the fabric version, just it only uses paramiko for the implementation so we don't have to unnecessarily add a dependency

@mbercx
Copy link
Member

mbercx commented Jul 22, 2024

Tested three different clusters I have access to, all seems to work as expected! 👌 Will now do a proper review.

@mbercx mbercx self-requested a review July 22, 2024 09:26
Copy link
Member

@mbercx mbercx left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just two comments, main one being the name of the plugin.

To keep my scope-creep reputation alive: for this plugin it would be especially nice if the user didn't have to configure the connection after setting up the computer. But we can discuss if/how to make that happen another day.

__all__ = ('SshAutoTransport',)


class SshAutoTransport(SshTransport):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As mentioned in one of my comments above, I'm wondering if we should name the plugin SshConfigTransport (entry point ssh_config), since it takes the information from the ~/.ssh/config.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see your point of it corresponding to the ~/.ssh/config file, but I don't think it is as obvious to a user. If a user knows nothing about the way these plugins work, but they have the option between core.ssh and core.ssh_auto, I think it is safe to say that they will pick the latter if they want the easy experience. In contrast, if the options are core.ssh and core.ssh_config I think they would guess that core.ssh_config would require more configuration. That is why I would vote for core.ssh_auto and just have the docstring explain that it automatically parses the config file. This also makes it less tied to exactly what it is parsing automatically if in the future we want to add more sources of information.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, fair enough! Happy to keep auto then. :)

with (pathlib.Path('~').expanduser() / '.ssh' / 'config').open() as handle:
config_ssh.parse(handle)

if self._machine not in config_ssh.get_hostnames():
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there any case where the connection would still work when the host is not in the ~/.ssh/config? Is there a reason to not outright raise here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My thinking: it might. And since it will raise in that case anyway, why not give it a shot?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One consideration might be that the user might focus more on the (somewhat unclear) error and miss the very clear warning.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fair enough. One more reason why I decided to use a warning is that the ~/.ssh/config is not the only place that SSH connections can be configured. There are also system wide configs like etc/sshd/* and others. These are also checked by fabric which is why it failed silently if the specified hostname is not part of the ~/.ssh/config file, because it cannot conclude that there is a problem if that is the case. We are adding this because for our purposes in most cases the user's config file would be the main place and we want to warn explicitly if we don't find the host there, but it is not necessarily a problem. Anyway, if others think it is better to make this an exception for now, happy to change it. We can always remove it later and turn it into a warning if there are cases where the plugin should rely on other configuration files as well.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's also not critical for me, and I see your point. In fact, now that I think of it, I might already know of a user that had a setup I wasn't familiar with, which might work fine even though the remote isn't specified in the ~/.ssh/config. So let's keep the warning.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mbercx did you already tried the core.ssh_auto with your various real-life test cases? Did it work?

Copy link
Member

@mbercx mbercx Jul 22, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, yes! I tried 3 different clusters, with and without proxy jump. Also messed with the configuration to make it fail and then fixed it. All seems to work as expected.

So I think once we remove ssh_fabric we can go ahead and merge this, let it into the wild and see what happens. 😁

@sphuber sphuber force-pushed the feature/ssh-fabric branch from 22a6e69 to 8fd664f Compare July 22, 2024 10:09
Copy link

codecov bot commented Jul 22, 2024

Codecov Report

Attention: Patch coverage is 79.16667% with 5 lines in your changes missing coverage. Please review.

Project coverage is 77.82%. Comparing base (ef60b66) to head (7624505).
Report is 118 commits behind head on main.

Files with missing lines Patch % Lines
src/aiida/transports/plugins/ssh_auto.py 79.17% 5 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #6154      +/-   ##
==========================================
+ Coverage   77.51%   77.82%   +0.32%     
==========================================
  Files         560      566       +6     
  Lines       41444    41961     +517     
==========================================
+ Hits        32120    32654     +534     
+ Misses       9324     9307      -17     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

This transport plugin subclasses the `SshTransport` plugin in order to
remove all the configuration options. Instead, it parses the user's SSH
config file using `paramiko.SSHConfig` when the transport is
instantiated to determine the connection parameters automatically.

The advantage of this approach is that when configuring a `Computer`
using this plugin, the user is not prompted with a bunch of options.
Rather, if they can connect to the target machine using `ssh` directly,
the transport will also work. What's more, when the user updates their
SSH config, the transport automatically uses these changes the next time
it is instantiated as opposed to the `SshTransport` plugin which stores
the configuration in an `AuthInfo` in the database and is therefore
static.

The original implementation of this plugin looked into the `fabric`
library. This library builds on top of `paramiko` and aims to make
configuration SSH connections easier, just as this new plugin was aiming
to. However, after a closer look, it seems that fabric was not adding a
lot of clever code when it comes to parsing the user's SSH config. It
does implement some clever code for dealing with proxy jumps and
commands but the `SshTransport` also already implements this. Therefore,
it is not really justified to add `fabric` as a dependency but instead
we opt to use `paramiko` to parse the config ourselves.
@sphuber
Copy link
Contributor Author

sphuber commented Jul 23, 2024

Keeping the original implementation using fabric here for posterity as I will refactor the branch to remove it

###########################################################################
# Copyright (c), The AiiDA team. All rights reserved.                     #
# This file is part of the AiiDA code.                                    #
#                                                                         #
# The code is hosted on GitHub at https://github.com/aiidateam/aiida-core #
# For further information on the license, see the LICENSE.txt file        #
# For further information please visit http://www.aiida.net               #
###########################################################################
"""Transport plugin for SSH connection using ``fabric``.

This subclasses the ``SshTransport`` plugin to replace the connection configuration using the ``fabric`` package. This
makes the configuration significantly easier for the user since ``fabric`` automatically guesses connection parameters
by parsing available SSH configuration files on the system. Since ``fabric`` uses ``paramiko`` under the hood, which is
the library used by the ``SshTransport``, most of the implementation can be reused. The ``paramiko.SshClient`` is
exposed on the ``fabric.Connection`` instance and is proxied to the ``_client`` attribute of the class where the
``SshTransport`` base class expects to find it.
"""

import fabric

from aiida.common.exceptions import InvalidOperation

from ..transport import Transport
from .ssh import SshTransport

__all__ = ('SshFabricTransport',)


class SshFabricTransport(SshTransport):
    """Transport plugin for SSH connections with highly-automated connection configuration.

    The connection is made using the ``fabric`` package which will try to use available SSH configuration files on the
    system to guess the correct configuration for the connection to the relevant hostname.
    """

    _valid_auth_options = []

    def __init__(self, *args, **kwargs):
        """Construct a new instance."""
        Transport.__init__(self, *args, **kwargs)  # Skip the ``SshTransport`` constructor
        self._connection = fabric.Connection(self.hostname)
        self._client = self._connection.client
        self._sftp = None

    def open(self):
        """Open the connection."""
        if not self._connection.is_connected:
            self._sftp = self._connection.sftp()  # This opens the fabric connection ``self._connection`` as well
            self._sftp.chdir('.')  # Needed to make sure sftp.getcwd() returns a valid path
            self._is_open = True

    def close(self):
        """Close the connection.

        This will close the SFTP channel and the ``fabric`` connection.

        :raise aiida.common.InvalidOperation: If the channel is already open.
        """
        if not self._is_open:
            raise InvalidOperation('Cannot close the transport: it is already closed')

        if self._sftp:
            self._sftp.close()
        self._connection.close()
        self._is_open = False

    def __str__(self):
        """Return a string representation of the transport instance."""
        status = 'OPEN' if self._is_open else 'CLOSED'
        return f'{self._connection.user}@{self.hostname}:{self._connection.port} [{status}]'

    def gotocomputer_command(self, remotedir):
        """Return the command string to connect to the remote and change directory to ``remotedir``."""
        return f'ssh -t {self.hostname} {self._gotocomputer_string(remotedir)}'

@sphuber sphuber force-pushed the feature/ssh-fabric branch from 8fd664f to 7624505 Compare July 23, 2024 14:06
Copy link
Member

@mbercx mbercx left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks a lot @sphuber! Looking forward to seeing this get released. Let's encourage everyone to start using it on main ASAP so we can catch any possible issues before that though.

@sphuber sphuber merged commit 71422eb into aiidateam:main Jul 23, 2024
29 checks passed
@sphuber sphuber deleted the feature/ssh-fabric branch July 23, 2024 19:28
mikibonacci pushed a commit to mikibonacci/aiida-core that referenced this pull request Sep 3, 2024
This transport plugin subclasses the `SshTransport` plugin in order to
remove all the configuration options. Instead, it parses the user's SSH
config file using `paramiko.SSHConfig` when the transport is
instantiated to determine the connection parameters automatically.

The advantage of this approach is that when configuring a `Computer`
using this plugin, the user is not prompted with a bunch of options.
Rather, if they can connect to the target machine using `ssh` directly,
the transport will also work. What's more, when the user updates their
SSH config, the transport automatically uses these changes the next time
it is instantiated as opposed to the `SshTransport` plugin which stores
the configuration in an `AuthInfo` in the database and is therefore
static.

The original implementation of this plugin looked into the `fabric`
library. This library builds on top of `paramiko` and aims to make
configuration SSH connections easier, just as this new plugin was aiming
to. However, after a closer look, it seems that fabric was not adding a
lot of clever code when it comes to parsing the user's SSH config. It
does implement some clever code for dealing with proxy jumps and
commands but the `SshTransport` also already implements this. Therefore,
it is not really justified to add `fabric` as a dependency but instead
we opt to use `paramiko` to parse the config ourselves.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

ssh_light - a simple ssh transport plugin
3 participants