Description
I've been converting from paramiko to parallel-ssh and have hit an issue where very short commands sometimes stall and take several minutes to complete. In this case, we're running cat on a small fio config file. I've been able to reproduce this with a simple script. In the results below, the first three runs finish in under a second and the fourth takes just over three minutes:
Started at 2022-04-25 09:01:52.893054, ended at 2022-04-25 09:01:53.672184, total time is 0:00:00.779130
Started at 2022-04-25 09:01:54.592475, ended at 2022-04-25 09:01:55.372624, total time is 0:00:00.780149
Started at 2022-04-25 09:01:56.312288, ended at 2022-04-25 09:01:57.041410, total time is 0:00:00.729122
Started at 2022-04-25 09:01:57.896660, ended at 2022-04-25 09:04:58.563031, total time is 0:03:00.666371
Both the client and the target are Ubuntu 20.04 systems. I have not seen this issue with commands that take longer to run.
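The slow run above lasted 0:03:00.67, which is close to the 180-second `cmd_timeout` set in the script below, so the call may be waiting out the timeout rather than doing any work. That is only an inference from the numbers. As a rough sketch for checking it across many runs, the following parses the script's output lines and flags any run that lasted at least as long as the timeout. The names `parse_duration` and `find_stalls` are hypothetical helpers, not part of parallel-ssh, and the parser assumes runs shorter than one day.

```python
import re
from datetime import timedelta

# Matches the "total time is H:MM:SS[.ffffff]" suffix printed by the script.
# The fractional part is optional because str(timedelta) omits it when zero.
DURATION_RE = re.compile(r"total time is (\d+):(\d{2}):(\d{2})(?:\.(\d+))?")


def parse_duration(line):
    """Return the run duration from one output line, or None if absent."""
    match = DURATION_RE.search(line)
    if match is None:
        return None
    hours, minutes, seconds, fraction = match.groups()
    # Pad or trim the fraction to exactly six digits (microseconds).
    micros = int((fraction or "0").ljust(6, "0")[:6])
    return timedelta(hours=int(hours), minutes=int(minutes),
                     seconds=int(seconds), microseconds=micros)


def find_stalls(lines, timeout_seconds=180.0):
    """Return the durations that are at least as long as the timeout."""
    limit = timedelta(seconds=timeout_seconds)
    durations = (parse_duration(line) for line in lines)
    return [d for d in durations if d is not None and d >= limit]


log_lines = [
    "Started at 2022-04-25 09:01:52.893054, ended at 2022-04-25 09:01:53.672184, total time is 0:00:00.779130",
    "Started at 2022-04-25 09:01:54.592475, ended at 2022-04-25 09:01:55.372624, total time is 0:00:00.780149",
    "Started at 2022-04-25 09:01:56.312288, ended at 2022-04-25 09:01:57.041410, total time is 0:00:00.729122",
    "Started at 2022-04-25 09:01:57.896660, ended at 2022-04-25 09:04:58.563031, total time is 0:03:00.666371",
]

stalls = find_stalls(log_lines)
print(f"{len(stalls)} of {len(log_lines)} runs reached the timeout: {stalls}")
```

If every stalled run sits just above the configured timeout, changing `cmd_timeout` and seeing the stall length follow it would be a quick way to test the inference.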
Script:
#!/usr/bin/env python3
'''
Quick script to try to reproduce stall with parallel-ssh
'''
from pssh.clients.native import SSHClient
from pssh import exceptions
from datetime import datetime
hostname = "<target host>"
cmd = "cat /tmp/red-bdev-rand-rw.fio"
stdout = ""
stderr = ""
cmd_timeout = 180.0
login = "<user>"
password = "<password>"
port_num = 22
connect_retry_count = 3
keyfile = "<keyfile>"
client = SSHClient(host=hostname, user=login, password=password, port=port_num,
                   num_retries=connect_retry_count, allow_agent=False, identity_auth=False, pkey=keyfile, timeout=cmd_timeout)
start = datetime.now()
host_out = client.run_command(cmd, use_pty=True, timeout=cmd_timeout)
client.wait_finished(host_output=host_out)
try:
    for line in host_out.stdout:
        stdout += line
    for line in host_out.stderr:
        stderr += line
    retcode = host_out.exit_code
except exceptions.Timeout as err:
    # May as well pull all available output
    for line in host_out.stdout:
        stdout += line
    for line in host_out.stderr:
        stderr += line
    retcode = host_out.exit_code
    raise AssertionError(f"Command {cmd} timed out on host {hostname} after {cmd_timeout} seconds. "
                         f"Partial output: {stdout} stderr: {stderr}") from err
except Exception as err:
    raise AssertionError(f"Failed in rtfutils with error {err}") from err
finally:
    client.close_channel(channel=host_out.channel)
done = datetime.now()
print(f"Started at {start}, ended at {done}, total time is {done - start}")
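The script as posted times a single command, so the four results shown earlier presumably came from running it repeatedly. A minimal sketch of a harness that does the repetition in one process and reports which iterations stalled is shown here. `time_runs`, `run_once`, and `stall_threshold` are hypothetical names, and the stand-in callable only sleeps. In a real reproduction `run_once` would wrap the `run_command` and output-reading steps from the script. It uses `time.monotonic()` rather than `datetime.now()` so that wall-clock adjustments cannot distort the measurement.

```python
import time


def time_runs(run_once, repeats, stall_threshold=10.0):
    """Call run_once() `repeats` times.

    Returns (durations, stalled), where durations is a list of elapsed
    seconds per call and stalled lists the indices of the calls that
    took at least stall_threshold seconds.
    """
    durations = []
    for _ in range(repeats):
        begin = time.monotonic()
        run_once()
        durations.append(time.monotonic() - begin)
    stalled = [i for i, d in enumerate(durations) if d >= stall_threshold]
    return durations, stalled


# Stand-in for the SSH command: the second call is artificially slow.
delays = iter([0.0, 0.2, 0.0])


def fake_command():
    time.sleep(next(delays))


durations, stalled = time_runs(fake_command, repeats=3, stall_threshold=0.1)
for index, seconds in enumerate(durations):
    marker = " (stalled)" if index in stalled else ""
    print(f"run {index}: {seconds:.3f}s{marker}")
```

Keeping the loop in one process also makes it possible to compare reusing a single client against creating a new one per iteration, which may help narrow down where the stall comes from.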
Contents of the red-bdev-rand-rw.fio file:
#red-bdev-rand-rw test
[global]
name=red-bdev-rand-rw
ioengine=${IOENGINE}
filename=${FILENAME}
size=${SIZE}
direct=1
group_reporting=1
thread=1
time_based=1
runtime=90
blocksize_range=4k:3m
rw=randrw
[file1]
iodepth=16
numjobs=1
[file2]
iodepth=16
numjobs=2
[file3]
iodepth=16
numjobs=4
[file4]
iodepth=8
numjobs=8
[file5]
iodepth=4
numjobs=16
[file6]
iodepth=4
numjobs=32