Req to deal with SLURM socket errors more patiently

### Summary

At the end of issue #2693, @effigies noted that the error @dalejn was experiencing was caused by the SLURM master throwing an error when it was polled with `squeue`, possibly because it was busy. After some further testing, we now believe that the NIH HPC SLURM master will throw this error at least once a day, even with a modest polling interval.

We would like to request a patch so that, if NiPype receives any kind of timeout error from `squeue` (we've seen a few different kinds), it politely waits and tries again instead of exiting.

### Actual behavior
```
RuntimeError: Command:
squeue -j 9448406
Standard output:

Standard error:
slurm_load_jobs error: Socket timed out on send/recv operation
Return code: 1
```
or
```
The batch system is not available at the moment.
```
and NiPype exits.

### Requested behavior
```
squeue is busy, will try again
```
and NiPype does _not_ exit.
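
As a rough illustration of the requested behavior, here is a minimal sketch of a retrying `squeue` poll. It is not NiPype's actual code: `poll_squeue`, `is_transient`, `TRANSIENT_MARKERS`, and the `max_tries` and `wait` defaults are all hypothetical names and values. The two marker strings are the messages quoted under "Actual behavior"; we have seen other timeout variants, so the list is probably incomplete, and checking both stdout and stderr is an assumption because we have not confirmed which stream each message arrives on.

```python
import subprocess
import time

# Assumption: these substrings identify a temporary scheduler problem.
# Both come from the output quoted in this issue; the list is likely incomplete.
TRANSIENT_MARKERS = (
    "Socket timed out on send/recv operation",
    "The batch system is not available at the moment",
)


def is_transient(output):
    """Return True if the squeue output looks like a temporary failure."""
    return any(marker in output for marker in TRANSIENT_MARKERS)


def poll_squeue(job_id, max_tries=5, wait=30.0,
                run=subprocess.run, sleep=time.sleep):
    """Run `squeue -j <job_id>`, retrying when the failure looks transient.

    Hypothetical helper: names and defaults are illustrative only.
    `run` and `sleep` are injectable so the logic can be tested offline.
    """
    for attempt in range(1, max_tries + 1):
        result = run(
            ["squeue", "-j", str(job_id)],
            stdout=subprocess.PIPE,
            stderr=subprocess.PIPE,
            universal_newlines=True,  # text output; works on Python 3.5
        )
        if result.returncode == 0:
            return result.stdout
        # Assumption: the message may appear on either stream, so check both.
        combined = result.stdout + result.stderr
        if not is_transient(combined) or attempt == max_tries:
            raise RuntimeError(
                "squeue failed (return code %d): %s"
                % (result.returncode, combined.strip())
            )
        print("squeue is busy, will try again")
        sleep(wait * attempt)  # wait a little longer after each failure
```

With these defaults, a call such as `poll_squeue(9448406)` would only raise after five consecutive transient failures (waiting 30, 60, 90, and 120 seconds between attempts), or immediately on an error that does not match one of the markers.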

### Platform details:
```
(NiPypeUpdate) [zhoud4@felix ETPB]$ python -c "import nipype; from pprint import pprint; pprint(nipype.get_info())"
{'commit_hash': 'ec7457c23',
 'commit_source': 'installation',
 'networkx_version': '2.2',
 'nibabel_version': '2.3.1',
 'nipype_version': '1.1.3',
 'numpy_version': '1.15.3',
 'pkg_path': '/data/zhoud4/python/envs/NiPypeUpdate/lib/python3.5/site-packages/nipype',
 'scipy_version': '1.1.0',
 'sys_executable': '/data/zhoud4/python/envs/NiPypeUpdate/bin/python',
 'sys_platform': 'linux',
 'sys_version': '3.5.4 | packaged by conda-forge | (default, Aug 10 2017, '
                '01:38:41) \n'
                '[GCC 4.8.2 20140120 (Red Hat 4.8.2-15)]',
 'traits_version': '4.6.0'}
(NiPypeUpdate) [zhoud4@felix ETPB]$
(NiPypeUpdate) [zhoud4@biowulf ETPB]$ sinfo -V
slurm 17.02.9
(NiPypeUpdate) [zhoud4@biowulf ETPB]$ 
```