Description
I've got a problem using this plugin with (bits of) OpenStack "Rocky". Instances appear in OpenStack but are deleted immediately, which overworks the OpenStack instance without making any slave nodes available to Jenkins.
Background:
We've got an OpenStack instance we call "hdc" that's mostly version "Pike" (11). Last weekend, we upgraded some bits (Heat, Horizon, Glance, Cinder & Keystone) to version "Rocky" (13) ... and Jenkins then stopped being able to provision slaves.
In particular, we upgraded Cinder from 11.0.0 to 13.0.3, and we think this is what made it go wrong.
- Our slave templates boot from a VolumeSnapshot.
- The plugin (successfully) requests a new OpenStack instance - that bit works fine.
- Unlike instances booted from an Image, those booted from a VolumeSnapshot run from a Volume. Once the plugin knows what that Volume is, it sets the Volume's name & description using Openstack.setVolumeNameAndDescription, so that humans can tell which Volumes are being used for which purpose (especially useful when OpenStack forgets to delete them).
However, since we messed with our OpenStack instance, the setVolumeNameAndDescription call has been failing with the following exception:
Unexpected exception encountered while provisioning agent hdcjenkinsslaveprodtest-win2016-hostedslave
jenkins.plugins.openstack.compute.internal.Openstack$ActionFailed: ActionResponse{success=false, fault=Invalid input for field/attribute volume. Value: {u'description': u'For hdcjenkinsslaveprodtest-win2016-hostedslave-161881 (2b556463-8fef-4616-981b-784c54ee3b4b), from VolumeSnapshot win2016_en.us_iim1.8.x_libsupp9.0.x_db2.11.1.x_ff46.x.candidate_jso (2ca66441-a4ee-4312-8ad0-755530e7b370).', u'display_name': u'hdcjenkinsslaveprodtest-win2016-hostedslave-161881[0]', u'name': u'hdcjenkinsslaveprodtest-win2016-hostedslave-161881[0]', u'os-vol-mig-status-attr:migstat': u'none', u'display_description': u'For hdcjenkinsslaveprodtest-win2016-hostedslave-161881 (2b556463-8fef-4616-981b-784c54ee3b4b), from VolumeSnapshot win2016_en.us_iim1.8.x_libsupp9.0.x_db2.11.1.x_ff46.x.candidate_jso (2ca66441-a4ee-4312-8ad0-755530e7b370).'}. Additional properties are not allowed (u'os-vol-mig-status-attr:migstat' was unexpected), code=400}
at jenkins.plugins.openstack.compute.internal.Openstack.throwIfFailed(Openstack.java:690)
at jenkins.plugins.openstack.compute.internal.Openstack.setVolumeNameAndDescription(Openstack.java:435)
at jenkins.plugins.openstack.compute.slaveopts.BootSource$VolumeSnapshot.afterProvisioning(BootSource.java:433)
at jenkins.plugins.openstack.compute.JCloudsSlaveTemplate.provisionServer(JCloudsSlaveTemplate.java:338)
at jenkins.plugins.openstack.compute.JCloudsSlaveTemplate.provisionSlave(JCloudsSlaveTemplate.java:211)
at jenkins.plugins.openstack.compute.JCloudsCloud$NodeCallable.call(JCloudsCloud.java:332)
at jenkins.plugins.openstack.compute.JCloudsCloud$NodeCallable.call(JCloudsCloud.java:319)
at jenkins.util.ContextResettingExecutorService$2.call(ContextResettingExecutorService.java:46)
at jenkins.security.ImpersonatingExecutorService$2.call(ImpersonatingExecutorService.java:71)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
FYI hdcjenkinsslaveprodtest-win2016-hostedslave is the name of our template and win2016_en.us_iim1.8.x_libsupp9.0.x_db2.11.1.x_ff46.x.candidate_jso is the name of our VolumeSnapshot.
Looking at that error in more detail, it's implying that OpenStack thinks our request was to set:
- description=For hdcjenkinsslaveprod-linux-genericslave-13577 (b93379ef-8630-4c65-9fe0-93398acdab5e), from VolumeSnapshot jenkins_slave_ccs (92a49a1f-37d2-4401-ab1a-6fd0cbdd306c).
- display_name=hdcjenkinsslaveprod-linux-genericslave-13577[0]
- name=hdcjenkinsslaveprod-linux-genericslave-13577[0]
- os-vol-mig-status-attr:migstat=none
- display_description=For hdcjenkinsslaveprod-linux-genericslave-13577 (b93379ef-8630-4c65-9fe0-93398acdab5e), from VolumeSnapshot jenkins_slave_ccs (92a49a1f-37d2-4401-ab1a-6fd0cbdd306c).
OpenStack is complaining that os-vol-mig-status-attr:migstat is an "Additional" property, implying that we probably shouldn't be setting it. However, this plugin's code isn't setting it; that part of the request appears to be added, erroneously, by the openstack4j code.
i.e. The bug seems to lie outside of this plugin code ... but it is causing this plugin to fail.
We need a fix and/or workaround - we can't keep our OpenStack instance(s) on ancient versions of the OpenStack software forever.
I think that a good workaround would be to make the plugin tolerate (but log) this exception and then continue on; that'd mean that the issue is no longer a fatal error for us.
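To illustrate the kind of workaround I mean, here is a minimal sketch. It is not the plugin's actual code: TolerantStep and runTolerantly are hypothetical names, and it assumes the failure surfaces as an unchecked exception (as the stack trace above suggests, though I have not confirmed this).

```java
import java.util.logging.Level;
import java.util.logging.Logger;

// Hypothetical helper, NOT part of the plugin: runs a non-essential step and
// logs (rather than propagates) any unchecked exception it throws.
final class TolerantStep {
    private static final Logger LOGGER = Logger.getLogger(TolerantStep.class.getName());

    private TolerantStep() {}

    /** Returns true if the step succeeded, false if it failed and was only logged. */
    static boolean runTolerantly(String what, Runnable step) {
        try {
            step.run();
            return true;
        } catch (RuntimeException e) {
            // Assumption: labelling the Volume is cosmetic, so provisioning can go on.
            LOGGER.log(Level.WARNING, "Ignoring failure while " + what, e);
            return false;
        }
    }
}
```

The call to setVolumeNameAndDescription in BootSource$VolumeSnapshot.afterProvisioning could then be wrapped in something like this, so that a rejected rename leaves an unlabelled Volume instead of a destroyed instance.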
A more permanent fix would be to fix the openstack4j code so it doesn't try setting things that it wasn't asked to change ... but it looks like that's a general issue running through that code (e.g. openstack4j #932, #869, #823, #820, #682, #606, #573, #470) and so it may be non-trivial to fix this elegantly.
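As a rough sketch of what "only set what was asked for" could look like, the following builds an update body containing just the fields the caller supplied. VolumeUpdateBody is a hypothetical class, not openstack4j's API; the "name" and "description" keys are taken from the error message above, and whether Cinder 13 accepts exactly this body is an assumption I have not tested.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration, NOT openstack4j code: the update body carries only
// the fields the caller asked to change, never attributes read back from the
// existing Volume (such as os-vol-mig-status-attr:migstat).
final class VolumeUpdateBody {
    private VolumeUpdateBody() {}

    /** Builds a body with only the non-null fields; null means "leave unchanged". */
    static Map<String, String> of(String name, String description) {
        Map<String, String> body = new LinkedHashMap<>();
        if (name != null) {
            body.put("name", name);
        }
        if (description != null) {
            body.put("description", description);
        }
        return body;
    }
}
```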
Note: This happened with both plugin version 2.49 (the version I was using when I first hit this issue) and version 2.50 (the version I switched to in case it fixed it), so it doesn't appear to be affected by the recent change in the openstack4j version being used.