
Use terraform to install the masters #135

Merged: 2 commits merged into openshift-metal3:master from test-terraform on Mar 14, 2019

Conversation

@markmc (Contributor) commented Mar 8, 2019

Install terraform and terraform-provider-ironic. Add a script that you can use to experiment with using terraform to deploy the masters after 07_deploy_masters.sh exits early.

This uses the unmerged code in openshift-metal3/terraform-provider-ironic#2
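A minimal sketch of what the experimental flow looks like, assuming the ocp/tf-master layout used later in this PR; the provider URL and the ironic_node_v1 arguments are assumptions based on the in-progress provider, not its final schema:

# Write a minimal terraform config for one master (attribute names are assumptions)
mkdir -p ocp/tf-master
cat > ocp/tf-master/main.tf <<'EOF'
provider "ironic" {
  url = "http://localhost:6385/v1"        # assumed local Ironic API endpoint
}

resource "ironic_node_v1" "openshift-master-0" {
  name                   = "openshift-master-0"
  target_provision_state = "active"       # assumed knob to drive the deploy
}
EOF

cd ocp/tf-master
terraform init                # picks up terraform-provider-ironic from ~/.terraform.d/plugins
terraform apply --auto-approve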


TODO list before merging:

Review thread on 03_ocp_repo_sync.sh (outdated, resolved)
@hardys commented Mar 8, 2019

Nice, I was about to do something very similar to this :)

Review thread on 07_deploy_masters.sh (outdated, resolved)
@stbenjam (Member) commented Mar 8, 2019

This looks great!

@derekhiggins (Collaborator)

Build SUCCESS, see build http://10.8.144.11:8080/job/dev-tools/149/

@derekhiggins (Collaborator)

Build SUCCESS, see build http://10.8.144.11:8080/job/dev-tools/156/

@markmc (Contributor, Author) commented Mar 11, 2019

I rebased and squashed. Masters seem to deploy fine for me.

TODO list before merging:

Anything I missed?

@derekhiggins (Collaborator)

Build SUCCESS, see build http://10.8.144.11:8080/job/dev-tools/174/

@stbenjam (Member)

I think we could defer destroy to a different JIRA ticket, if we want to get this in sooner rather than later.

Do we want to wait for gophercloud/utils#82? We could just leave the hack to pull from the PR until it's merged.

@stbenjam (Member)

Actually, delete is easy enough to implement; it's in the terraform provider now.

@markmc (Contributor, Author) commented Mar 11, 2019

> Actually, delete is easy enough to implement; it's in the terraform provider now.

Awesome, I'll try it out.

> Do we want to wait for gophercloud/utils#82? We could just leave the hack to pull from the PR until it's merged.

Surely we can figure out some way to make terraform-provider-ironic build from a clean checkout without having to modify a version-controlled file? Can't we vendor your fork? I hadn't looked at go modules too closely before, but you'd hope this wouldn't be such a crazy thing to want to do.
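For what it's worth, go modules does have a mechanism for this: a replace directive plus vendoring, which at least keeps the change declarative rather than a build-time hack. A sketch, assuming the fork lives at github.com/stbenjam/utils (the fork path is an assumption):

# Redirect gophercloud/utils to the fork (fork path is an assumption)
go mod edit -replace github.com/gophercloud/utils=github.com/stbenjam/utils@master
go mod tidy
go mod vendor    # commit vendor/ so a clean checkout builds without network tricks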

@markmc (Contributor, Author) commented Mar 11, 2019


Looks to be just about working. Got the error below the first time, then re-ran it, and it completed with no errors:

+ terraform destroy --auto-approve
ironic_node_v1.openshift-master-0: Refreshing state... (ID: 9f0877f5-e47d-4337-acdd-bf9e753f1181)
ironic_node_v1.openshift-master-2: Refreshing state... (ID: a202efa7-ed06-4581-bddb-a0d5b10419f5)
ironic_node_v1.openshift-master-1: Refreshing state... (ID: 7ec4a692-e4e4-47e1-9f6e-37295d61e15d)
ironic_node_v1.openshift-master-0: Destroying... (ID: 9f0877f5-e47d-4337-acdd-bf9e753f1181)
ironic_node_v1.openshift-master-2: Destroying... (ID: a202efa7-ed06-4581-bddb-a0d5b10419f5)
ironic_node_v1.openshift-master-1: Destroying... (ID: 7ec4a692-e4e4-47e1-9f6e-37295d61e15d)

Error: Error applying plan:

3 error(s) occurred:

* ironic_node_v1.openshift-master-0 (destroy): 1 error(s) occurred:

* ironic_node_v1.openshift-master-0: cannot delete node in state 'deleting'
* ironic_node_v1.openshift-master-1 (destroy): 1 error(s) occurred:

* ironic_node_v1.openshift-master-1: cannot delete node in state 'deleting'
* ironic_node_v1.openshift-master-2 (destroy): 1 error(s) occurred:

* ironic_node_v1.openshift-master-2: cannot delete node in state 'deleting'

@stbenjam (Member)

That should be fixed now, I read Ironic's state diagram wrong.

I'll have a look at vendoring my gophercloud changes until the PR gets merged. https://github.com/Masterminds/glide looks promising.

@derekhiggins (Collaborator)

Build SUCCESS, see build http://10.8.144.11:8080/job/dev-tools/181/

@stbenjam (Member) commented Mar 12, 2019

I updated openshift-metal3/terraform-provider-ironic#2 to use glide, which let me vendor my fork of gophercloud/utils. Unfortunately glide's dependency management is a bit more aggressive than the go.mod stuff; it pulled in a lot more things from terraform. Once we're less reliant on forks we could maybe move back to go's native module handling.

Do you want to try removing the hack for utils? Then we can merge openshift-metal3/terraform-provider-ironic#2
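For reference, the glide override for pulling a dependency from a fork looks roughly like this (the fork URL is an assumption):

# glide.yaml: pin gophercloud/utils to a fork (repo URL is an assumption)
cat > glide.yaml <<'EOF'
package: github.com/openshift-metal3/terraform-provider-ironic
import:
- package: github.com/gophercloud/utils
  repo: https://github.com/stbenjam/utils
EOF

glide update    # resolve and fetch dependencies into vendor/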

@markmc (Contributor, Author) commented Mar 12, 2019

> I updated metalkube/terraform-provider-ironic#2 to use glide, which let me vendor my fork of gophercloud/utils. Unfortunately glide's dependency management is a bit more aggressive than the go.mod stuff; it pulled in a lot more things from terraform. Once we're less reliant on forks we could maybe move back to go's native module handling.

Yeah, it's quite a thing! Agreed, we can move back later.

> Do you want to try removing the hack for utils? Then we can merge metalkube/terraform-provider-ironic#2

Seems to be working for me. Thanks!

@stbenjam (Member)

@markmc I merged openshift-metal3/terraform-provider-ironic#2, you can remove the hack for that too - and we should be good to go!

@markmc changed the title from "WIP Start playing with terraform-provider-ironic" to "Use terraform to install the masters" on Mar 12, 2019
@markmc (Contributor, Author) commented Mar 12, 2019

Ok, I think this is good to go if CI passes. I'm re-testing locally also.

@derekhiggins (Collaborator)

Build SUCCESS, see build http://10.8.144.11:8080/job/dev-tools/188/

@markmc (Contributor, Author) commented Mar 12, 2019

> Build SUCCESS, see build http://10.8.144.11:8080/job/dev-tools/188/

This is a result from the earlier push, when we were still using openshift-metal3/terraform-provider-ironic#2.

@markmc (Contributor, Author) commented Mar 12, 2019

> Ok, I think this is good to go if CI passes. I'm re-testing locally also.

Looks good from my local testing.

@derekhiggins (Collaborator)

Build SUCCESS, see build http://10.8.144.11:8080/job/dev-tools/189/

@stbenjam (Member) left a review comment


This looks good to me.

mcornea added a commit to mcornea/dev-scripts that referenced this pull request Mar 12, 2019
@hardys commented Mar 13, 2019

Looks like we need a rebase, but I can pull/test when done if we think this is ready to go?

Otherwise lgtm - seems like a great step towards driving the master deployment via kni-installer :)

@markmc (Contributor, Author) commented Mar 13, 2019

Go for it Steve, thanks!

@markmc (Contributor, Author) commented Mar 14, 2019

Rebased, but haven't tested yet.

@derekhiggins (Collaborator)

Build FAILURE, see build http://10.8.144.11:8080/job/dev-tools/225/

markmc added 2 commits on March 14, 2019 at 13:17:

- Install terraform and terraform-provider-ironic, and use them to replace the Ironic interaction in 07_deploy_masters.sh.
- Now that terraform-provider-ironic supports delete.
@markmc (Contributor, Author) commented Mar 14, 2019

Ok, works fine for me with openshift-metalkube#165

@hardys commented Mar 14, 2019

@stbenjam Hey did the concerns about the Ironic API load/locking get resolved, or should we hold this pending further investigation?

@stbenjam (Member)

@hardys It's not resolved. Throughout testing I've deployed with terraform maybe a dozen times and I've personally run into the sqlite3 locking issue twice. I have only seen it while I'm deploying and doing something like watch -n10 openstack baremetal node list at the same time.

@markmc Have you seen it at all?

@derekhiggins (Collaborator)

Build FAILURE, see build http://10.8.144.11:8080/job/dev-tools/228/

@markmc (Contributor, Author) commented Mar 14, 2019

With this PR, Yurii is seeing:

OperationalError: (sqlite3.OperationalError) database is locked [SQL: u'SELECT nodes.created_at

even with:

$ podman exec -it ironic sqlite3 /var/lib/ironic/ironic.db "PRAGMA journal_mode"
wal

Dmitri suggests:

So, our last resort (?) option with sqlite is to try setting busy_timeout.
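A sketch of what poking at that from the CLI might look like; note that busy_timeout, unlike journal_mode=wal (which persists in the database file), is per-connection, so setting it here only affects this one sqlite3 session, and Ironic's own connections would need it applied through its database layer:

# journal_mode persists in the db file; busy_timeout only lasts for this session
podman exec -it ironic sqlite3 /var/lib/ironic/ironic.db \
  "PRAGMA journal_mode; PRAGMA busy_timeout = 5000; PRAGMA busy_timeout;"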

@markmc (Contributor, Author) commented Mar 14, 2019

> @markmc Have you seen it at all?

No, I haven't.

@hardys commented Mar 14, 2019

Ok, tested and this works for me; let's merge it and iterate on the sqlite thing if it recurs.

One thing to note: folks need to destroy with the old cleanup before consuming this patch, as (understandably) the new destroy script fails due to the missing ocp/tf-master.
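A guard along these lines would let the new destroy script tolerate environments deployed before this patch (the exact script layout is an assumption):

# Fall back gracefully when the deployment predates the terraform flow
if [ -d ocp/tf-master ]; then
  (cd ocp/tf-master && terraform destroy --auto-approve)
else
  echo "no ocp/tf-master state found, skipping terraform destroy" >&2
fi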

@hardys merged commit 3a7a860 into openshift-metal3:master on Mar 14, 2019
@markmc deleted the test-terraform branch on March 14, 2019 at 15:10
@yprokule (Contributor)

> With this PR, Yurii is seeing:
>
> OperationalError: (sqlite3.OperationalError) database is locked [SQL: u'SELECT nodes.created_at
>
> even with:
>
> $ podman exec -it ironic sqlite3 /var/lib/ironic/ironic.db "PRAGMA journal_mode"
> wal
>
> Dmitri suggests:
>
> So, our last resort (?) option with sqlite is to try setting busy_timeout.

It failed like this:

ironic_node_v1.openshift-master-2: Still creating... (2m50s elapsed)
ironic_node_v1.openshift-master-0: Still creating... (2m50s elapsed)
ironic_node_v1.openshift-master-1: Still creating... (2m50s elapsed)
2019/03/15 10:29:45 [ERROR] root: eval: *terraform.EvalApplyPost, err: 1 error(s) occurred:

* ironic_node_v1.openshift-master-1: Internal Server Error
2019/03/15 10:29:45 [ERROR] root: eval: *terraform.EvalSequence, err: 1 error(s) occurred:

* ironic_node_v1.openshift-master-1: Internal Server Error
2019/03/15 10:29:45 [TRACE] [walkApply] Exiting eval tree: ironic_node_v1.openshift-master-1
2019/03/15 10:29:48 [ERROR] root: eval: *terraform.EvalApplyPost, err: 1 error(s) occurred:

* ironic_node_v1.openshift-master-2: Internal Server Error
2019/03/15 10:29:48 [ERROR] root: eval: *terraform.EvalSequence, err: 1 error(s) occurred:

* ironic_node_v1.openshift-master-2: Internal Server Error

2019/03/15 10:30:02 [DEBUG] plugin: waiting for all plugin processes to complete...
Error: Error applying plan:

3 error(s) occurred:

* ironic_node_v1.openshift-master-0: 1 error(s) occurred:

* ironic_node_v1.openshift-master-0: Internal Server Error
* ironic_node_v1.openshift-master-1: 1 error(s) occurred:

* ironic_node_v1.openshift-master-1: Internal Server Error
* ironic_node_v1.openshift-master-2: 1 error(s) occurred:

2019-03-15T10:30:02.972+0200 [DEBUG] plugin.terraform-provider-ironic: 2019/03/15 10:30:02 [ERR] plugin: stream copy 'stderr' error: stream closed
* ironic_node_v1.openshift-master-2: Internal Server Error

Terraform does not automatically rollback in the face of errors.
Instead, your Terraform state file has been partially updated with
any resources that successfully completed. Please address the error
above and apply again to incrementally change your infrastructure.


2019-03-15T10:30:03.007+0200 [DEBUG] plugin.terraform-provider-ironic: 2019/03/15 10:30:03 [ERR] plugin: plugin server: accept unix /tmp/plugin005349256: use of closed network connection
2019-03-15T10:30:03.008+0200 [DEBUG] plugin: plugin process exited: path=/root/.terraform.d/plugins/terraform-provider-ironic
2019-03-15 08:29:25.649 44 ERROR wsme.api [req-747fc5e4-6050-463e-9d5d-8b7fa79a00f3 - - - - -] Server-side error: "(sqlite3.OperationalError) database is locked [SQL: u'SELECT anon_1.nodes_created_at AS anon_1_nodes_created_at, anon_1.nodes_updated_at AS anon_1_nodes_updated_at, anon_1.nodes_version AS anon_1_nodes_version, anon_1.nodes_id AS anon_1_nodes_id, anon_1.nodes_uuid AS anon_1_nodes_uuid, anon_1.nodes_instance_uuid AS anon_1_nodes_instance_uuid, anon_1.nodes_name AS anon_1_nodes_name, anon_1.nodes_chassis_id AS anon_1_nodes_chassis_id, anon_1.nodes_power_state AS anon_1_nodes_power_state, anon_1.nodes_target_power_state AS anon_1_nodes_target_power_state, anon_1.nodes_provision_state AS anon_1_nodes_provision_state, anon_1.nodes_target_provision_state AS anon_1_nodes_target_provision_state, anon_1.nodes_provision_updated_at AS anon_1_nodes_provision_updated_at, anon_1.nodes_last_error AS anon_1_nodes_last_error, anon_1.nodes_instance_info AS anon_1_nodes_instance_info, anon_1.nodes_properties AS anon_1_nodes_properties, anon_1.nodes_driver AS anon_1_nodes_driver, anon_1.nodes_driver_info AS anon_1_nodes_driver_info, anon_1.nodes_driver_internal_info AS anon_1_nodes_driver_internal_info, anon_1.nodes_clean_step AS anon_1_nodes_clean_step, anon_1.nodes_deploy_step AS anon_1_nodes_deploy_step, anon_1.nodes_resource_class AS anon_1_nodes_resource_class, anon_1.nodes_raid_config AS anon_1_nodes_raid_config, anon_1.nodes_target_raid_config AS anon_1_nodes_target_raid_config, anon_1.nodes_reservation AS anon_1_nodes_reservation, anon_1.nodes_conductor_affinity AS anon_1_nodes_conductor_affinity, anon_1.nodes_conductor_group AS anon_1_nodes_conductor_group, anon_1.nodes_maintenance AS anon_1_nodes_maintenance, anon_1.nodes_maintenance_reason AS anon_1_nodes_maintenance_reason, anon_1.nodes_fault AS anon_1_nodes_fault, anon_1.nodes_console_enabled AS anon_1_nodes_console_enabled, anon_1.nodes_inspection_finished_at AS anon_1_nodes_inspection_finished_at, anon_1.nodes_inspection_started_at AS anon_1_nodes_inspection_started_at, anon_1.nodes_extra AS anon_1_nodes_extra, anon_1.nodes_automated_clean AS anon_1_nodes_automated_clean, anon_1.nodes_protected AS anon_1_nodes_protected, anon_1.nodes_protected_reason AS anon_1_nodes_protected_reason, anon_1.nodes_owner AS anon_1_nodes_owner, anon_1.nodes_allocation_id AS anon_1_nodes_allocation_id, anon_1.nodes_description AS anon_1_nodes_description, anon_1.nodes_bios_interface AS anon_1_nodes_bios_interface, anon_1.nodes_boot_interface AS anon_1_nodes_boot_interface, anon_1.nodes_console_interface AS anon_1_nodes_console_interface, anon_1.nodes_deploy_interface AS anon_1_nodes_deploy_interface, anon_1.nodes_inspect_interface AS anon_1_nodes_inspect_interface, anon_1.nodes_management_interface AS anon_1_nodes_management_interface, anon_1.nodes_network_interface AS anon_1_nodes_network_interface, anon_1.nodes_raid_interface AS anon_1_nodes_raid_interface, anon_1.nodes_rescue_interface AS anon_1_nodes_rescue_interface, anon_1.nodes_storage_interface AS anon_1_nodes_storage_interface, anon_1.nodes_power_interface AS anon_1_nodes_power_interface, anon_1.nodes_vendor_interface AS anon_1_nodes_vendor_interface, node_traits_1.created_at AS node_traits_1_created_at, node_traits_1.updated_at AS node_traits_1_updated_at, node_traits_1.version AS node_traits_1_version, node_traits_1.node_id AS node_traits_1_node_id, node_traits_1.trait AS node_traits_1_trait, node_tags_1.created_at AS node_tags_1_created_at, node_tags_1.updated_at AS node_tags_1_updated_at, 
node_tags_1.version AS node_tags_1_version, node_tags_1.node_id AS node_tags_1_node_id, node_tags_1.tag AS node_tags_1_tag \nFROM (SELECT nodes.created_at AS nodes_created_at, nodes.updated_at AS nodes_updated_at, nodes.version AS nodes_version, nodes.id AS nodes_id, nodes.uuid AS nodes_uuid, nodes.instance_uuid AS nodes_instance_uuid, nodes.name AS nodes_name, nodes.chassis_id AS nodes_chassis_id, nodes.power_state AS nodes_power_state, nodes.target_power_state AS nodes_target_power_state, nodes.provision_state AS nodes_provision_state, nodes.target_provision_state AS nodes_target_provision_state, nodes.provision_updated_at AS nodes_provision_updated_at, nodes.last_error AS nodes_last_error, nodes.instance_info AS nodes_instance_info, nodes.properties AS nodes_properties, nodes.driver AS nodes_driver, nodes.driver_info AS nodes_driver_info, nodes.driver_internal_info AS nodes_driver_internal_info, nodes.clean_step AS nodes_clean_step, nodes.deploy_step AS nodes_deploy_step, nodes.resource_class AS nodes_resource_class, nodes.raid_config AS nodes_raid_config, nodes.target_raid_config AS nodes_target_raid_config, nodes.reservation AS nodes_reservation, nodes.conductor_affinity AS nodes_conductor_affinity, nodes.conductor_group AS nodes_conductor_group, nodes.maintenance AS nodes_maintenance, nodes.maintenance_reason AS nodes_maintenance_reason, nodes.fault AS nodes_fault, nodes.console_enabled AS nodes_console_enabled, nodes.inspection_finished_at AS nodes_inspection_finished_at, nodes.inspection_started_at AS nodes_inspection_started_at, nodes.extra AS nodes_extra, nodes.automated_clean AS nodes_automated_clean, nodes.protected AS nodes_protected, nodes.protected_reason AS nodes_protected_reason, nodes.owner AS nodes_owner, nodes.allocation_id AS nodes_allocation_id, nodes.description AS nodes_description, nodes.bios_interface AS nodes_bios_interface, nodes.boot_interface AS nodes_boot_interface, nodes.console_interface AS nodes_console_interface, nodes.deploy_interface AS nodes_deploy_interface, nodes.inspect_interface AS nodes_inspect_interface, nodes.management_interface AS nodes_management_interface, nodes.network_interface AS nodes_network_interface, nodes.raid_interface AS nodes_raid_interface, nodes.rescue_interface AS nodes_rescue_interface, nodes.storage_interface AS nodes_storage_interface, nodes.power_interface AS nodes_power_interface, nodes.vendor_interface AS nodes_vendor_interface \nFROM nodes ORDER BY nodes.id ASC\n LIMIT ? OFFSET ?) AS anon_1 LEFT OUTER JOIN node_traits AS node_traits_1 ON node_traits_1.node_id = anon_1.nodes_id LEFT OUTER JOIN node_tags AS node_tags_1 ON node_tags_1.node_id = anon_1.nodes_id ORDER BY anon_1.nodes_id ASC'] [parameters: (1000, 0)] (Background on this error at: http://sqlalche.me/e/e3q8)". Detail:
Traceback (most recent call last):

  File "/usr/lib/python2.7/site-packages/wsmeext/pecan.py", line 85, in callfunction
    result = f(self, *args, **kwargs)

  File "/usr/lib/python2.7/site-packages/ironic/api/controllers/v1/node.py", line 1872, in get_all
    **extra_args)

  File "/usr/lib/python2.7/site-packages/ironic/api/controllers/v1/node.py", line 1684, in _get_nodes_collection
    filters=filters)

  File "/usr/lib/python2.7/site-packages/ironic/objects/node.py", line 313, in list
    sort_dir=sort_dir)

  File "/usr/lib/python2.7/site-packages/ironic/db/sqlalchemy/api.py", line 400, in get_node_list
    sort_key, sort_dir, query)

  File "/usr/lib/python2.7/site-packages/ironic/db/sqlalchemy/api.py", line 229, in _paginate_query
    return query.all()

  File "/usr/lib64/python2.7/site-packages/sqlalchemy/orm/query.py", line 2925, in all
    return list(self)

  File "/usr/lib64/python2.7/site-packages/sqlalchemy/orm/query.py", line 3081, in __iter__
    return self._execute_and_instances(context)

  File "/usr/lib64/python2.7/site-packages/sqlalchemy/orm/query.py", line 3106, in _execute_and_instances
    result = conn.execute(querycontext.statement, self._params)

  File "/usr/lib64/python2.7/site-packages/sqlalchemy/engine/base.py", line 980, in execute
    return meth(self, multiparams, params)

  File "/usr/lib64/python2.7/site-packages/sqlalchemy/sql/elements.py", line 273, in _execute_on_connection
    return connection._execute_clauseelement(self, multiparams, params)

  File "/usr/lib64/python2.7/site-packages/sqlalchemy/engine/base.py", line 1099, in _execute_clauseelement
    distilled_params,

  File "/usr/lib64/python2.7/site-packages/sqlalchemy/engine/base.py", line 1240, in _execute_context
    e, statement, parameters, cursor, context

  File "/usr/lib64/python2.7/site-packages/sqlalchemy/engine/base.py", line 1456, in _handle_dbapi_exception
    util.raise_from_cause(newraise, exc_info)

  File "/usr/lib64/python2.7/site-packages/sqlalchemy/util/compat.py", line 296, in raise_from_cause
    reraise(type(exception), exception, tb=exc_tb, cause=cause)

  File "/usr/lib64/python2.7/site-packages/sqlalchemy/engine/base.py", line 1236, in _execute_context
    cursor, statement, parameters, context

  File "/usr/lib64/python2.7/site-packages/sqlalchemy/engine/default.py", line 536, in do_execute
    cursor.execute(statement, parameters)
