Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[tkgi] unable to create bosh vm:CPI 'set_vm_metadata' method responded with error: CmdError{"type":"Unknown","message":"Invalid Query","ok_to_retry":false} #302

Closed
fuxiaoting opened this issue Aug 23, 2021 · 6 comments
Assignees
Labels

Comments

@fuxiaoting
Copy link

Describe the bug
Follow this guide to test "enabled nsx-t policy with No-NAT with Virtual Switch (VSS/VDS) Topology"
Deploy vc 702 with nsxt policy api 3.1.3, then deploy epmc 1.12 FC build on it.
During deploy pks cluster, bosh deployment failed.

To Reproduce
Steps to reproduce the behavior:
Prerequisites:

  1. add vcenter as compute manger on the nsx manger UI
  2. create a distribute switch(need to edit MTU as 1600) and port group (vlan is 101)
  3. add hosts to distribute switch, assign vmnic1 as manage physical adapter
  4. Configure NSX for host transport nodes on the NSX manager(select VDS)
    Steps to deploy pks on epmc 1.12 FC build.
    1.selected the port group to deployment network, details configuration as follows:
    image

CPI Error Log
Can be obtained by running bosh <env params> task <failed-task-id> --cpi

Failed step 2/4: Deploying BOSH Director,: Error while deploying BOSH director: Installation was unsuccessful. Error: Installing BOSH {"type":"step_started","id":"bosh_product.deploying","description":"Installing BOSH"} ===== 2021-08-23 08:57:40 UTC Running "/usr/local/bin/bosh --no-color --non-interactive --tty create-env /var/tempest/workspaces/default/deployments/bosh.yml" Deployment manifest: '/var/tempest/workspaces/default/deployments/bosh.yml' Deployment state: '/var/tempest/workspaces/default/deployments/bosh-state.json' Started validating Validating release 'bosh'... Finished (00:00:03) Validating release 'bosh-vsphere-cpi'... Finished (00:00:28) Validating release 'uaa'... Finished (00:00:05) Validating release 'credhub'... Finished (00:00:02) Validating release 'bosh-system-metrics-server'... Finished (00:00:04) Validating release 'os-conf'... Finished (00:00:00) Validating release 'backup-and-restore-sdk'... Finished (00:00:06) Validating release 'bpm'... Finished (00:00:02) Validating release 'system-metrics'... Finished (00:00:12) Validating cpi release... Finished (00:00:00) Validating deployment manifest... Finished (00:00:19) Validating stemcell... Finished (00:00:09) Finished validating (00:01:33) Started installing CPI Compiling package 'golang-1.16-darwin/b603604c44ff8c0cf216aa96da5702c78b165b986c296d58aee5b0db3a63275d'... Finished (00:00:00) Compiling package 'golang-1.16-linux/a2f62d570ee65d73efbad06ac60bf20349ed4ec301bb28c7ac24c25efc23a923'... Finished (00:00:00) Compiling package 'ruby-2.6.5-r0.29.0/269dc54d5306119b0e4f89be04f6c470b4876f552753815586fd1ab8ebeaa70d'... Finished (00:00:00) Compiling package 'iso9660wrap/f119680c1a13eb9a18822f5ac62c5657bd5ba4bcfb915955800e0398d3cb9ea7'... Finished (00:00:00) Compiling package 'vsphere_cpi/afd6fa244c1600d192c28e251d4ab8181f07424cc8018e691d6572b3919c6e49'... Finished (00:00:00) Installing packages... Finished (00:00:18) Rendering job templates... Finished (00:00:02) Installing job 'vsphere_cpi'... Finished (00:00:00) Finished installing CPI (00:00:20) Starting registry... Finished (00:00:00) Uploading stemcell 'bosh-vsphere-esxi-ubuntu-xenial-go_agent/621.135'... Skipped [Stemcell already uploaded] (00:00:00) Started deploying Waiting for the agent on VM 'vm-d7586757-806b-46f8-b748-28866c2ca68e'... Finished (00:00:00) Running the pre-stop scripts 'unknown/0'... Finished (00:00:00) Draining jobs on instance 'unknown/0'... Finished (00:00:01) Stopping jobs on instance 'unknown/0'... Finished (00:00:01) Running the post-stop scripts 'unknown/0'... Finished (00:00:00) Deleting VM 'vm-d7586757-806b-46f8-b748-28866c2ca68e'... Finished (00:00:15) Creating VM for instance 'bosh/0' from stemcell 'sc-8814b775-956e-4bd9-b1f0-0ee0c07daa32'... Failed (00:00:20) Failed deploying (00:00:49) Stopping registry... Finished (00:00:00) Cleaning up rendered CPI jobs... Finished (00:00:00) Deploying: Creating instance 'bosh/0': Creating VM: Creating vm with stemcell cid 'sc-8814b775-956e-4bd9-b1f0-0ee0c07daa32': CPI 'create_vm' method responded with error: CmdError{"type":"Unknown","message":"Recommendations were detected, you may be running in Manual DRS mode. Aborting.","ok_to_retry":false} Exit code 1 ===== 2021-08-23 09:00:26 UTC Finished "/usr/local/bin/bosh --no-color --non-interactive --tty create-env /var/tempest/workspaces/default/deployments/bosh.yml"; Duration: 166s; Exit Status: 1 {"type":"step_finished","id":"bosh_product.deploying","description":"Installing BOSH"} Exited with 1.

Expected behavior
bosh vm should be created successfully.

Screenshots
Attached

Release Version & Related Info (please complete the following information):

  • CPI Version
  • BOSH Director Version
  • Stemcell Name & Version 621.136
  • vCenter Version 702
  • NSX(T/V) Version (If using) 3.1.3 policy api

Additional context
vc log:
cat ./vpxd/vpxd-5.log | grep vm-85

2021-08-23T07:51:45.873Z info vpxd[16707] [Originator@6876 sub=Invt opID=4739194d-01-5cc59e1c] Started call to update entity state moId on host; [vim.HostSystem:host-38,192.168.111.52], store: (vm, 10), v: 2cd29b87-bd2f-40ef-ad82-763f01738ce7:vm-85:6s2iIoiFkhLAYyvh8MerLJq6/GgyraZMCZUpGTqBD5M=, callId: 15
2021-08-23T07:51:45.906Z info vpxd[30013] [Originator@6876 sub=VmProv opID=4739194d-01] Created VM vim.VirtualMachine:vm-85 (ds:///vmfs/volumes/61231d24-ea7bff00-7db6-02010b1e6a68/vm-d7586757-806b-46f8-b748-28866c2ca68e/vm-d7586757-806b-46f8-b748-28866c2ca68e.vmx) on host [vim.HostSystem:host-38,192.168.111.52] (192.168.111.52)
--> vm = 'vim.VirtualMachine:2cd29b87-bd2f-40ef-ad82-763f01738ce7:vm-85',
2021-08-23T07:51:45.917Z error vpxd[30013] [Originator@6876 sub=coreanchor opID=4739194d-01-01] Entity [vm-85] has a device with no connection properties
2021-08-23T07:51:48.415Z info vpxd[16407] [Originator@6876 sub=vpxLro opID=4b0fed6f] [VpxLRO] -- BEGIN task-283 -- vm-85 -- vim.VirtualMachine.reconfigure -- 529ba6d8-aa2c-8f96-6811-b7c5372dfce3(520351c5-8545-1b0e-62aa-a24dc5788fb7)
2021-08-23T07:51:48.416Z info vpxd[16407] [Originator@6876 sub=vpxLro opID=4b0fed6f-01] [VpxLRO] -- BEGIN lro-44981 -- vm-85 -- vim.VirtualMachine.reconfigure -- 529ba6d8-aa2c-8f96-6811-b7c5372dfce3(520351c5-8545-1b0e-62aa-a24dc5788fb7)
--> vm = 'vim.VirtualMachine:2cd29b87-bd2f-40ef-ad82-763f01738ce7:vm-85',
--> vm = 'vim.VirtualMachine:2cd29b87-bd2f-40ef-ad82-763f01738ce7:vm-85',
2021-08-23T08:59:55.231Z info vpxd[16435] [Originator@6876 sub=vpxLro opID=2788559d] [VpxLRO] -- BEGIN task-287 -- vm-85 -- vim.VirtualMachine.powerOff -- 52afc854-8059-ee6f-d921-01a5b61e16a8(5246a0ab-5789-8cb3-587d-42c2dda3d0bb)
2021-08-23T09:00:04.187Z info vpxd[16466] [Originator@6876 sub=vpxLro opID=649dc3e8] [VpxLRO] -- BEGIN task-290 -- vm-85 -- vim.ManagedEntity.destroy -- 52afc854-8059-ee6f-d921-01a5b61e16a8(5246a0ab-5789-8cb3-587d-42c2dda3d0bb)

2021-08-23T08:59:48.665Z info vpxd[16423] [Originator@6876 sub=vpxLro opID=wcp-vCLS-72] [VpxLRO] -- BEGIN lro-54603 -- ViewManager -- vim.view.ViewManager.createContainerView -- 52f58a92-939e-471b-453e-cc536e02d0b9(52b3c97e-8d94-d2ee-e946-468d0a170cc2)
2021-08-23T08:59:48.665Z info vpxd[16423] [Originator@6876 sub=ViewManager opID=wcp-vCLS-72] Duplicate container view for Object : group-d1 [
--> user - VSPHERE.LOCAL\vpxd-extension-c9a92aa6-bb01-4303-980c-f90c5b6902da
--> session - 52f58a92-939e-471b-453e-cc536e02d0b9
--> IP - 192.168.111.144
--> Number of duplicates - 2]
2021-08-23T08:59:48.666Z info vpxd[16423] [Originator@6876 sub=vpxLro opID=wcp-vCLS-72] [VpxLRO] -- FINISH lro-54603
2021-08-23T08:59:48.669Z info vpxd[16350] [Originator@6876 sub=vpxLro opID=wcp-vCLS-95] [VpxLRO] -- BEGIN lro-54605 -- session[52f58a92-939e-471b-453e-cc536e02d0b9]52b5b642-0497-d99d-d1eb-d1d5daf30baf -- vim.view.View.destroy -- 52f58a92-939e-471b-453e-cc536e02d0b9(52b3c97e-8d94-d2ee-e946-468d0a170cc2)
2021-08-23T08:59:48.669Z info vpxd[16350] [Originator@6876 sub=vpxLro opID=wcp-vCLS-95] [VpxLRO] -- FINISH lro-54605
2021-08-23T08:59:48.691Z info vpxd[16337] [Originator@6876 sub=vpxLro opID=wcp-vCLS-9a] [VpxLRO] -- BEGIN lro-54610 -- VpxSettings -- vim.option.OptionManager.queryView -- 52f58a92-939e-471b-453e-cc536e02d0b9(52b3c97e-8d94-d2ee-e946-468d0a170cc2)
2021-08-23T08:59:48.691Z info vpxd[16337] [Originator@6876 sub=vpxLro opID=wcp-vCLS-9a] [VpxLRO] -- FINISH lro-54610
2021-08-23T08:59:48.691Z info vpxd[16337] [Originator@6876 sub=Default opID=wcp-vCLS-9a] [VpxLRO] -- ERROR lro-54610 -- VpxSettings -- vim.option.OptionManager.queryView: vim.fault.InvalidName:
--> Result:
--> (vim.fault.InvalidName) {
--> faultCause = (vmodl.MethodFault) null,
--> faultMessage = ,
--> name = "config.vcls.clusters.",
--> entity =
--> msg = ""
--> }
--> Args:
-->
--> Arg name:
--> "config.vcls.clusters."

esxi host log :
2021-08-23T09:00:12.784Z error vpxa[1050891] [Originator@6876 sub=PropertyCache opID=WFU-5da6b0c2 update=4528] Failed to diff 10:network, had ManagedObjectReference[], got ManagedObjectReference[]

@cf-gitbot
Copy link

We have created an issue in Pivotal Tracker to manage this. Unfortunately, the Pivotal Tracker project is private so you may be unable to view the contents of the story.

The labels on this github issue will be updated when the story is started.

@fuxiaoting
Copy link
Author

Screen Shot 2021-08-24 at 7 45 00 AM

Screen Shot 2021-08-24 at 7 44 45 AM

Environment information:
proxy: kubo/Ponies!23 10.191.166.95
epmc: root/Admin 192.168.111.200
vc: 192.168.111.144 administrator@vsphere.local
nsxt 192.168.111.89 admin/Admin!23Admin

@fuxiaoting
Copy link
Author

#dap-vcf-vmc-anycloud

@fuxiaoting
Copy link
Author

new environment:proxy:10.92.1.120 vc:192.168.111.61 nsxt:192.168.111.12 epmc:192.168.111.200

@ydp
Copy link

ydp commented Aug 25, 2021

Hi team, I had a initial investigation, added some code like below:

diff --git a/src/vsphere_cpi/lib/nsxt_policy_client/nsxt_policy_client/api/search_search_api_api.rb b/src/vsphere_cpi/lib/nsxt_policy_client/nsxt_policy_client/api/search_search_api_api.rb
index 6a3f07ae..866a5a64 100644
--- a/src/vsphere_cpi/lib/nsxt_policy_client/nsxt_policy_client/api/search_search_api_api.rb
+++ b/src/vsphere_cpi/lib/nsxt_policy_client/nsxt_policy_client/api/search_search_api_api.rb
@@ -163,6 +163,7 @@ module NSXTPolicy
       # http body (model)
       post_body = nil
       auth_names = ['BasicAuth']
+      @api_client.config.logger.info "API params: SearchSearchAPIApi #{query}"
       data, status_code, headers = @api_client.call_api(:GET, local_var_path,
         :header_params => header_params,
         :query_params => query_params,
@@ -170,9 +171,7 @@ module NSXTPolicy
         :body => post_body,
         :auth_names => auth_names,
         :return_type => 'SearchResponse')
-      if @api_client.config.debugging
-        @api_client.config.logger.debug "API called: SearchSearchAPIApi#query_search\nData: #{data.inspect}\nStatus code: #{status_code}\nHeaders: #{headers}"
-      end
+      @api_client.config.logger.info "API called: SearchSearchAPIApi#query_search\nData: #{data.inspect}\nStatus code: #{status_code}\nHeaders: #{headers}"
       return data, status_code, headers
     end
   end

Got some debug logs:

D, [2021-08-25T01:51:11.202220 #5214] DEBUG -- : Running method 'SetField'...
I, [2021-08-25T01:51:16.311961 #5214]  INFO -- : Networks found for VM: vm-3f06170f-b499-453f-9090-118e6545aa1b: ["DVSwitch: 50 31 7b 5e 06 00 42 c8-cb 42 c1 06 58 ea 10 b8"]
I, [2021-08-25T01:51:16.346874 #5214]  INFO -- : API params: SearchSearchAPIApi attachment.id:
I, [2021-08-25T01:51:16.405030 #5214]  INFO -- [req_id ]: Finished set_vm_metadata in 8.78 seconds
D, [2021-08-25T01:51:16.405450 #5214] DEBUG -- : Running method 'Logout'...
'
[cpiCmdRunner] 2021/08/25 01:51:16 DEBUG - Rescued Unknown: Invalid Query. backtrace: /root/.bosh/installations/e5a5c632-53a8-42c8-5fbd-a9421f92892c/packages/vsphere_cpi/lib/nsxt_policy_client/nsxt_policy_client/api_client.rb:68:in `call_api'
/root/.bosh/installations/e5a5c632-53a8-42c8-5fbd-a9421f92892c/packages/vsphere_cpi/lib/nsxt_policy_client/nsxt_policy_client/api/search_search_api_api.rb:167:in `query_search_with_http_info'
/root/.bosh/installations/e5a5c632-53a8-42c8-5fbd-a9421f92892c/packages/vsphere_cpi/lib/nsxt_policy_client/nsxt_policy_client/api/search_search_api_api.rb:111:in `query_search'
/root/.bosh/installations/e5a5c632-53a8-42c8-5fbd-a9421f92892c/packages/vsphere_cpi/lib/cloud/vsphere/nsxt_policy_provider.rb:142:in `block (2 levels) in update_vm_metadata_on_segment_ports'
/root/.bosh/installations/e5a5c632-53a8-42c8-5fbd-a9421f92892c/packages/vsphere_cpi/vendor/bundle/ruby/2.6.0/gems/bosh_common-1.3262.24.0/lib/common/retryable.rb:28:in `block in retryer'
/root/.bosh/installations/e5a5c632-53a8-42c8-5fbd-a9421f92892c/packages/vsphere_cpi/vendor/bundle/ruby/2.6.0/gems/bosh_common-1.3262.24.0/lib/common/retryable.rb:26:in `loop'
/root/.bosh/installations/e5a5c632-53a8-42c8-5fbd-a9421f92892c/packages/vsphere_cpi/vendor/bundle/ruby/2.6.0/gems/bosh_common-1.3262.24.0/lib/common/retryable.rb:26:in `retryer'
/root/.bosh/installations/e5a5c632-53a8-42c8-5fbd-a9421f92892c/packages/vsphere_cpi/lib/cloud/vsphere/nsxt_policy_provider.rb:141:in `block in update_vm_metadata_on_segment_ports'
/root/.bosh/installations/e5a5c632-53a8-42c8-5fbd-a9421f92892c/packages/vsphere_cpi/lib/cloud/vsphere/nsxt_policy_provider.rb:136:in `each'
/root/.bosh/installations/e5a5c632-53a8-42c8-5fbd-a9421f92892c/packages/vsphere_cpi/lib/cloud/vsphere/nsxt_policy_provider.rb:136:in `update_vm_metadata_on_segment_ports'
/root/.bosh/installations/e5a5c632-53a8-42c8-5fbd-a9421f92892c/packages/vsphere_cpi/lib/cloud/vsphere/helpers/hooks.rb:7:in `call'
/root/.bosh/installations/e5a5c632-53a8-42c8-5fbd-a9421f92892c/packages/vsphere_cpi/lib/cloud/vsphere/helpers/hooks.rb:7:in `block (2 levels) in before'
/root/.bosh/installations/e5a5c632-53a8-42c8-5fbd-a9421f92892c/packages/vsphere_cpi/lib/cloud/vsphere/cloud.rb:585:in `block in set_vm_metadata'
/root/.bosh/installations/e5a5c632-53a8-42c8-5fbd-a9421f92892c/packages/vsphere_cpi/vendor/bundle/ruby/2.6.0/gems/bosh_common-1.3262.24.0/lib/common/thread_formatter.rb:49:in `with_thread_name'
/root/.bosh/installations/e5a5c632-53a8-42c8-5fbd-a9421f92892c/packages/vsphere_cpi/lib/cloud/vsphere/cloud.rb:578:in `set_vm_metadata'
/root/.bosh/installations/e5a5c632-53a8-42c8-5fbd-a9421f92892c/packages/ruby-2.6.5-r0.29.0/lib/ruby/2.6.0/forwardable.rb:230:in `set_vm_metadata'
/root/.bosh/installations/e5a5c632-53a8-42c8-5fbd-a9421f92892c/packages/vsphere_cpi/vendor/bundle/ruby/2.6.0/gems/bosh_cpi-2.5.0/lib/bosh/cpi/cli.rb:89:in `public_send'
/root/.bosh/installations/e5a5c632-53a8-42c8-5fbd-a9421f92892c/packages/vsphere_cpi/vendor/bundle/ruby/2.6.0/gems/bosh_cpi-2.5.0/lib/bosh/cpi/cli.rb:89:in `run'
/root/.bosh/installations/e5a5c632-53a8-42c8-5fbd-a9421f92892c/packages/vsphere_cpi/bin/vsphere_cpi:69:in `<top (required)>'
/root/.bosh/installations/e5a5c632-53a8-42c8-5fbd-a9421f92892c/packages/ruby-2.6.5-r0.29.0/lib/ruby/site_ruby/2.6.0/bundler/cli/exec.rb:63:in `load'

It could be seen that the attachment_id is nil

segment_port_search_result = search_api.query_search("attachment.id:#{attachment_id}").results[0]

And the code in set_vm_meta goes to update_vm_metadata_on_segment_ports, but there is no segment port here, actually it's vss/vds port group port. As @fuxiaoting described in the title: enabled nsx-t policy with No-NAT with Virtual Switch (VSS/VDS) Topology, @julian-hj could you please help confirm whether this topology is supported? If it's not, do we have plan to support it? Thanks.

@julian-hj
Copy link
Member

Looking into the code, it does appear that this use case is supported by the manager API code, but broken in the policy API code. When the manager API code finds a port with VDS backing, it does additional checking to make sure it is managed by NSXT. The corresponding Policy API code appears to just assume that the port is managed by NSXT without checking

We should fix this.

@julian-hj julian-hj self-assigned this Aug 25, 2021
@julian-hj julian-hj added the Bug label Aug 25, 2021
cunnie pushed a commit that referenced this issue Aug 26, 2021
Previously the vSphere CPI NSX-T Policy API assumed _all_ Distributed
Virtual Switches (DVSes) were NSX-T Segments, but that assummption was
false; DVSes may be managed by vSphere and not NSX-T.

We now check whether the network interface controller (NIC) is
NSX-T-managed before attempting to tag the NIC's port with NSX-T
metadata.

fixes:
```
unable to create bosh vm:CPI 'set_vm_metadata' method responded with error: CmdError{"type":"Unknown","message":"Invalid Query","ok_to_retry":false}
```

[fixes #302]
[#179334136](https://www.pivotaltracker.com/story/show/179334136)

Signed-off-by: Brian Cunnie <bcunnie@vmware.com>
cunnie pushed a commit that referenced this issue Aug 26, 2021
Previously the vSphere CPI NSX-T Policy API assumed _all_ Distributed
Virtual Switches (DVSes) were NSX-T Segments, but that assummption was
false; DVSes may be managed by vSphere and not NSX-T.

We now check whether the network interface controller (NIC) is
NSX-T-managed before attempting to tag the NIC's port with NSX-T
metadata.

fixes:
```
unable to create bosh vm:CPI 'set_vm_metadata' method responded with error: CmdError{"type":"Unknown","message":"Invalid Query","ok_to_retry":false}
```

[fixes #302]
[#179334136](https://www.pivotaltracker.com/story/show/179334136)

Signed-off-by: Brian Cunnie <bcunnie@vmware.com>
rkoster pushed a commit to svrc/bosh-vsphere-cpi-release that referenced this issue Jul 15, 2024
Previously the vSphere CPI NSX-T Policy API assumed _all_ Distributed
Virtual Switches (DVSes) were NSX-T Segments, but that assummption was
false; DVSes may be managed by vSphere and not NSX-T.

We now check whether the network interface controller (NIC) is
NSX-T-managed before attempting to tag the NIC's port with NSX-T
metadata.

fixes:
```
unable to create bosh vm:CPI 'set_vm_metadata' method responded with error: CmdError{"type":"Unknown","message":"Invalid Query","ok_to_retry":false}
```

[fixes cloudfoundry#302]
[#179334136](https://www.pivotaltracker.com/story/show/179334136)

Signed-off-by: Brian Cunnie <bcunnie@vmware.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

4 participants