fence_nutanix_ahv: Add fence agent support for Nutanix AHV Cluster #600

Merged
merged 5 commits into from
Nov 8, 2024

Conversation

@nxgovind nxgovind commented Nov 5, 2024

This patch adds fence agent support for Nutanix AHV clusters. More specifically, the initial support is aimed at AHV clusters that support the Nutanix v4 APIs. The v3 APIs are not supported.

Signed off by amir.eibagi@nutanix.com

knet-jenkins bot commented Nov 5, 2024

Can one of the admins check and authorise this run please: https://ci.kronosnet.org/job/fence-agents/job/fence-agents-pipeline/job/PR-600/1/input

@oalbrigt oalbrigt left a comment

Looks good in general.

I've added some suggestions for improvements and fixes.

You should name the agent fence_nutanix_ahv to make it easy for users to identify that it might be the agent they're looking for without looking at the help text or manpage.

You have to run make xml-upload and add the metadata to the PR.

The agent should be added to its own %package/%description/%files sections in fence-agents.spec.in.

agents/ahv/fence_ahv.py: 10 inline review comments (outdated, resolved)

knet-jenkins bot commented Nov 5, 2024

Can one of the admins check and authorise this run please: https://ci.kronosnet.org/job/fence-agents/job/fence-agents-pipeline/job/PR-600/2/input

@nxgovind nxgovind force-pushed the nutanix_fence_agent branch from 3f9d80b to edb3ba3 Compare November 6, 2024 00:01

knet-jenkins bot commented Nov 6, 2024

Can one of the admins check and authorise this run please: https://ci.kronosnet.org/job/fence-agents/job/fence-agents-pipeline/job/PR-600/3/input

knet-jenkins bot commented Nov 6, 2024

Can one of the admins check and authorise this run please: https://ci.kronosnet.org/job/fence-agents/job/fence-agents-pipeline/job/PR-600/4/input

nxgovind commented Nov 6, 2024

@oalbrigt Thank you for your review comments. I have addressed all of them. Please let me know if I have missed anything else.

@nxgovind nxgovind changed the title Add fence agent support for Nutanix AHV Cluster fence_nutanix_ahv: Add fence agent support for Nutanix AHV Cluster Nov 6, 2024
@oalbrigt oalbrigt left a comment

Just a few improvements needed.

agents/nutanix_ahv/fence_nutanix_ahv.py: 4 inline review comments (outdated, resolved)

knet-jenkins bot commented Nov 6, 2024

Can one of the admins check and authorise this run please: https://ci.kronosnet.org/job/fence-agents/job/fence-agents-pipeline/job/PR-600/5/input

knet-jenkins bot commented Nov 6, 2024

Can one of the admins check and authorise this run please: https://ci.kronosnet.org/job/fence-agents/job/fence-agents-pipeline/job/PR-600/6/input

oalbrigt commented Nov 7, 2024

retest this please

nxgovind commented Nov 7, 2024

Thank you for your review. I ran a few tests with the latest changes on a 3-node CentOS Stream 9 cluster setup. All basic power operations via fence_nutanix_ahv work fine. I also tested the STONITH feature by failing a node to confirm that Pacemaker successfully resets the failed node. Documenting the test output here.

[root@vm2-auto ~]# fence_nutanix_ahv -a 10.101.63.173 -l admin -p Nutanix.123 -o list-status --ssl-insecure
TestVM1,8e1353ff-59a8-4683-af08-293036f08d4f,OFF
TestVM2,bda19034-c121-430f-a70c-a872f9dbabf7,OFF
Node 1,ae94f8c2-96f1-4c85-bb4a-b1cbd48aeee8,ON
Node 2,bdd08b08-d11d-41b8-b59f-8f8ba77d9ae6,ON
Node 3,c2c4f047-9a56-460f-9bbb-d7f6d81a2e0c,ON

[root@vm2-auto ~]# fence_nutanix_ahv -a 10.101.63.173 -l admin -p Nutanix.123 -o list-status --filter="name eq 'TestVM1'" --ssl-insecure
TestVM1,8e1353ff-59a8-4683-af08-293036f08d4f,OFF

[root@vm2-auto ~]# fence_nutanix_ahv -a 10.101.63.173 -l admin -p Nutanix.123 -o on --plug='TestVM1' --ssl-insecure
Success: Powered ON

[root@vm2-auto ~]# fence_nutanix_ahv -a 10.101.63.173 -l admin -p Nutanix.123 -o reboot --plug='TestVM1' --ssl-insecure
Success: Rebooted

[root@vm2-auto ~]# fence_nutanix_ahv -a 10.101.63.173 -l admin -p Nutanix.123 -o list-status --filter="startswith(name, 'TestVM')" --ssl-insecure
TestVM1,8e1353ff-59a8-4683-af08-293036f08d4f,ON
TestVM2,bda19034-c121-430f-a70c-a872f9dbabf7,ON

[root@vm2-auto ~]# fence_nutanix_ahv -a 10.101.63.173 -l admin -p Nutanix.123 -o off --plug='TestVM1' --ssl-insecure
Success: Powered OFF

[root@vm2-auto ~]# fence_nutanix_ahv -a 10.101.63.173 -l admin -p Nutanix.123 -o list-status --filter="startswith(name, 'TestVM')" --ssl-insecure
TestVM1,8e1353ff-59a8-4683-af08-293036f08d4f,OFF
TestVM2,bda19034-c121-430f-a70c-a872f9dbabf7,OFF
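
For anyone who wants to consume the list-status output above from a script, here is a minimal sketch of a parser. It assumes only what the transcript shows: one `name,uuid,state` record per line. The names `VmStatus` and `parse_list_status` are hypothetical and are not part of the agent.

```python
from typing import NamedTuple


class VmStatus(NamedTuple):
    """One record of list-status output (hypothetical helper type)."""
    name: str
    uuid: str
    power_state: str


def parse_list_status(output: str) -> list[VmStatus]:
    """Parse 'name,uuid,state' lines, the format shown in the transcript."""
    records = []
    for line in output.splitlines():
        line = line.strip()
        if not line:
            continue
        # Split from the right so a VM name that contains commas stays intact.
        name, uuid, state = line.rsplit(",", 2)
        records.append(VmStatus(name, uuid, state))
    return records


# Sample lines copied from the test output above.
sample = """\
TestVM1,8e1353ff-59a8-4683-af08-293036f08d4f,OFF
Node 1,ae94f8c2-96f1-4c85-bb4a-b1cbd48aeee8,ON
"""

for vm in parse_list_status(sample):
    print(f"{vm.name}: {vm.power_state}")
```

Splitting from the right is a deliberate choice here: VM names such as "Node 1" contain spaces and could in principle contain commas, while the UUID and power state fields should not.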

tail -f /var/log/pacemaker/pacemaker.log
Nov 07 11:07:40.933 node1 pacemaker-fenced [1134] (log_async_result) notice: Operation 'reboot' [1493] targeting node2 using nutanix_fence returned 0 | call 13 from pacemaker-controld.1382
Nov 07 11:07:40.964 node1 pacemaker-fenced [1134] (finalize_op) notice: Operation 'reboot' targeting node2 by node1 for pacemaker-controld.1382@node3: OK (complete) | id=2e28a260
Nov 07 11:07:40.965 node1 pacemaker-controld [1138] (handle_fence_notification) notice: Peer node2 was terminated (reboot) by node1 on behalf of pacemaker-controld.1382@node3: OK | event=2e28a260-bdd1-4154-b98b-1fd14227dc63
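
The fencing result can also be pulled out of the log programmatically. The sketch below matches only the `finalize_op` message format seen in the three lines above; the regular expression and the name `parse_fence_result` are assumptions made for illustration, and other Pacemaker versions may word these messages differently.

```python
import re
from typing import Optional

# Pattern inferred from the finalize_op line above (an assumption, not a
# documented Pacemaker log format).
FENCE_RESULT = re.compile(
    r"Operation '(?P<action>\w+)' targeting (?P<target>\S+) "
    r"by (?P<executor>\S+) for (?P<client>\S+): (?P<result>\w+)"
)


def parse_fence_result(line: str) -> Optional[dict]:
    """Return the fencing fields of a finalize_op log line, or None."""
    match = FENCE_RESULT.search(line)
    return match.groupdict() if match else None


# Log line copied from the pacemaker.log excerpt above.
log_line = (
    "Nov 07 11:07:40.964 node1 pacemaker-fenced [1134] (finalize_op) notice: "
    "Operation 'reboot' targeting node2 by node1 for "
    "pacemaker-controld.1382@node3: OK (complete) | id=2e28a260"
)

print(parse_fence_result(log_line))
```

Lines that do not follow this shape, such as the `log_async_result` entry, simply return `None`.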

[root@node1 ~]# pcs status
Cluster name: ha_cluster
Cluster Summary:
  * Stack: corosync (Pacemaker is running)
  * Current DC: node3 (version 2.1.8-3.el9-3980678f0) - partition with quorum
  * Last updated: Thu Nov 7 11:09:22 2024 on node1
  * Last change: Thu Nov 7 11:06:03 2024 by root via root on node1
  * 3 nodes configured
  * 2 resource instances configured

Node List:
  * Online: [ node1 node2 node3 ]

Full List of Resources:
  * nutanix_fence (stonith:fence_nutanix_ahv): Started node1
  * shared_ip (ocf:heartbeat:IPaddr2): Started node3

nxgovind commented Nov 7, 2024

@oalbrigt I have run some basic tests, including a cluster node failure test. Please merge the pull request if you are comfortable with the tests.

@oalbrigt oalbrigt merged commit 05fd299 into ClusterLabs:main Nov 8, 2024
1 check passed
oalbrigt commented Nov 8, 2024

Thanks.
