Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: crmsh enhancements, master slave, validations #197

Merged
merged 6 commits into from
Apr 9, 2024

Conversation

marcelmamula
Copy link
Contributor

Enhancement:

  • Master Slave clone resource added for crmsh to support retroactively older versions. Readme updated with new inputs.
  • New validations added for CIB steps to ensure that constraints are created on top of resources that exist, to avoid crmsh getting stuck.
  • Expect for maintenance mode switch was added to handle resource attributes.
  • New naming convention for resources, because crmsh does not generate default without providing id.
  • cluster_destroy: file based loop replaced with shell module for faster cleanup.

Reason:

  • Support for current=older installations using SAPHanaSR resource agent with Master Slave clone resource
  • Validations to avoid crmsh getting stuck

Result:

  • Cluster build was successfully tested on SLES4SAP15 SP3/5/6 and RHEL8.6
  • All non-readme changes were done under crmsh so pcs is unaffected.

Issue Tracker Tickets (Jira or BZ if any):

Copy link

codecov bot commented Mar 27, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 76.79%. Comparing base (620f0dd) to head (131c074).
Report is 4 commits behind head on main.

Additional details and impacted files
@@             Coverage Diff             @@
##             main     #197       +/-   ##
===========================================
+ Coverage   50.54%   76.79%   +26.24%     
===========================================
  Files           1        3        +2     
  Lines          91      181       +90     
  Branches       12       12               
===========================================
+ Hits           46      139       +93     
+ Misses         45       42        -3     
Flag Coverage Δ
sanity 50.54% <ø> (ø)

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

# Verify if constraint.resource_leader.id exists
- name: Verify resource_leader presence {{ constraint.resource_leader.id }}
ansible.builtin.command:
cmd: |
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
cmd: |
cmd: >-

? That is, should the command be one line crm -c {{ __ha_cluster_crm_shadow }} configure show {{ constraint.resource_leader.id }} ? because the yaml string | operator will preserve newlines, so the command will have newlines in it

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@richm I have run it with pipes too many times and it never resulted in error, crmsh was able to handle multi line input.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@richm I have changed all pipes where it was necessary, change is in latest commit.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@richm I have changed pipe to >- where it was possible and I am keeping it for multi-line commands only. Can we close pipe discussion?

@marcelmamula
Copy link
Contributor Author

@tomjelinek @richm Master/Slave is valid naming for cluster resources, removing them as per "anti-woke" filter would not make sense.
https://crmsh.github.io/man-4.3/#cmdhelp_configure_ms

Copy link
Member

@tomjelinek tomjelinek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Master / slave resources have been deprecated in Pacemaker 2.0 released in July 2018 and replaced by promotable clone resources. In Pacemaker 3, which is expected to be released this year, they are going to be removed completely. I see no reason for adding a special variable for defining those.

The role already supports configuring promotable clones by the means of promotable option in ha_cluster_resource_clones. What would be a difference between a master_slave_clone and a promotable clone? I don't see any difference in the variable structure, except the promotable key. It looks to me like you are creating a new variable for something that can be defined using already existing variables.

If for whatever reason you need to create master / slave resources in pacemaker, I suggest using clones with promotable: true to define those. It would be then up to the crmsh portion of the role or the crmsh itself to decide if a promotable clone or a master / slave should be created based on pacemaker or crmsh version. That way, it will be transparent for the role users and the same playbook could be used for both old and new versions of pacemaker as well as both pcs-oriented and crmsh-oriented distros. Such interoperability is one of the goals of the system roles.

@marcelmamula
Copy link
Contributor Author

marcelmamula commented Mar 27, 2024

Master / slave resources have been deprecated in Pacemaker 2.0 released in July 2018 and replaced by promotable clone resources. In Pacemaker 3, which is expected to be released this year, they are going to be removed completely. I see no reason for adding a special variable for defining those.

The role already supports configuring promotable clones by the means of promotable option in ha_cluster_resource_clones. What would be a difference between a master_slave_clone and a promotable clone? I don't see any difference in the variable structure, except the promotable key. It looks to me like you are creating a new variable for something that can be defined using already existing variables.

If for whatever reason you need to create master / slave resources in pacemaker, I suggest using clones with promotable: true to define those. It would be then up to the crmsh portion of the role or the crmsh itself to decide if a promotable clone or a master / slave should be created based on pacemaker or crmsh version. That way, it will be transparent for the role users and the same playbook could be used for both old and new versions of pacemaker as well as both pcs-oriented and crmsh-oriented distros. Such interoperability is one of the goals of the system roles.

@tomjelinek Master Slave resources were indeed deprecated, but SUSE still uses Master Slave for existing clusters. It is replaced by Clone promotable in new Resource Agent SAPHanaSR-angi, but all current SAPHanaSR implementations are documented and recommended to use Master Slave.

This is added to ensure proper support coverage and it has no impact on pcs.

@tomjelinek
Copy link
Member

tomjelinek commented Mar 27, 2024

@tomjelinek Master Slave resources were indeed deprecated, but SUSE still uses Master Slave for existing clusters. It is replaced by Clone promotable in new Resource Agent SAPHanaSR-angi, but all current SAPHanaSR implementations are documented and recommended to use Master Slave.

This is added to ensure proper support coverage and it has no impact on pcs.

@marcelmamula I'm not arguing that master / slave resources do not need to be supported. My point is different: What is preventing you to use already existing promotable: true syntax for defining master / slave resources? Something like this:

crm configure {{ resource.promotable | ternary("ms", "clone") }} {{ resource.id }} ...

I don't see any other difference between promotable clones and master / slave resources than this.

@marcelmamula
Copy link
Contributor Author

@tomjelinek Master Slave resources were indeed deprecated, but SUSE still uses Master Slave for existing clusters. It is replaced by Clone promotable in new Resource Agent SAPHanaSR-angi, but all current SAPHanaSR implementations are documented and recommended to use Master Slave.
This is added to ensure proper support coverage and it has no impact on pcs.

@marcelmamula I'm not arguing that master / slave resources do not need to be supported. My point is different: What is preventing you to use already existing promotable: true syntax for defining master / slave resources? Something like this:

crm configure {{ resource.promotable | ternary("ms", "clone") }} {{ resource.id }} ...

I don't see any other difference between promotable clones and master / slave resources than this.

@tomjelinek My idea was to have new code for different type of resource and have it documented.
But your proposal makes sense. I will try to redo it tomorrow and see how it behaves.

@marcelmamula
Copy link
Contributor Author

@tomjelinek I have looked into variations we have coming for SAP Installation and there are few:
clone without promotable defined (false by default) = clone
clone with promotable true = clone
master slave without promotable defined = ms

I dont believe promotable attribute can be used as deciding factor by itself. Maybe introduce dictionary parameter under clone that I could use, like "ms" or "master". pcs would not need to be adjusted, I would just pick if it is defined or not.

@marcelmamula
Copy link
Contributor Author

@tomjelinek I can simplify this msl scenario with hidden attribute under clone dictionary, that would determine ms/clone conditional. There is currently no difference for 2 of our clones in terms of promotable, as it is undefined so it cannot be used for decision. I would love to avoid using id as decision factor.

it can be hidden atribute like ms, master, msl as boolean.

ha_cluster_resource_master_slave_clones:
  - id: msl_SAPHana_H00_HDB00
    resource_id: rsc_SAPHana_H00_HDB00
    meta_attrs:
      - attrs:
          - name: clone-max
            value: 2
          - name: clone-node-max
            value: 1
          - name: interleave
            value: true
          - name: maintenance
            value: true
    ms: true

@marcelmamula
Copy link
Contributor Author

@tomjelinek master slave code is removed and existing clone is adjusted to expect clone attribute ms: true for this specific use case. It would stay with default clone if it is not used.

configure delete {{ __ha_cluster_resource_id }}
when: __ha_cluster_resource_status.rc == 0
check_mode: false
changed_when: not ansible_check_mode

# Clone is default resource type, unless attribute ms: true is specified
# Pacemaker 2.0 deprecated use of ms, but it is still valid and supported
# solution for SAP Hana clusters on SUSE using SAPHanaSR.
- name: Configure resource clone {{ __ha_cluster_resource_id }}
ansible.builtin.command:
cmd: |
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
cmd: |
cmd: >-

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@richm I have changed pipe to >- where it was possible and I am keeping it for multi-line commands only. Can we close pipe discussion?

@@ -47,5 +49,7 @@
{% endif %}
{% endfor %}
{% endif %}
# crm can get stuck if it encounters error and expects prompt response.
timeout: 60
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is this? Where is the behavior of the timeout keyword documented?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@richm Timeout is ansible task parameter. I have added it because crmsh gets stuck if it encounters error and asks for user input. This is handled with expect in constraint tasks, while keeping resource tasks with timeout.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok, I see - https://docs.ansible.com/ansible/latest/reference_appendices/config.html#task-timeout
"Version Added: 2.10"

Note that for RedHat os_family, we still have to support Ansible 2.9, and still have many customers using 2.9, until late 2024 when 2.9 support will be ended. Not sure if you have the same restrictions.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@richm I was doing testing on my execution node with 2.16 so I was not checking version of it.
It seemed to fulfill task of making sure it does not hang forever. Only other way I see to handle it would be using async and then async_status instead of it.

Do you know of other way to ensure that command/shell does not get stuck like this? I dont see any option with crmsh to skip or auto-decline user inputs without using command/shell with echo pipe.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@richm I was doing testing on my execution node with 2.16 so I was not checking version of it. It seemed to fulfill task of making sure it does not hang forever. Only other way I see to handle it would be using async and then async_status instead of it.

Do you know of other way to ensure that command/shell does not get stuck like this? I dont see any option with crmsh to skip or auto-decline user inputs without using command/shell with echo pipe.

I would say - use timeout: unless you are sure that you will have some customers using Ansible 2.9
If you must support Ansible 2.9, then you will need to use async

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@richm I have removed timeout and replaced it with expect module with response y/n and ERROR.

@marcelmamula
Copy link
Contributor Author

@richm @tomjelinek I will be pushing one more commit with new functionality that was on todo list for crmsh: rsc/op defaults.

@marcelmamula
Copy link
Contributor Author

@tomjelinek @richm Let me know if all your concerns were covered with corrective commits. Pipes were replaced where possible, master slave code was removed and supplemented by ms: true switch, rsc/op code was added.

@tomjelinek
Copy link
Member

[citest]

Copy link
Member

@tomjelinek tomjelinek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for updating the PR, it looks good to me now. Let's wait for the CI.

@richm richm merged commit e42be8a into linux-system-roles:main Apr 9, 2024
27 of 30 checks passed
@marcelmamula marcelmamula deleted the multi-state branch April 24, 2024 11:33
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants