
Add drop cache when changing node state #1947

Closed
wants to merge 1 commit

Conversation

bkopilov
Contributor

The current iptables fixture tears down nodes in order to modify iptables. When the machines are started again, the cached info must be refreshed.

When a machine's state changes, e.g. stop/reboot, cached info may hide bugs in the node controllers or fail tests.
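The staleness described above can be illustrated with a minimal, self-contained sketch. The `Node` and `FakeController` classes here are hypothetical stand-ins, not the actual test-infra code: a cached IP lookup survives a simulated reboot until the cache is explicitly dropped.

```python
import functools


class FakeController:
    """Hypothetical stand-in for the virtualization layer (e.g. libvirt)."""

    def __init__(self):
        self.current_ips = ["192.168.128.10"]

    def get_ips(self):
        return list(self.current_ips)


class Node:
    """Minimal sketch of a node whose IP lookup is cached."""

    def __init__(self, controller):
        self._controller = controller

    @functools.cached_property
    def ips(self):
        # Expensive lookup against the virtualization layer, memoized.
        return self._controller.get_ips()

    def drop_cache(self):
        # Forget the memoized value so the next access refetches.
        self.__dict__.pop("ips", None)


controller = FakeController()
node = Node(controller)
first = node.ips                              # fetched and cached
controller.current_ips = ["192.168.128.99"]   # reboot: new DHCP lease
stale = node.ips                              # cache hides the change
node.drop_cache()
fresh = node.ips                              # refetched after drop_cache
```

Without the `drop_cache()` call, the test would keep operating on the pre-reboot IP, which is exactly the kind of bug the PR description warns about.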

@openshift-ci openshift-ci bot added the size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. label Dec 12, 2022
@openshift-ci

openshift-ci bot commented Dec 12, 2022

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: bkopilov
Once this PR has been reviewed and has the lgtm label, please assign eranco74 for approval by writing /assign @eranco74 in a comment. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci bot added the needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. label Dec 12, 2022
@openshift-ci

openshift-ci bot commented Dec 12, 2022

Hi @bkopilov. Thanks for your PR.

I'm waiting for an openshift member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@bkopilov
Contributor Author

@osherdp @lalon4

Please review

@adriengentil
Contributor

/ok-to-test

Would it make sense to drop the cache directly in shutdown_given and shutdown_all? That way, every time shutdown is called, the cache would be dropped, preventing potential discrepancies in other parts of test-infra.

@openshift-ci openshift-ci bot added ok-to-test Indicates a non-member PR verified by an org member that is safe to test. and removed needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Dec 12, 2022
@bkopilov
Contributor Author

bkopilov commented Dec 12, 2022

/ok-to-test

Would it make sense to drop the cache directly in shutdown_given and shutdown_all? That way, every time shutdown is called, the cache would be dropped, preventing potential discrepancies in other parts of test-infra.

[bkopilov] Yes, the cache is a real risk if something goes wrong during tests or events.
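Adrien's suggestion can be sketched roughly as follows. The `Nodes` and `FakeNode` classes below are hypothetical simplifications, not the real test-infra API: the point is that dropping the cache inside the shutdown helpers means no caller can observe pre-shutdown state.

```python
class FakeNode:
    """Hypothetical node with a single IP and a running flag."""

    def __init__(self, ip):
        self.ip = ip
        self.running = True


class Nodes:
    """Sketch of embedding the cache-drop in the shutdown helpers."""

    def __init__(self, nodes):
        self.nodes = nodes
        self._ips_cache = None

    def get_ips(self):
        # Cached lookup: cheap on repeat calls, stale after state changes.
        if self._ips_cache is None:
            self._ips_cache = [n.ip for n in self.nodes]
        return self._ips_cache

    def drop_cache(self):
        self._ips_cache = None

    def shutdown_given(self, given_nodes):
        for node in given_nodes:
            node.running = False
        # Drop the cache here so every caller gets fresh data afterwards.
        self.drop_cache()

    def shutdown_all(self):
        self.shutdown_given(self.nodes)


nodes = Nodes([FakeNode("192.168.128.10"), FakeNode("192.168.128.13")])
before = nodes.get_ips()
nodes.shutdown_all()
cache_cleared = nodes._ips_cache is None
```

With this shape, the fixture in the diff below would not need its own explicit `drop_cache()` call, though as noted later in the thread, a direct `Node().shutdown()` would still bypass it.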

@@ -644,7 +644,8 @@ def set_iptables_rules_for_nodes(
         for node in given_nodes:
             given_node_ips.append(node.ips[0])
         cluster.nodes.shutdown_given(given_nodes)
 
+        # Clear cached info about nodes after shutdown
+        cluster.nodes.drop_cache()
Contributor

Why do we need to do the shutdown in the first place? Any chance you know?

Contributor Author

The iptables fixture from base_test.py runs shutdown; no idea if we really need it.
We have failures in CI in tests that use this fixture, and I suspect that when libvirt comes back we don't have the latest IP.

Contributor

So let's try to remove the shutdown if it serves no purpose.
There's no reason to do extra actions and increase our chances of failure.

Contributor Author

How do you recommend fixing all the other cases where we stop/reboot nodes during the test lifetime?

Contributor

Maybe we can drop the cache for those other cases (which are outside this code's scope).
Can you send me some links? I'll take a look at those tests.

Contributor

No, let's not remove the shutdown. It was implemented a long time ago and there was a reason; if I am not mistaken, it is needed for the rules to apply. Proxy tests are highly dependent on it.

Contributor

Let me check this a bit.
Either way, Adrien's suggestion to embed the cache-drop in the shutdown function is a better way to construct this solution.

Contributor Author

Ack, yes, Adrien's change will cover our needs, but please note that we may still have an issue if a user calls Node().shutdown() directly. It is good enough, though; if there is no objection, I am going to drop_cache for: shutdown and reboot.

Contributor

Seems like a good compromise

Iptables check does not work when given a chain of sources.
When we insert a rule with multiple sources, iptables splits the list
and adds each source as a separate rule with the same conditions.

On the check action, iptables won't catch the case of a mix of sources
where one is not in the table; it looks like iptables checks only one.
If the check action returns a wrong result, add/delete rules change and
may trigger failures due to unexpected results.

Also on deletion, iptables deletes only the first match when there are
duplicate rules.

Example:
iptables --insert FORWARD -p tcp -j DROP -s 192.168.128.10,192.168.128.13
results in:
-A FORWARD -s 192.168.128.13/32 -p tcp -m tcp --dport 80 -j DROP
-A FORWARD -s 192.168.128.10/32 -p tcp -m tcp --dport 80 -j DROP

We will get different results from iptables for:
iptables --check FORWARD -p tcp -j DROP -s 7.7.7.7,192.168.128.10
iptables --check FORWARD -p tcp -j DROP -s 192.168.128.10,7.7.7.7

This means that a single source per rule is the best approach here.
Run delete on the rule until all duplicate rules are gone.
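The "delete until gone" step in the commit message above can be sketched as a small loop. The function name and the injectable `run` parameter are hypothetical illustrations, not test-infra code; the loop is needed because `iptables -D` removes only the first matching rule, so duplicates survive a single delete.

```python
import subprocess


def delete_rule_until_gone(rule_args, run=subprocess.run):
    """Delete an iptables rule repeatedly until --check reports it gone.

    `rule_args` is the rule spec without the command verb, e.g.
    ["FORWARD", "-p", "tcp", "-j", "DROP", "-s", "192.168.128.10/32"].
    `run` is injectable so the loop can be exercised without root or
    a real iptables binary; `iptables --check` exits 0 while a matching
    rule still exists.
    """
    while run(["iptables", "--check", *rule_args]).returncode == 0:
        run(["iptables", "--delete", *rule_args])
```

Because each iteration re-checks before deleting, the loop terminates exactly when no duplicate of the rule remains in the chain.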
@bkopilov bkopilov force-pushed the fix_iptables_fixture branch from 40036ce to 562ebc0 Compare December 15, 2022 08:33
@openshift-ci openshift-ci bot added size/S Denotes a PR that changes 10-29 lines, ignoring generated files. and removed size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. labels Dec 15, 2022
@openshift-ci

openshift-ci bot commented Dec 15, 2022

@bkopilov: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name: ci/prow/lint
Commit: 562ebc0
Required: true
Rerun command: /test lint



@osherdp
Contributor

osherdp commented Dec 15, 2022

The code has changed into a different change.
For now I'm waiting for it to go back to the originally requested change.
/hold

@openshift-ci openshift-ci bot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Dec 15, 2022
bkopilov added a commit to bkopilov/assisted-test-infra that referenced this pull request Dec 15, 2022
When the machines are started again, the cached info must be refreshed.

When a machine's state changes, e.g. stop/reboot, cached info may
hide bugs in the node controllers or fail tests.

This patch is a continuation of openshift#1947.
Due to some mismatch I created a new patch.
@osherdp
Contributor

osherdp commented Dec 18, 2022

@bkopilov can we close this PR?
I see there's a replacement

@bkopilov
Contributor Author

@bkopilov can we close this PR? I see there's a replacement

Yes, please.

@osherdp
Contributor

osherdp commented Dec 18, 2022

/close

@openshift-ci openshift-ci bot closed this Dec 18, 2022
@openshift-ci

openshift-ci bot commented Dec 18, 2022

@osherdp: Closed this PR.

In response to this:

/close


openshift-merge-robot pushed a commit that referenced this pull request Dec 18, 2022
…1954)

When the machines are started again, the cached info must be refreshed.

When a machine's state changes, e.g. stop/reboot, cached info may
hide bugs in the node controllers or fail tests.

This patch is a continuation of #1947.
Due to some mismatch I created a new patch.