-
Notifications
You must be signed in to change notification settings - Fork 149
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Refactor and fix all upgrade integration tests #3477
Conversation
This has a replace present until this PR is merged and new elastic-agent-libs is released - elastic/elastic-agent-libs#152 |
🌐 Coverage report
|
https://github.com/elastic/elastic-agent/blob/main/sonar-project.properties |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good so far, want to see the tests passing before approving.
Tests are much easier to follow like this (at least to me).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is at least 1 test that has changed meaning in the refactor: TestStandaloneDowngradeToPreviousSnapshotBuild
is supposed to test the upgrade in form of elastic-agent upgrade x.y.z-SNAPSHOT+<buildid>
and from what I see it doesn't do that anymore but it just performs a downgrade 2 minors back
There are a few other comments on other parts of the code but another element that pops is that we use version.GetPreviousMinor()
more even if we know that it has some limitation as it tries to guess what the previous minor would be.
Maybe you want to take a look at #3458 (which I imagine is now replaced by this) for a possible alternative of selecting previous versions
The modreplace has been removed but the lint is still failing, not clear as to why. |
…s are different and the commits are the same.
800e598
to
a264c09
Compare
@ycombinator Don't worry about it. I have fixed the conflict and updated the test to use the new patterns. |
@pchila I believe this is ready for a final review. I have fixed all conflicts and issues you pointed out. |
This pull request is now in conflicts. Could you fix it? 🙏
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The TestStandaloneDowngradeToSpecificSnapshotBuild
looks better but we need to select the penultimate build to have a result that is noticeably different from a normal upgrade to a snapshot version
@pchila I have fixed that test and fixed the conflicts. @ycombinator I have fixed the two tests in conflict if you want to give them a look over. Both where new tests that you added. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
// before trying to perform another upgrade | ||
err = upgradetest.WaitHealthyAndVersion(ctx, startFixture, endVersionInfo.Binary, 2*time.Minute, 10*time.Second, t) | ||
if err != nil { | ||
// agent never got healthy, but we need to ensure the watcher is stopped before continuing |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It wasn't immediately obvious to me why the watcher was being killed here. I think it's so it doesn't interfere with other tests (as opposed to having anything to do with continuing on in this test, since this test should now fail due to the require.NoError(t, err)
on line 79), right? Could you clarify that in the code comment here please?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added more to the comment to make this clear. The reason you gave is the correct.
// killTimeout is greater than timeout as the watcher should have been | ||
// stopped on its own, and we don't want this test to hide that fact |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
// Start at the build version as we want to test the retry | ||
// logic that is in the build. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this comment is not relevant to this test?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is relevant.
// agent never got healthy, but we need to ensure the watcher is stopped before continuing | ||
// this kills the watcher instantly and waits for it to be gone before continuing |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same comment as https://github.com/elastic/elastic-agent/pull/3477/files#r1341596538 about the "continuing" being unclear.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated.
// killTimeout is greater than timeout as the watcher should have been | ||
// stopped on its own, and we don't want this test to hide that fact |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
@blakerouse I reviewed the two tests. The logic in both looks good and even improved towards the end of the tests! I left some feedback about clarifying comments, that's all. |
SonarQube Quality Gate 0 Bugs No Coverage information |
@blakerouse If we have a snapshot with a buildId we need to prepare the uri in the format of x.y.z-, for example From the error in https://buildkite.com/elastic/elastic-agent/builds/3611 it seems the artifact fetcher is not resolving the uri correctly |
* Fix all upgrade tests. * Fix imports and headers. * Update notice. * Exclude testing/** from sonar. * Fix comments from code review. * Add extra error information in the artifact fetcher. * Fixes from code review. * Add upgrade uninstall kill watcher test. * Remove go replace. Regenerate notice. Fix lint. * Import WithSourceURI logic. Fix fleet test to not skip if the versions are different and the commits are the same. * More test fixes. * Fix imports. * Re-add TestStandaloneUpgradeFailsWhenUpgradeIsInProgress. Fix code review. (cherry picked from commit 6201e19) # Conflicts: # sonar-project.properties # testing/integration/upgrade_test.go
…#3495) * Refactor and fix all upgrade integration tests (#3477) * Fix all upgrade tests. * Fix imports and headers. * Update notice. * Exclude testing/** from sonar. * Fix comments from code review. * Add extra error information in the artifact fetcher. * Fixes from code review. * Add upgrade uninstall kill watcher test. * Remove go replace. Regenerate notice. Fix lint. * Import WithSourceURI logic. Fix fleet test to not skip if the versions are different and the commits are the same. * More test fixes. * Fix imports. * Re-add TestStandaloneUpgradeFailsWhenUpgradeIsInProgress. Fix code review. (cherry picked from commit 6201e19) # Conflicts: # sonar-project.properties # testing/integration/upgrade_test.go * Fix merge. --------- Co-authored-by: Blake Rouse <blake.rouse@elastic.co>
What does this PR do?
This refactors and fixes all upgrade integration tests. This provides a unified path for the tests to follow for testing upgrade. Includes code to make the upgrades faster, and allows downgrades to work. Includes fixes for upgrade tests that where skipped because they where flaky and enables the retry upgrade test on Windows.
Why is it important?
Ensures that all upgrade tests are working properly so we can catch real errors instead of fighting the CI.
Checklist
[ ] I have made corresponding changes to the documentation[ ] I have made corresponding change to the default configuration files[ ] I have added an entry in./changelog/fragments
using the changelog toolHow to test this PR locally
mage integration:test
Related issues