
We should not fail fast in our e2e test suite #6338

Open

Description

The last release showed that our current approach of stopping the test run at the first failing test can hide other important test failures. We should change the test runner so that it does not fail fast and instead keeps running the remaining tests.

Some of the things we have to be aware of and might need to address in the test runner:

  • resource cleanup: I believe created resources are not deleted when a test fails. This can lead to resource exhaustion and to subsequent test failures that are not actually related to bugs in the product.
  • test interdependencies: we discussed concerns about the license tests causing problems. However, after looking at them I don't see an issue as long as we don't parallelise the tests (which we currently have no intention of doing).
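
To make the two points above concrete, here is a minimal sketch of a runner that keeps going after a failure and always cleans up what a test created. It is not our actual test runner: `CleanupStack`, `run_suite`, `create_resource` and the `check_*` functions are hypothetical names made up for illustration, and the "resources" are just entries in an in-memory set. Tests run strictly one after another, in line with the assumption that we do not parallelise.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass
class CleanupStack:
    """Hypothetical helper: remembers how to delete what one test created."""
    _cleanups: List[Callable[[], None]] = field(default_factory=list)

    def add_cleanup(self, fn: Callable[[], None]) -> None:
        self._cleanups.append(fn)

    def run_cleanups(self) -> None:
        # Undo in reverse order of creation; a failing cleanup must not
        # prevent the remaining cleanups from running.
        while self._cleanups:
            fn = self._cleanups.pop()
            try:
                fn()
            except Exception as exc:
                print(f"cleanup error: {exc}")


def run_suite(
    tests: List[Tuple[str, Callable[[CleanupStack], None]]],
) -> List[Tuple[str, Exception]]:
    """Run every test sequentially and return (name, error) for each failure."""
    failures = []
    for name, test in tests:
        ctx = CleanupStack()
        try:
            test(ctx)
        except Exception as exc:
            # Record the failure instead of aborting the whole run.
            failures.append((name, exc))
        finally:
            # Cleanup runs whether the test passed or failed.
            ctx.run_cleanups()
    return failures


# Stand-in for real cluster resources (assumption for this sketch).
live_resources = set()


def create_resource(ctx: CleanupStack, name: str) -> None:
    live_resources.add(name)
    ctx.add_cleanup(lambda: live_resources.discard(name))


def check_ok(ctx: CleanupStack) -> None:
    create_resource(ctx, "cluster-a")


def check_broken(ctx: CleanupStack) -> None:
    create_resource(ctx, "cluster-b")
    raise AssertionError("simulated failure")


def check_after_failure(ctx: CleanupStack) -> None:
    create_resource(ctx, "cluster-c")


failures = run_suite([
    ("ok", check_ok),
    ("broken", check_broken),
    ("after_failure", check_after_failure),
])
print([name for name, _ in failures])  # only "broken" is reported
print(live_resources)                  # empty: nothing leaked
```

The point of the `finally` branch is that the test after the broken one still runs, and it starts from a clean state instead of inheriting leaked resources from the failed test.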

I assume part of the motivation for using failfast was to keep the feedback cycle short, so that we would not have to wait through hours and hours of tests before getting a go/no-go signal. But I think we have since changed the way we use the e2e tests quite a bit. We now use a much shorter smoke test for the PR builds, which is effectively a single test. The full e2e test suite runs overnight or after integration into main, where turnaround time is less of a concern. We also have the proposal to add a new e2e test pipeline that is a bit more comprehensive than the smoke test but covers the most important operations, namely version upgrades of the stack.



Labels

>test: Related to unit/integration/e2e tests
