Rethink criteria for emitting docs, publishing Docker images and deploying staging #1851

dhruvkb · 2023-04-20T12:09:52Z

dhruvkb
Apr 20, 2023
Maintainer

Problem

Currently the criteria for emitting docs, publishing Docker images and deploying to staging are as follows:

Source: https://docs.openverse.org/meta/ci_cd/proof_of_functionality.html

Description

This approach was taken because

some tests, like the API integration test and the ingestion server integration test, are fairly comprehensive and good enough on their own; this makes other jobs, like unit tests, somewhat redundant.
Playwright tests are extremely flaky, so relying on them for all outputs would cause lots of failures; the idea was that deployments can sometimes be skipped until the next frontend PR is merged.

This is not 100% ideal because

it allows some tests to fail without blocking the process from proceeding
(specifically for the frontend) images can be published to GHCR but not deployed if Playwright tests fail
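
To make the two drawbacks concrete, here is a minimal Python sketch of "key job" gating. The job names, output names, and the mapping between them are hypothetical and chosen only for illustration; they are not taken from the actual CI configuration.

```python
from typing import Dict, Set

# Hypothetical mapping from each build output to the jobs that must pass
# before it is produced. These names are illustrative assumptions only.
REQUIRED_JOBS: Dict[str, Set[str]] = {
    "publish-api-image": {"api-integration-tests"},
    "publish-frontend-image": {"frontend-unit-tests"},
    "deploy-frontend-staging": {"frontend-unit-tests", "playwright-tests"},
}


def can_produce(output: str, results: Dict[str, str]) -> bool:
    """Return True when every job required for `output` succeeded."""
    return all(results.get(job) == "success" for job in REQUIRED_JOBS[output])


# A run where one non-key job and the Playwright job both fail.
results = {
    "api-integration-tests": "success",
    "api-unit-tests": "failure",  # not a key job, so it blocks nothing
    "frontend-unit-tests": "success",
    "playwright-tests": "failure",
}

api_image = can_produce("publish-api-image", results)
frontend_image = can_produce("publish-frontend-image", results)
frontend_deploy = can_produce("deploy-frontend-staging", results)
```

Under these assumed requirements, the API image is still published although a unit test job failed, and the frontend image is published but never deployed to staging.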

Discussion

The goal is to determine answers to these questions:

Is this approach of selecting key jobs okay?
- If not, should we add more jobs as requirements for these built outputs?
  - If yes, what jobs should be considered as adequate proof-of-functionality?
Should all build outputs depend on the same set of jobs?

Reference

This concern was raised by @sarayourfriend in a review for #1001.

Answered by dhruvkb

Jun 1, 2023

PR #2275 is up.

sarayourfriend · 2023-04-20T20:12:30Z

sarayourfriend
Apr 20, 2023
Collaborator

Things to consider:

API and ingestion server unit tests are fast. They wouldn't add much more than a minute. Hardly worth skipping, in my opinion, if we assume they're actually useful tests.

While Playwright tests are flaky, frontend unit tests should not be. And frontend unit tests, when well written, exercise the core functionality of our components through semi-real user interaction via testing-library. These should not be skipped. They do not take that much time to run and would not delay deployment more than a couple of minutes.

That being said, I don't think skipping Playwright tests is good either, assuming again that we wrote those tests for a reason and that they validate something real. If a particular test is flaky then we should fix it ASAP or skip it or even remove it if we don't think it's a useful test (see #1824).

In short: why have tests if we don't rely on them to verify the application works before deploying it?

Additionally: not deploying a given staging PR implies that people are not testing their changes in staging right away. That reduces the confidence of anyone who subsequently wants to deploy production. Is the assumption that features in staging may be untested in a live environment? That's not a good assumption and I personally do not want to be responsible for deploying code written by others, reviewed by others, but then subsequently untested by anyone in a live environment. If that release to production starts to fail, suddenly I'm responsible for rolling back code that was never tested in a live environment. Rolling back already isn't fun, but rolling back preventable errors is even less fun.

Furthermore, if staging sometimes randomly doesn't deploy even though the PR passed, and that particular PR that was merged was a critical PR that calls for an immediate production deployment after verification in staging, we've just put ourselves in a tidy little corner. We'd need to manually deploy staging to the new image.

Anyway, that's to say: if our test suite is flaky, that is a critical issue that demands immediate attention. Unless the test suite isn't fully useful. And if that's the case, then we should delete the tests. Alternatively, if they're unfixably flaky but still useful, run them as "nightly" tests with plenty of retries.
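
As a rough illustration of "plenty of retries", the following Python sketch reruns a test until it passes or an attempt budget runs out. The helper names (`run_with_retries`, `make_flaky`) are hypothetical and not part of any existing tooling; a real setup would more likely rely on the test runner's own retry option.

```python
from typing import Callable, Iterable


def run_with_retries(test: Callable[[], bool], max_attempts: int = 5) -> bool:
    """Run `test` up to `max_attempts` times; return True on the first pass."""
    for _ in range(max_attempts):
        if test():
            return True
    return False


def make_flaky(outcomes: Iterable[bool]) -> Callable[[], bool]:
    """Build a stand-in test that returns the given outcomes one call at a time."""
    remaining = iter(outcomes)
    return lambda: next(remaining)


# A stand-in test that fails twice before passing.
passes_with_budget = run_with_retries(make_flaky([False, False, True]), max_attempts=5)
fails_without_budget = run_with_retries(make_flaky([False, False, True]), max_attempts=2)
```

The same flaky test passes with a budget of five attempts and fails with a budget of two, which is why a retried suite can take several long runs to go green.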

0 replies

AetherUnbound · 2023-05-09T19:48:12Z

AetherUnbound
May 9, 2023
Collaborator

I think the current approach works well for the API, ingestion server, and catalog. As Sara says, those tests are reasonably fast and there's no reason we shouldn't run them. I don't have a strong opinion on the frontend, but I agree that if we don't want Playwright to be required for deployment, we should have it run nightly.

4 replies

dhruvkb May 10, 2023
Maintainer Author

@AetherUnbound I understood @sarayourfriend's usage of "nightly" as "beta" or "in development" rather than in terms of running periodically.

AetherUnbound May 10, 2023
Collaborator

Ah, I guess I read that as "we want to run these regularly, but unattached to the success or failure of any PR/deployment due to how flaky they are" 😮

sarayourfriend May 22, 2023
Collaborator

I meant them as Madison understood it. If there are tests that run for too long (either organically or because they should be retried multiple times if they fail at first and can take multiple long retried runs to pass), they should at least be run once per day. Traditionally these sorts of builds and tests are referred to as "nightly" because they would happen automatically "at night" (of course "night" has no meaning for us given that daylight is always shining on at least one maintainer 🙂). They often also run during a production release/build.

We could call them "tests-that-run-too-long-to-block-staging-deployments-but-that-are-important-enough-to-run-at-least-once-per-earth-rotation-and-definitely-before-deploying-production"... or maybe something less cumbersome 😛

sarayourfriend May 22, 2023
Collaborator

FWIW I think we should run the frontend unit tests as often and under the same (relative) circumstances as the API, ingestion server, and catalog tests. There should not be a difference there.

Nightly tests should be reserved for ones that rarely ever need changing, like happy path search queries or something. But frankly, I don't think we have anything that warrants that because our tests are atomic enough that they should run quickly when appropriately parallelised. If there are very flaky tests that are still worth running (instead of skipping, for example) and that cannot be fixed to not be flaky, then those are the only valid candidates for "nightly" tests out of our current test suite, in my opinion.

The risk with nightly tests is that if you're not aware of them when making changes that knowingly change behaviour tested by a nightly test, when they inevitably fail during the next run, it suddenly displaces the test update from the code that changed the expected behaviour. We should only use this kind of approach if we absolutely must. IMO long running tests are OK as long as they're consistent, especially if they actually verify that the application works as expected.

Moving tests to a "nightly" cadence would also require making sure they run before production releases. If they're long-running and flaky anyway, that's going to make production releases pretty miserable. It's better to just skip and dedicate focused time to fixing those tests or find a different and better way to test the same thing that won't be flaky.

That is to say: I would recommend strongly against moving any of our tests to a non-CI cadence (nightly). It has a whole bunch of drawbacks. Nightly tests are good if it takes hours to set up the test environment or hours to run the tests. We don't have anything like that, thank goodness.

sarayourfriend · 2023-06-01T02:20:17Z

sarayourfriend
Jun 1, 2023
Collaborator

@dhruvkb any news on a resolution for this discussion? It is marked high priority but has been open and remains unresolved for quite some time. Is there additional input you're waiting on from anyone or is more clarification needed to identify the final conclusion from our discussion?

2 replies

dhruvkb Jun 1, 2023
Maintainer Author

I think our discussion was fairly conclusive, especially with our updated process for flaky tests. I will make the Playwright tests a requirement for building frontend images as well.

dhruvkb Jun 1, 2023
Maintainer Author

PR #2275 is up.

Answer selected by dhruvkb
