Improve testing harness to separate DB and non-DB tests (#35160)

This PR marks DB tests as such and allows splitting test execution in CI: the DB tests run against the various databases, while the non-DB tests run without a DB in a separate run. To do that, the code that selects which tests to run has been moved from the `entrypoint_ci.sh` bash script to breeze's Python code, which is generally much nicer to maintain and is common to both "DB" and "non-DB" tests. As a nice side effect, it will be easier in the future to manage different test types and to contain specific flaky test types.

This change also adds the possibility to isolate some of the test types when parallel DB tests are run, and adds a new test type, PythonOperator, carved out of the Operators type. These tests are best run in isolation because creating and destroying virtualenvs in Docker while running in parallel with other tests is very slow for some reason and leads to flaky tests. Python operator tests are therefore separated out from Operators and treated as isolated tests. This will help not only with speed but also with the stability of the test suite.
apache · Oct 31, 2023 · a7e76ba
1 parent 651b326
commit a7e76ba

Showing 79 changed files with 4,147 additions and 1,498 deletions.
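
The commit message describes treating some test types (PythonOperator is the one it names) as isolated while the rest run in parallel. A minimal sketch of that split, assuming a hypothetical `ISOLATED_TEST_TYPES` set and a hypothetical `split_test_types` helper; neither name is taken from breeze itself, and the real selection logic may differ:

```python
from __future__ import annotations

from typing import Iterable

# Hypothetical: the commit message names PythonOperator as best run in isolation.
# The actual list maintained by breeze may differ.
ISOLATED_TEST_TYPES = frozenset({"PythonOperator"})


def split_test_types(test_types: Iterable[str]) -> tuple[list[str], list[str]]:
    """Return (parallel, isolated) test types, preserving input order."""
    parallel: list[str] = []
    isolated: list[str] = []
    for test_type in test_types:
        if test_type in ISOLATED_TEST_TYPES:
            isolated.append(test_type)
        else:
            parallel.append(test_type)
    return parallel, isolated
```

With this sketch, the parallel group could be scheduled concurrently and the isolated group run afterwards, one type at a time.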
diff --git a/.github/actions/post_tests_success/action.yml b/.github/actions/post_tests_success/action.yml
@@ -28,14 +28,14 @@ runs:
         path: ./files/warnings-*.txt
         retention-days: 7
     - name: "Move coverage artifacts in separate directory"
-      if: env.COVERAGE == 'true' && env.TEST_TYPES != 'Helm'
+      if: env.ENABLE_COVERAGE == 'true' && env.TEST_TYPES != 'Helm'
       shell: bash
       run: |
         mkdir ./files/coverage-reposts
         mv ./files/coverage*.xml ./files/coverage-reposts/ || true
     - name: "Upload all coverage reports to codecov"
       uses: codecov/codecov-action@v3
-      if: env.COVERAGE == 'true' && env.TEST_TYPES != 'Helm'
+      if: env.ENABLE_COVERAGE == 'true' && env.TEST_TYPES != 'Helm'
       with:
         name: coverage-${{env.JOB_ID}}
         flags: python-${{env.PYTHON_MAJOR_MINOR_VERSION}},${{env.BACKEND}}-${{env.BACKEND_VERSION}}

diff --git a/.github/workflows/ci.yml b/.github/workflows/ci.yml
diff --git a/BREEZE.rst b/BREEZE.rst
@@ -970,12 +970,14 @@ Here is the detailed set of options for the ``breeze testing`` command.
 Iterate on tests interactively via ``shell`` command
 ....................................................
 
-You can simply enter the ``breeze`` container and run ``pytest`` command there. You can enter the
-container via just ``breeze`` command or ``breeze shell`` command (the latter has more options
-useful when you run integration or system tests). This is the best way if you want to interactively
-run selected tests and iterate with the tests. Once you enter ``breeze`` environment it is ready
-out-of-the-box to run your tests by running the right ``pytest`` command (autocomplete should help
-you with autocompleting test name if you start typing ``pytest tests<TAB>``).
+You can simply enter the ``breeze`` container in an interactive shell (via ``breeze`` or the more comprehensive
+``breeze shell`` command) or use your local virtualenv and run the ``pytest`` command there.
+This is the best way if you want to interactively run selected tests and iterate on them.
+
+The good thing about the ``breeze`` interactive shell is that it has all the dependencies to run all the tests
+and it has a running, configured backend database started for you when you decide to run DB tests.
+It also has auto-complete enabled for the ``pytest`` command so that you can easily run the tests you want
+(autocomplete should help you complete the test name if you start typing ``pytest tests<TAB>``).
 
 Here are few examples:
 
@@ -991,25 +993,30 @@ To run the whole test class:
 
     pytest tests/core/test_core.py::TestCore
 
-You can re-run the tests interactively, add extra parameters to pytest and modify the files before
+You can re-run the tests interactively, add extra parameters to pytest  and modify the files before
 re-running the test to iterate over the tests. You can also add more flags when starting the
 ``breeze shell`` command when you run integration tests or system tests. Read more details about it
 in the `testing doc <TESTING.rst>`_ where all the test types and information on how to run them are explained.
 
 This applies to all kind of tests - all our tests can be run using pytest.
 
-Running unit tests
-..................
+Running unit tests with ``breeze testing`` commands
+...................................................
+
+An option you have is that you can also run tests via built-in ``breeze testing tests`` command - which
+is a "swiss-army-knife" of unit testing with Breeze. This command has a lot of parameters and is very
+flexible thus might be a bit overwhelming.
 
-Another option you have is that you can also run tests via built-in ``breeze testing tests`` command.
-The iterative ``pytest`` command allows to run test individually, or by class or in any other way
-pytest allows to test them and run them interactively, but ``breeze testing tests`` command allows to
-run the tests in the same test "types" that are used to run the tests in CI: for example Core, Always
-API, Providers. This how our CI runs them - running each group in parallel to other groups and you can
-replicate this behaviour.
+In most cases, if you want to run tests, you want to use the dedicated ``breeze testing db-tests``
+or ``breeze testing non-db-tests`` commands, which automatically run groups of tests and let you choose
+a subset of test types to run (with the ``--parallel-test-types`` flag).
 
-Another interesting use of the ``breeze testing tests`` command is that you can easily specify sub-set of the
-tests for Providers.
+
+Using ``breeze testing tests`` command
+......................................
+
+With the ``breeze testing tests`` command you can easily specify a subset of the tests -- including
+selecting specific Providers tests to run.
 
 For example this will only run provider tests for airbyte and http providers:
 
@@ -1025,7 +1032,6 @@ For example this will run tests for all providers except amazon and google provi
 
    breeze testing tests --test-type "Providers[-amazon,google]"
 
-
 You can also run parallel tests with ``--run-in-parallel`` flag - by default it will run all tests types
 in parallel, but you can specify the test type that you want to run with space separated list of test
 types passed to ``--parallel-test-types`` flag.
@@ -1039,12 +1045,9 @@ For example this will run API and WWW tests in parallel:
 There are few special types of tests that you can run:
 
 * ``All`` - all tests are run in single pytest run.
-* ``PlainAsserts`` - some tests of ours fail when ``--assert=rewrite`` feature of pytest is used. This
-  is in order to get better output of ``assert`` statements This is a special test type that runs those
-  select tests tests with ``--assert=plain`` flag.
-* ``Postgres`` - runs all tests that require Postgres database
-* ``MySQL`` - runs all tests that require MySQL database
-* ``Quarantine`` - runs all tests that are in quarantine (marked with ``@pytest.mark.quarantined``
+* ``All-Postgres`` - runs all tests that require Postgres database
+* ``All-MySQL`` - runs all tests that require MySQL database
+* ``All-Quarantine`` - runs all tests that are in quarantine (marked with ``@pytest.mark.quarantined``
   decorator)
 
 Here is the detailed set of options for the ``breeze testing tests`` command.
@@ -1054,6 +1057,86 @@ Here is the detailed set of options for the ``breeze testing tests`` command.
   :width: 100%
   :alt: Breeze testing tests
 
+Using ``breeze testing db-tests`` command
+.........................................
+
+The ``breeze testing db-tests`` command is a simplified version of the ``breeze testing tests`` command
+that only allows you to run tests that are bound to a database - in parallel, utilising all your CPUs.
+The DB-bound tests are the ones that require a database to be started and configured separately for
+each test type run and they are run in parallel containers/parallel docker compose projects to
+utilise the multiple CPUs your machine has - thus allowing you to quickly run a few groups of tests in parallel.
+This command is used in CI to run DB tests.
+
+By default this command will run the complete set of test types we have, thus allowing you to see the result
+of all DB tests we have, but you can choose a subset of test types to run with the ``--parallel-test-types``
+flag or exclude some test types by specifying the ``--excluded-parallel-test-types`` flag.
+
+Run all DB tests:
+
+.. code-block:: bash
+
+   breeze testing db-tests
+
+Only run DB tests from "API CLI WWW" test types:
+
+.. code-block:: bash
+
+   breeze testing db-tests --parallel-test-types "API CLI WWW"
+
+Run all DB tests excluding those in CLI and WWW test types:
+
+.. code-block:: bash
+
+   breeze testing db-tests --excluded-parallel-test-types "CLI WWW"
+
+Here is the detailed set of options for the ``breeze testing db-tests`` command.
+
+.. image:: ./images/breeze/output_testing_db-tests.svg
+  :target: https://raw.githubusercontent.com/apache/airflow/main/images/breeze/output_testing_db-tests.svg
+  :width: 100%
+  :alt: Breeze testing db-tests
+
+
+Using ``breeze testing non-db-tests`` command
+.............................................
+
+The ``breeze testing non-db-tests`` command is a simplified version of the ``breeze testing tests`` command
+that only allows you to run tests that are not bound to a database - in parallel, utilising all your CPUs.
+The non-DB-bound tests are the ones that do not expect a database to be started and configured, so we can
+utilise the multiple CPUs your machine has via the ``pytest-xdist`` plugin - thus allowing you to quickly
+run a few groups of tests in parallel using a single container rather than many of them, as is the case for
+DB-bound tests. This command is used in CI to run non-DB tests.
+
+By default this command will run the complete set of test types we have, thus allowing you to see the result
+of all non-DB tests we have, but you can choose a subset of test types to run with the ``--parallel-test-types``
+flag or exclude some test types by specifying the ``--excluded-parallel-test-types`` flag.
+
+Run all non-DB tests:
+
+.. code-block:: bash
+
+   breeze testing non-db-tests
+
+Only run non-DB tests from "API CLI WWW" test types:
+
+.. code-block:: bash
+
+   breeze testing non-db-tests --parallel-test-types "API CLI WWW"
+
+Run all non-DB tests excluding those in CLI and WWW test types:
+
+.. code-block:: bash
+
+   breeze testing non-db-tests --excluded-parallel-test-types "CLI WWW"
+
+Here is the detailed set of options for the ``breeze testing non-db-tests`` command.
+
+.. image:: ./images/breeze/output_testing_non-db-tests.svg
+  :target: https://raw.githubusercontent.com/apache/airflow/main/images/breeze/output_testing_non-db-tests.svg
+  :width: 100%
+  :alt: Breeze testing non-db-tests
+
+
 Running integration tests
 .........................
 
@@ -1076,11 +1159,14 @@ Here is the detailed set of options for the ``breeze testing integration-tests``
   :alt: Breeze testing integration-tests
 
 
-Running Helm tests
-..................
+Running Helm unit tests
+.......................
 
-You can use Breeze to run all Helm tests. Those tests are run inside the breeze image as there are all
-necessary tools installed there.
+You can use Breeze to run all Helm unit tests. Those tests are run inside the breeze image as there are all
+necessary tools installed there. Those tests are merely checking if the Helm chart of ours renders properly
+as expected when given a set of configuration parameters. The tests can be run in parallel if you have
+multiple CPUs by specifying ``--run-in-parallel`` flag - in which case they will run separate containers
+(one per helm-test package) and they will run in parallel.
 
 .. image:: ./images/breeze/output_testing_helm-tests.svg
   :target: https://raw.githubusercontent.com/apache/airflow/main/images/breeze/output_testing_helm-tests.svg
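
The BREEZE.rst changes above document `--parallel-test-types` and `--excluded-parallel-test-types` as space-separated lists that narrow the default set of test types. A minimal sketch of how such a selection could be resolved, assuming a hypothetical `select_test_types` helper and an illustrative list of all test types; this is not breeze's actual implementation:

```python
from __future__ import annotations


def select_test_types(
    all_test_types: list[str],
    parallel_test_types: str | None = None,
    excluded_parallel_test_types: str | None = None,
) -> list[str]:
    """Resolve the test types to run from space-separated include/exclude strings.

    Hypothetical helper: with no include string, every known type is selected;
    the exclude string then removes types from that selection.
    """
    if parallel_test_types:
        selected = parallel_test_types.split()
    else:
        selected = list(all_test_types)
    excluded = set(excluded_parallel_test_types.split()) if excluded_parallel_test_types else set()
    return [test_type for test_type in selected if test_type not in excluded]
```

For example, passing `excluded_parallel_test_types="CLI WWW"` mirrors the documented `breeze testing db-tests --excluded-parallel-test-types "CLI WWW"` call: every known type except CLI and WWW remains selected.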

diff --git a/CI.rst b/CI.rst
@@ -394,7 +394,9 @@ This workflow is a regular workflow that performs all checks of Airflow code.
 +---------------------------------+----------------------------------------------------------+----------+----------+-----------+-------------------+
 | Tests airflow release commands  | Tests if airflow release command works                   | -        | Yes      | Yes       | -                 |
 +---------------------------------+----------------------------------------------------------+----------+----------+-----------+-------------------+
-| Tests (Backend/Python matrix)   | Run the Pytest unit tests (Backend/Python matrix)        | Yes      | Yes      | Yes       | Yes (8)           |
+| Tests (Backend/Python matrix)   | Run the Pytest unit DB tests (Backend/Python matrix)     | Yes      | Yes      | Yes       | Yes (8)           |
++---------------------------------+----------------------------------------------------------+----------+----------+-----------+-------------------+
+| No DB tests                     | Run the Pytest unit Non-DB tests (with pytest-xdist)     | Yes      | Yes      | Yes       | Yes (8)           |
 +---------------------------------+----------------------------------------------------------+----------+----------+-----------+-------------------+
 | Integration tests               | Runs integration tests (Postgres/Mysql)                  | Yes      | Yes      | Yes       | Yes (9)           |
 +---------------------------------+----------------------------------------------------------+----------+----------+-----------+-------------------+

diff --git a/CI_DIAGRAMS.md b/CI_DIAGRAMS.md
@@ -189,7 +189,7 @@ sequenceDiagram
             par
                 opt
                     GitHub Registry ->> Tests: Pull CI Images<br>[COMMIT_SHA]
-                    Note over Tests: Unit Tests<br>Python/DB matrix
+                    Note over Tests: Unit Tests<br>Python/DB matrix/No DB
                 end
             and
                 opt