Conversation

@ephraimbuddy
Contributor

@ephraimbuddy ephraimbuddy commented Aug 13, 2024

This PR separates FAB migration from Airflow Core migration and provides a way for apps to integrate into Airflow and run their migrations.
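In practice, the split could look something like the following for operators; note that these command names are illustrative assumptions based on the direction of this PR, not confirmed by this thread:

```shell
# Hypothetical upgrade flow once FAB migrations are decoupled from core
# (command names below are assumptions for illustration only):
#
#   airflow db migrate        # run Airflow core metadata migrations only
#   airflow fab-db migrate    # run the FAB provider's migrations separately
```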

Member

@potiuk potiuk left a comment

NICE

@potiuk potiuk requested a review from vincbeck August 14, 2024 00:09
Contributor

@vincbeck vincbeck left a comment

🤯 Fantastic! I wish I could have done it while working on AIP-56! Thank you!

@ephraimbuddy ephraimbuddy force-pushed the separate-fab-db-from-airflow branch 4 times, most recently from 2ab293d to b54d98c Compare August 15, 2024 09:10
@ephraimbuddy ephraimbuddy marked this pull request as ready for review August 15, 2024 11:01
@ephraimbuddy ephraimbuddy force-pushed the separate-fab-db-from-airflow branch from c9210c6 to 504c078 Compare August 15, 2024 15:04
@ephraimbuddy
Contributor Author

Not sure why this test is failing in CI but doesn't fail locally: https://github.com/apache/airflow/actions/runs/10406470435/job/28819881902?pr=41437#step:7:5512

@potiuk
Member

potiuk commented Aug 16, 2024

Likely a side effect of other tests that was somewhat masked or avoided before the change - where some setup/teardown removed the side effect.

You can reproduce the set of tests run in CI with breeze testing db-tests --test-type CLI locally (for example, for CLI tests) - that should run the tests in the same sequence as in each of the parallel CI runs, and then they should reproducibly fail as well. It could also be caused by a new version of dependencies (look at the generate-constraints output of your build), but that's rather unlikely - the canary build https://github.com/apache/airflow/actions/runs/10420774982/ just got green and updated constraints without any test failures (however, you can always rebase and see if it fails in the same way).

Generally, these kinds of side effects are best investigated by a bit of guessing and bisecting - running a smaller and smaller subset of tests until you find the one that is the culprit. At least that's what I did in the past.

You should start by looking at the pytest command that was run in the original test type - unfold the red failing test type and you will see:

  Starting the tests with those pytest arguments: tests/cli --verbosity=0 --strict-markers --durations=100 --maxfail=50 --color=yes --junitxml=/files/test_result-cli-postgres.xml --timeouts-order moi --setup-timeout=60 --execution-timeout=60 --teardown-timeout=60 --disable-warnings -rfEX --run-db-tests-only --ignore=tests/system --ignore=tests/integration --warning-output-path=/files/warnings-cli-postgres.txt --ignore=helm_tests --with-db-init --no-cov
  
  ============================= test session starts ==============================
  platform linux -- Python 3.8.19, pytest-8.3.2, pluggy-1.5.0
  rootdir: /opt/airflow
  configfile: pyproject.toml
  plugins: icdiff-0.9, timeouts-1.2.1, instafail-0.5.0, custom-exit-code-0.3.0, rerunfailures-14.0, asyncio-0.23.8, time-machine-2.15.0, anyio-4.4.0, requests-mock-1.12.1, cov-5.0.0, mock-3.14.0, xdist-3.6.1
  asyncio: mode=strict
  setup timeout: 60.0s, execution timeout: 60.0s, teardown timeout: 60.0s
  collected 406 items
  
  tests/cli/commands/test_celery_command.py ..........                     [  2%]
  tests/cli/commands/test_cheat_sheet_command.py s                         [  2%]
  tests/cli/commands/test_config_command.py ssssssssssssssssss             [  7%]
  tests/cli/commands/test_connection_command.py .......................... [ 13%]
  .....................                                                    [ 18%]
  tests/cli/commands/test_dag_command.py ................................. [ 26%]
  ....................                                                     [ 31%]
  tests/cli/commands/test_dag_processor_command.py .                       [ 32%]
  tests/cli/commands/test_db_command.py .................................. [ 40%]
  .....................................                                    [ 49%]
  tests/cli/commands/test_info_command.py sssssssss..s                     [ 52%]
  tests/cli/commands/test_internal_api_command.py ssss...                  [ 54%]
  tests/cli/commands/test_jobs_command.py ......                           [ 55%]
  tests/cli/commands/test_kerberos_command.py ....                         [ 56%]
  tests/cli/commands/test_kubernetes_command.py ..........                 [ 59%]
  tests/cli/commands/test_legacy_commands.py sss                           [ 59%]
  tests/cli/commands/test_plugins_command.py ...                           [ 60%]
  tests/cli/commands/test_pool_command.py ...........                      [ 63%]
  tests/cli/commands/test_rotate_fernet_key_command.py ..                  [ 63%]
  tests/cli/commands/test_scheduler_command.py ...................         [ 68%]
  tests/cli/commands/test_standalone_command.py ssssssssssssss             [ 71%]
  tests/cli/commands/test_task_command.py ................................ [ 79%]
  .F............                                                           [ 83%]
  tests/cli/commands/test_triggerer_command.py ..                          [ 83%]
  tests/cli/commands/test_variable_command.py ...........                  [ 86%]
  tests/cli/commands/test_version_command.py s                             [ 86%]
  tests/cli/commands/test_webserver_command.py sssssssssss....             [ 90%]
  tests/cli/test_cli_parser.py ..................................s....     [100%]

If your test succeeds when run separately but fails when run as tests/cli, then a side effect is almost certainly the root cause. You can then guess which test is producing the side effect and run only that test together with the failing one to confirm your guess. Or attempt to bisect it:

In this case you might convert the single command:

  • pytest --run-db-tests-only tests/cli (that should fail locally for you as well)

into (looking at the output):

  • pytest --run-db-tests-only tests/cli/commands/test_celery_command.py tests/cli/commands/test_cheat_sheet_command.py ... tests/cli/commands/test_task_command.py

Then you can remove half of the modules from the list and run it again (you will then see whether the side effect comes from the removed half or the remaining half). Continue down that path - even to the single test that causes the side effect. Fixing it is then usually trivial: add the missing setup/teardown, or change the test so that it patches and restores any state.
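The halving step described above can be sketched as a small shell helper (hypothetical, for illustration; in a real bisection round you would feed its output to the actual pytest invocation shown earlier):

```shell
#!/bin/sh
# Hypothetical bisection helper: print the first half of a
# space-separated list of test modules, so each round re-runs
# half of the suspects together with the failing test.
first_half() {
    set -- $1
    n=$(( $# / 2 ))
    i=0
    out=""
    for m in "$@"; do
        i=$(( i + 1 ))
        if [ "$i" -le "$n" ]; then
            out="$out $m"
        fi
    done
    # strip the leading space before printing
    echo "${out# }"
}

modules="test_celery_command.py test_config_command.py test_dag_command.py test_db_command.py"
half=$(first_half "$modules")
echo "$half"
# One bisection round would then be, e.g.:
#   pytest --run-db-tests-only $half tests/cli/commands/test_task_command.py
```

If the failure reproduces with that half, keep halving it; otherwise switch to the other half and repeat until one module remains.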

It's slow and tedious, yes, but this is the way I've successfully traced root causes of similar issues in the past, and I have no idea how to do it faster.

@ephraimbuddy
Contributor Author

Thanks @potiuk for always helping. I'll look into these and fix them 🙏

@ephraimbuddy ephraimbuddy force-pushed the separate-fab-db-from-airflow branch 4 times, most recently from 1ebb982 to 2c47ab4 Compare August 19, 2024 15:45
@ephraimbuddy ephraimbuddy force-pushed the separate-fab-db-from-airflow branch 2 times, most recently from a5527b0 to 708b15e Compare August 20, 2024 08:21
@eladkal
Contributor

eladkal commented Aug 20, 2024

Wait.. what does this actually mean for users who are bumping providers only?
We have docs that explain the upgrade procedure: https://airflow.apache.org/docs/apache-airflow/stable/installation/upgrading.html#upgrading-airflow-to-a-newer-version

If migrations can now also run from bumping a provider only, that changes the upgrade procedure users should take.

@ephraimbuddy
Contributor Author

Wait.. what does this actually mean for users who are bumping providers only? We have docs that explain the upgrade procedure: https://airflow.apache.org/docs/apache-airflow/stable/installation/upgrading.html#upgrading-airflow-to-a-newer-version

If migrations can now also run from bumping a provider only, that changes the upgrade procedure users should take.

I will describe the upgrade procedure in my next PR, when I add the upgrade command for the FAB provider. It should be smooth.

@eladkal
Contributor

eladkal commented Aug 20, 2024

I will describe the upgrade procedure in my next PR, when I add the upgrade command for the FAB provider. It should be smooth.

I am worried here.
We may be dismissing the impact too lightly.
Did I miss a mailing list thread on this topic?

Here we are introducing something really new - bumping a provider version which also runs DB migrations. That is not a small change to how Airflow operates.

@ephraimbuddy ephraimbuddy force-pushed the separate-fab-db-from-airflow branch from 24facde to 167ecb6 Compare August 25, 2024 17:53
@ephraimbuddy ephraimbuddy merged commit 59dc981 into apache:main Aug 25, 2024
@ephraimbuddy ephraimbuddy deleted the separate-fab-db-from-airflow branch August 25, 2024 19:26