Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

New design of system tests #22311

Merged
merged 31 commits into from
Mar 25, 2022
Merged

New design of system tests #22311

merged 31 commits into from
Mar 25, 2022

Conversation

mnojek
Copy link
Contributor

@mnojek mnojek commented Mar 16, 2022

Hello everyone,

I am really excited to show you the collective work of the last few weeks!
This is the initial PR that prepares Airflow to be compliant with
AIP-47 which is related to the new design of system tests in Airflow. This is the first step to automated system tests in our CI. When it is merged, it will enable other providers to start migration of their system tests to the new design.

Basically, the main advantage of this design is that we will have self-contained tests encapsulated in DAGs that can be run with pytest as well as just imported into Airflow and be executed there. Requirements for the environment are kept at minimum and are mostly related to specific test needs. By using built-in features of Airflow, each system test is a DAG that is also an example of usage for tested Operator. All in one, in a simple file :)

It is a minimal PR that contains a bunch of tests for only 1 Google service (BigQuery) and everything else that is required by the tests to be working properly. It also consists of all necessary documentation, new pre-commit hook (check-watcher-in-examples), updated checks (in tests/always) and some other things. More tests will be migrated in a separate PRs, because we wanted this one to be as small as possible.

Please familiarize with the changes and also read the AIP-47 to fully understand why they are needed.
We really want to hear your feedback. In case of any questions, concerns or comments - we are here to answer them and explain things. The whole work was overseen by @potiuk and done together with @bhirsz and @kosteev and me (@mnojek).

Bartlomiej Hirsz and others added 14 commits March 10, 2022 12:24
Migrate BigQuery system tests to new design (See AIP-47 for details)

Change-Id: Ia0579e22b495a81c5b4bc9fe37dd2067406c934d
Change-Id: Id9594771fe12e22f052f1929c7afc7fe43aec166
Change-Id: I7a9c93d0ecc7a12edca933ea693c9219e9e3a5b5
Change-Id: Iee9cecaf2f1991c48ead28e91e384c55af6271d1
Change-Id: I1479b3cad103a710c7a6ff5d218f68fed98fe20a
Change-Id: I67005b1bc5f5d0170a1436fba8b324d419912303
Change-Id: I935c4de2ca5d36c2b9b8ddbab80af0b50e4226d6
Change-Id: Ice6fa8f33530d842f78df95aba1e3d6d72ec9697
Change-Id: Ic7c5ffce6e67a1d5f8b32817be24e51864ab6abf
>> delete_bucket
)

list(dag.tasks) >> watcher()
Copy link
Contributor

@ferruzzi ferruzzi Mar 17, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know this was discussed in the email thread, but I'll raise the concern one last time because this example really drives it home, then I promise I'll drop it. We are adding the exact same line of code to every Example DAG (now System Test), shouldn't that really be handled behind the scenes when the test DAG is being parsed instead of adding this copy/pasted line to literally hundreds of files?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't have a better idea how this can be achieved without explicitly defining this task dependency. This is actually quite clever because in the previous version of this design we were listing all the tasks again and then defining >> operator with the watcher. Now it's just this one line that has a reference to all tasks in a local dag and then appends the watcher to all of them. And it is of course only needed when there is a teardown task in the dag.
But of course I am open to any specific suggestions that may improve the design.

Copy link
Member

@potiuk potiuk Mar 17, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We thought about a special "util" function for that.

I do not think we discussed about "magical" adding it. this is a bit of a hassle, I agreee but adding it would likely require something on the DAG level - for example a special parameter or decorator. And the problem with that is that it would have to be added somewhere at the beginning of the DAG - where you define the DAG most likely. And the problem with that is that you also use the DAG as examples in our documentation. And we extracts parts of the examples into the documentaiton and we should not pollute those examples with things that are not really good "examples" on how you should add your DAGs. - and those example usually show the DAG definition/default_args as part of the example - by having the special decorator, or parameter on the DAG to indicate that DAG shoudl have "watcher" added migth be too easily copied from those extracted examples.

Having explicit watcher makes it so much easier - it is at the and and it is explicit (which means there is no magic)

But actually what made me think now - we should actually make it even more separated an explicit.

@mnojek why don't we change the the watcher's local import and some comment there to make it even more separated and "explicit" for example:

         >> delete_bucket
    ) 
    
    from tests.system.utils.watcher import watcher
    # This test run as a system test needs watcher in order to mark success/failure
    # where "tearDown" task is part of the DAG
    list(dag.tasks) >> watcher()
    

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am open to any specific suggestions

I get that. Unfortunately I only have half-baked suggestions. the only thing I can really think of would be if we had something like tests.system.DAG which inherits airflow.DAG and adds that watcher to the task list, but that's only a half-baked idea and I haven't really thought it through all the way. Writing the same line in hundreds of files just feels wrong, but without a more solid alternative, I'll stop beating that dead horse. Thanks for the effort you've put into this, and I hope I'm not coming off as argumentative or anything.

Copy link
Contributor

@bhirsz bhirsz Mar 18, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Writing the same line in hundreds of files just feels wrong

I fully agree with you and it was actually the reason for the redesign - where we had dozens of lines repeated for every test only to trigger other file with the DAG. And we always welcome any suggestions and discussion. We changed the design several times over the time thanks for the input from the others.
I agree with Jarek here - the goal is to have an example dag that can be used by other Airflow users as reference and it would be best to avoid any code specific only to system test. Proposed local import of the watcher looks like a good idea for me for this reason alone

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I support the idea about moving the import for the watcher to local scope. It will be more obvious why it is needed, and we can also add the comment that @potiuk suggested (or similar).
And yeah, copy-pasting the same piece of code in many places is not the best, but at least here this "piece" is small and it's better to have it explicitly than doing some magic stuff that will only shadow the real implementation.
I will do the changes next week.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated the code and moved the watcher import closer its usage.
Also added the comment to demystify this piece of code.
Please check and confirm if it's fine now 😃

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can also configure a cluster policy that will perform the necessary steps only in tests.
Here are docs about cluster policy:
https://airflow.apache.org/docs/apache-airflow/stable/concepts/cluster-policies.html?highlight=cluster%20policy#dag-policies

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cluster policy looks nice but it's overkill in our case when after discussion in this PR we managed to store the watcher together in the test method. It's worth to remember though in case we will need any extra steps.

tests/system/providers/google/README.md Outdated Show resolved Hide resolved
WATCHER_APPEND_INSTRUCTION = "list(dag.tasks) >> watcher()"

PYTEST_FUNCTION = """
def test_run():
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see that below this is being copy/pasted into each example dag file. What if this function needs to be modified or fixed in the future? Would that involve patching every single example dag file?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unfortunately yes. Although in this case it isn't complicated to replace few lines - one file or 100 it doesn't matter for your awk/sed script ;). It's bit of the trade off - we could also leave example dags without any pytest method and generate tests dynamically but by adding this method it's easier to run just one system tests.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

one file or 100 it doesn't matter for your awk/sed script

Hmm, this is not the kind of precedent I want to be setting 😄 It's still quite finicky and error prone.

You can clean this up quite a bit using python itself. The test_run method can be defined in one location in a separate module and then just imported into each example dag file.

I've mocked up an example, see the attached image below:

Screenshot from 2022-03-17 11-54-21

Copy link
Member

@potiuk potiuk Mar 17, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's a nice idea. If pytest still discovers it this way, I lke it a lot better. also you can even connect it with local import to make it even more localized (same reason as in the other comment , where we care about "example_dag" being still an example dag despite also being a self-contained test. Also it nicely solves the problem which I thought about - what happens if we have two dags in an example. They would have to have slightiy different method ( dag1.clear() dag1.run() vs dag2.clear(() dag2.run()). Passing dag as parameter solves it nicely.

I'd be for something like this:

# Needed to run the example dag as system test <link to the docs>
from tests.airflow.proivders.util import get_test_run
test_run = get_test_run(dag)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like the idea - it's also making it easier for temporary updates of the test method (ie adding verbose mode, overwriting executor etc). I've tested it and it appears to be working. Pytest recognizes it if I run it using pytest command however it's not detected by IDE as test method (but maybe I can workaround it somehow with pytest ini). I will discuss it with @mnojek and if we both agree we can update it

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also it nicely solves the problem which I thought about - what happens if we have two dags in an example.

Yupp, good catch! I meant to add this to my original post but totally forgot. It makes it much more configurable this way.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated the code with the test method proposed by @o-nikolas (just added # noqa: E402 since we're importing not on the top of the file)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated the check with a regex pattern, so that it accepts whether there is a comment in between lines in "pytest function" or not.

Copy link
Contributor

@kosteev kosteev left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you very much for this effort!

tests/always/test_example_dags.py Outdated Show resolved Hide resolved
tests/always/test_example_dags.py Outdated Show resolved Hide resolved
tests/system/conftest.py Outdated Show resolved Hide resolved
Bartlomiej Hirsz added 2 commits March 18, 2022 12:18
Change-Id: Ib15e66bd4fd49fd3db586c8b01753e4a19299bf5
Change-Id: Id305440af10f2b2e01fb5d2713c5e5f7db6028f7
@potiuk potiuk closed this Mar 24, 2022
@potiuk potiuk reopened this Mar 24, 2022
@potiuk
Copy link
Member

potiuk commented Mar 24, 2022

Fixed Python 3.6 (needed for 2.2.5 release) in main. I hope it will succeed this time :)

@potiuk
Copy link
Member

potiuk commented Mar 24, 2022

Echch

@potiuk potiuk closed this Mar 24, 2022
@potiuk potiuk reopened this Mar 24, 2022
@potiuk
Copy link
Member

potiuk commented Mar 24, 2022

Rebuilding (aftermath of preparing 2.2.5 release)

@potiuk potiuk closed this Mar 24, 2022
@potiuk potiuk reopened this Mar 24, 2022
Bartlomiej Hirsz and others added 2 commits March 25, 2022 10:24
@potiuk potiuk merged commit 3c7cd47 into apache:main Mar 25, 2022
@ferruzzi
Copy link
Contributor

Nice. Congrats on the merge.

@potiuk potiuk added the AIP-47 AIP-47 New Design of System Tests label Mar 31, 2022
if any task fails, we need to use the watcher pattern. The watcher task is a task that will always fail if
triggered, but it needs to be triggered only if any other task fails. It needs to have a trigger rule set to
``TriggerRule.ONE_FAILED`` and it needs also to be a downstream task for all other tasks in the DAG. Thanks to
this, if every other task will pass, the watcher will be skipped, but when something fails, the watcher task will be
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If every task will pass, then watcher will be skipped. Does it mean the teardown task(e.g. to clean up the resources) will be skipped? @mnojek

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, the teardown task will be executed because by design it has a trigger rule set to "all done" which means it is executed always when all parent tasks are executed. The watcher is a child for every task so it doesn't count into the tasks that need to be executed to trigger the teardown. It is skipped only if everything passed.
And the watcher is triggered only if any of its parent tasks failed, and because all tasks are parent for the watcher, if any of them fails, the watcher is triggered.
Hope it answers your question. If you still have concerns, please comment and I will try to help.

@ephraimbuddy ephraimbuddy added the type:misc/internal Changelog: Misc changes that should appear in change log label Apr 10, 2022
@ephraimbuddy ephraimbuddy added changelog:skip Changes that should be skipped from the changelog (CI, tests, etc..) and removed type:misc/internal Changelog: Misc changes that should appear in change log labels Apr 26, 2022
@potiuk
Copy link
Member

potiuk commented Jun 1, 2022

Summary of ALL performed actions: for 41 issues
Re-added file number: 0
Completed file number: 4
Done 43/272 = 15.81%

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
AIP-47 AIP-47 New Design of System Tests area:dev-tools area:providers changelog:skip Changes that should be skipped from the changelog (CI, tests, etc..) full tests needed We need to run full set of tests for this PR to merge kind:documentation provider:google Google (including GCP) related issues
Projects
None yet
Development

Successfully merging this pull request may close these issues.

10 participants