Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[AIRFLOW-6162] Add back serialization as a module #6718

Conversation

potiuk
Copy link
Member

@potiuk potiuk commented Dec 3, 2019

Make sure you have checked all steps below.

Jira

Description

  • Here are some details about my PR, including screenshots of any UI changes:

Tests

  • My PR adds the following unit tests OR does not need testing for this extremely good reason:

Commits

  • My commits all reference Jira issues in their subject lines, and I have squashed multiple commits if they address the same issue. In addition, my commits follow the guidelines from "How to write a good git commit message":
    1. Subject is separated from body by a blank line
    2. Subject is limited to 50 characters (not including Jira issue reference)
    3. Subject does not end with a period
    4. Subject uses the imperative mood ("add", not "adding")
    5. Body wraps at 72 characters
    6. Body explains "what" and "why", not "how"

Documentation

  • In case of new functionality, my PR adds documentation that describes how to use it.
    • All the public functions and the classes in the PR contain docstrings that explain what it does
    • If you implement backwards incompatible changes, please leave a note in the Updating.md so we can assign it to a appropriate release

@potiuk potiuk requested review from ashb and kaxil December 3, 2019 11:22
@@ -101,7 +102,7 @@ def deserialize_dag(cls, encoded_dag: dict) -> "SerializedDAG":
setattr(dag, 'full_filepath', dag.fileloc)
for task in dag.task_dict.values():
task.dag = dag
serializable_task: SerializedBaseOperator = task
serializable_task: SerializedBaseOperator = cast(SerializedBaseOperator, task)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this required?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. MyPy complains if it's not there. Maybe it's worth looking whether there is a good reason for that @kaxil ?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will shortly

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cool, other than that I am fine with this PR :)

Thanks @potiuk

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's worth looking at it @kaxil -> there are still some parts of the code that needs some more typing to catch potential errors (DagBag especially) but this one looks a bit fishy to me.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the root cause is that .subdag field is only set in SerializedBaseOperator and that's why the type is forced here.

Copy link

@nemkin nemkin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tried this locally and it fixed the missing serialization module after running pip install . Thank you Jarek :)

@kaxil
Copy link
Member

kaxil commented Dec 3, 2019 via email

@potiuk
Copy link
Member Author

potiuk commented Dec 3, 2019

@kaxil -> I have a fix in making for that (but the #6596 should be merge first because we have the errors fixed in this PR).

@potiuk
Copy link
Member Author

potiuk commented Dec 3, 2019

In my upcoming fix I also merge serialized_dag and serialized_baseoperator into one module 'serialized_objects' - because there are cyclic dependencies (again detected by pylint) between those two classes as well and it makes perfect sense to group them in one module.

@potiuk
Copy link
Member Author

potiuk commented Dec 3, 2019

@kaxil @ashb @mik-laj -> Just pushed fixup which removes the cast and fixes it "properly" I think. I moved subdag to BaseOperator and added few missing type annotations. This has let to more cyclic dependencies (between SerializedDag and SerializedBaseOperator). I believe those two classes are so closely coupled that it makes sense to put them in one module (basically that's what pylint is complaining about when telling about "cyclic imports" - that classes in separate modules are so closely coupled that they are referring each other).

BTW. I think the more type annotations we add the more couplings we detect and there will be another set of splits/merges between the core classes. We have still quite a few of not-annotated classes that are hiding such couplings.

@potiuk
Copy link
Member Author

potiuk commented Dec 3, 2019

I will rebase it on top of #6596 to see if it works with hose fixes.

@codecov-io
Copy link

codecov-io commented Dec 3, 2019

Codecov Report

Merging #6718 into master will decrease coverage by 0.33%.
The diff coverage is 96.09%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #6718      +/-   ##
==========================================
- Coverage   83.84%   83.51%   -0.34%     
==========================================
  Files         669      668       -1     
  Lines       37692    37677      -15     
==========================================
- Hits        31603    31465     -138     
- Misses       6089     6212     +123
Impacted Files Coverage Δ
airflow/models/dagbag.py 86.53% <0%> (ø) ⬆️
airflow/serialization/__init__.py 100% <100%> (ø)
airflow/models/baseoperator.py 96.3% <100%> (+0.01%) ⬆️
airflow/models/dag.py 90.95% <100%> (ø) ⬆️
airflow/models/serialized_dag.py 87.65% <100%> (ø) ⬆️
airflow/serialization/serialized_objects.py 90.94% <96.72%> (ø)
airflow/kubernetes/volume_mount.py 44.44% <0%> (-55.56%) ⬇️
airflow/kubernetes/volume.py 52.94% <0%> (-47.06%) ⬇️
airflow/kubernetes/pod_launcher.py 45.25% <0%> (-46.72%) ⬇️
airflow/kubernetes/refresh_config.py 50.98% <0%> (-23.53%) ⬇️
... and 5 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update a36cfe0...256e216. Read the comment docs.

@potiuk potiuk force-pushed the AIRFLOW-6162-add-back-serialization-module branch from fdb06ca to 42db902 Compare December 3, 2019 13:35
@potiuk
Copy link
Member Author

potiuk commented Dec 3, 2019

Rebased now on top of #6596

@potiuk potiuk force-pushed the AIRFLOW-6162-add-back-serialization-module branch 3 times, most recently from c8ff842 to eeeb99e Compare December 3, 2019 14:58
@potiuk
Copy link
Member Author

potiuk commented Dec 3, 2019

And some more pylint complaints (rightfully so) I also moved BaseSerialization in the same "serialization_objects.py". Now this module is really what it should be :).

@potiuk
Copy link
Member Author

potiuk commented Dec 3, 2019

@ashb @kaxil -> pls take a look. It's changed since the approval a lot.

@potiuk potiuk force-pushed the AIRFLOW-6162-add-back-serialization-module branch from eeeb99e to 256e216 Compare December 3, 2019 15:03
@potiuk potiuk merged commit 0abefe9 into apache:master Dec 3, 2019
kaxil added a commit that referenced this pull request Dec 18, 2019
ashb pushed a commit that referenced this pull request Dec 18, 2019
ashb pushed a commit that referenced this pull request Dec 19, 2019
kaxil added a commit that referenced this pull request Dec 19, 2019
galuszkak pushed a commit to FlyrInc/apache-airflow that referenced this pull request Mar 5, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants