Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Streamline & simplify __eq__ methods in models Dag and BaseOperator #13449

Merged
merged 1 commit into from
Jan 4, 2021

Conversation

XD-DENG
Copy link
Member

@XD-DENG XD-DENG commented Jan 3, 2021

  • Change-1: Use getattr() instead of dict as dict doesn't return correct values for properties (sample code reproducing the issue is given below). This was fixed for dag model, but was not fixed for baseoperator model.

  • Change-2: Avoid unnecessary condition check (the removed condition checks are covered by _comps)

Sample Code Reproducing Issue in Change-1

class C:

    def __init__(self, elements):
        self._elements = elements

    @property
    def elements(self):
        return self._elements

    def test(self):
        print(self.__dict__.get("elements", None))
        print(getattr(self, "elements", None))

a = C([1,2,3])
a.test()

Output:

None
[1, 2, 3]

^ Add meaningful description above

Read the Pull Request Guidelines for more information.
In case of fundamental code change, Airflow Improvement Proposal (AIP) is needed.
In case of a new dependency, check compliance with the ASF 3rd Party License Policy.
In case of backwards incompatible changes please leave a note in UPDATING.md.

@XD-DENG XD-DENG added the type:improvement Changelog: Improvements label Jan 3, 2021
@XD-DENG XD-DENG added this to the Airflow 2.0.1 milestone Jan 3, 2021
@XD-DENG XD-DENG requested review from potiuk, kaxil and turbaszek January 3, 2021 11:42
@github-actions
Copy link

github-actions bot commented Jan 3, 2021

The Workflow run is cancelling this PR. It has some failed jobs matching ^Pylint$,^Static checks,^Build docs$,^Spell check docs$,^Backport packages$,^Provider packages,^Checks: Helm tests$,^Test OpenAPI*.

- Use getattr() instead of __dict__ as __dict__ doesn't return
  correct values for properties.
- Avoid unnecessary condition checks (the removed condition checks are covered by _comps)
@@ -361,8 +361,7 @@ def __repr__(self):
return f"<DAG: {self.dag_id}>"

def __eq__(self, other):
if type(self) == type(other) and self.dag_id == other.dag_id:

if type(self) == type(other):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From what I can remember, it's optimization so we can check the most common case much faster.

Copy link
Member

@mik-laj mik-laj Jan 3, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That optimisation may not make sense given two reasons below:

  • all() function has the "exit fast" property, i.e., whenever there is any False element, it will return False immediately, rather than traversing all elements in the iterable. reference
  • For dag model, dag_id is the 1st element in _comps; For BaseOperator, task_id is the 1st element in _comps.

So after the change I make here, there should be zero impact on the performance (actually it improves the performance very minorly: it helps avoid comparing dag_id in dag model's __eq__for two times. Similar for BaseOperator).

Kindly let me know if it makes sense to you?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SGTM.

@mik-laj mik-laj merged commit 6ef23af into apache:master Jan 4, 2021
@github-actions github-actions bot added the full tests needed We need to run full set of tests for this PR to merge label Jan 4, 2021
@github-actions
Copy link

github-actions bot commented Jan 4, 2021

The PR most likely needs to run full matrix of tests because it modifies parts of the core of Airflow. However, committers might decide to merge it quickly and take the risk. If they don't merge it quickly - please rebase it to the latest master at your convenience, or amend the last commit of the PR, and push it with --force-with-lease.

@XD-DENG XD-DENG deleted the minor-streamline branch January 4, 2021 05:45
kaxil pushed a commit that referenced this pull request Jan 21, 2021
…13449)

- Use getattr() instead of __dict__ as __dict__ doesn't return
  correct values for properties.
- Avoid unnecessary condition checks (the removed condition checks are covered by _comps)

(cherry picked from commit 6ef23af)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
full tests needed We need to run full set of tests for this PR to merge type:improvement Changelog: Improvements
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants