Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

AIP-84 | Public list tags API #42959

Merged
merged 4 commits into from
Oct 21, 2024

Conversation

jason810496
Copy link
Contributor

@boring-cyborg boring-cyborg bot added the area:UI Related to UI/UX. For Frontend Developers. label Oct 12, 2024
@jason810496
Copy link
Contributor Author

Hi, this PR should be marked with the area:API label instead of area:UI. The bot automatically marked it with area:UI because the pre-commit process auto-generates the frontend code using OpenAPI. Thanks!

@jscheffl jscheffl added area:API Airflow's REST/HTTP API and removed area:UI Related to UI/UX. For Frontend Developers. labels Oct 12, 2024
@jscheffl
Copy link
Contributor

Thanks for the contribution! w/o having a full review I assume you should mark the legacy API as being migrated like e.g. in https://github.com/apache/airflow/pull/42798/files#diff-df829fc289eeca0e932974a5110a8f2c21b1962e87db7c3a3b44dacf084f54a1R47 ?

@jscheffl jscheffl added AIP-84 Modern Rest API legacy api Whether legacy API changes should be allowed in PR labels Oct 12, 2024
@Lee-W Lee-W self-requested a review October 12, 2024 09:24
@jason810496
Copy link
Contributor Author

Hi @jscheffl, since this API is migrated from the home view in airflow/www/views.py, and dag_tags is just a small part of the home view, I'm not sure if this is the only endpoint that needs to be migrated from the home view. @bbovenzi, do you have any insights on this?

@jscheffl
Copy link
Contributor

Hi @jscheffl, since this API is migrated from the home view in airflow/www/views.py, and dag_tags is just a small part of the home view, I'm not sure if this is the only endpoint that needs to be migrated from the home view. @bbovenzi, do you have any insights on this?

Okay, then if this is not a migration we should clarify if this really should get to be a public API. I'd think if it is used only in one place in the new UI then we should not make it a new public API. As there are other public APIs to retrieve the sme information (https://airflow.apache.org/docs/apache-airflow/stable/stable-rest-api-ref.html#operation/get_dag_details)

@jason810496
Copy link
Contributor Author

Sure, that makes sense. I’ll go ahead and move this endpoint to the UI folder.

@Lee-W Lee-W self-requested a review October 14, 2024 10:50
@Lee-W Lee-W marked this pull request as draft October 14, 2024 10:50
Copy link
Member

@pierrejeambrun pierrejeambrun left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the PR, a few suggestions before we can merge this.

airflow/api_fastapi/views/public/dags.py Outdated Show resolved Hide resolved
airflow/api_fastapi/views/public/dags.py Outdated Show resolved Hide resolved
airflow/api_fastapi/views/public/dags.py Outdated Show resolved Hide resolved
airflow/api_fastapi/views/public/dags.py Outdated Show resolved Hide resolved
airflow/api_fastapi/views/public/dags.py Outdated Show resolved Hide resolved
@pierrejeambrun
Copy link
Member

Okay, then if this is not a migration we should clarify if this really should get to be a public API. I'd think if it is used only in one place in the new UI then we should not make it a new public API. As there are other public APIs to retrieve the sme information (https://airflow.apache.org/docs/apache-airflow/stable/stable-rest-api-ref.html#operation/get_dag_details)

I think this is a good addition to the public API because currently there is no way for a user to be able to retrieve all the tags available for filtering for instance. This was done in the legacy UI through custom views.

Meaning that users of the API would have to know in advance tags to effectively filter on them.

@jason810496
Copy link
Contributor Author

jason810496 commented Oct 14, 2024

Based on all the above input, I believe the original intention of issue #42713 is to provide an endpoint to retrieve all tags across all DAGs for advanced filtering functionalities. I agree with the last comment by @pierrejeambrun, so this endpoint should remain in the public folder. However, I think the response schema should be:

{
  "tags": ["tag_1", "tag_2"],
  "total_entries": 2
}

Additionally, the endpoint should support offset and limit query parameters, as mentioned by @rawwar in the issue. This way, the get_dag_tags endpoint can use the same query parameters and a similar response schema as other public endpoints , as well as paginated_select on the backend.

Please correct me if I have misunderstood anything. Thanks !

@pierrejeambrun
Copy link
Member

pierrejeambrun commented Oct 15, 2024

I think the response could be:

{
  "tags": [{"tag_name: "tag_1", "dag_id": "some_dag_id"}, {"tag_name": "tag_2", "dag_id": "some_dag_id"}],
  "total_entries": 2
}

Basically we return serialized DagTag in the response field.

@bbovenzi
Copy link
Contributor

I think the response could be:

{
  "tags": [{"tag_name: "tag_1", "dag_id": "some_dag_id"}, {"tag_name": "tag_2", "dag_id": "some_dag_id"}],
  "total_entries": 2
}

Basically we return serialized DagTag in the response field.

Is dag_id actually valuable information though? The whole point is that multiple dags share tags, so only showing a single dag_id doesn't help anyone. I actually prefer the list of tag_name strings.

@pierrejeambrun
Copy link
Member

I’ll check the db modelling tomorrow, something I might have missesed. I suggested that dag_id was there because that’s how the sqlalchemy model is defined.

But indeed from a business point of view this does not make sense.

But on the other hand the name is the primary key and unique, so I guess there is something I need to double check.

@jason810496
Copy link
Contributor Author

jason810496 commented Oct 16, 2024

Update:

The endpoint is placed in the public folder and supports the following query parameters:

  • tag_name_pattern: _SearchParam
  • order_by: _OrderByParam
  • limit: QueryLimit
  • offset: QueryOffset

Example of the response schema:

{
    "tags": ["tag_1", "tag_2"],
    "total_entries": 3
}

For the SQL statement, the name field should still include DISTINCT since there may be rows like:

same_tag_name, dag_id_1
same_tag_name, dag_id_2

While the name field with the dag_id field creates a unique pair, the name field itself may have duplicates in the query results.

Feature: _OrderByParam - New query parameter for ordering by a field.

  • order_by: asc, desc
  • Ordering by a single field (This could be useful for models with only a few fields, where typically sorting by a single field is sufficient.)

@jason810496 jason810496 marked this pull request as ready for review October 16, 2024 12:19
@kaxil
Copy link
Member

kaxil commented Oct 16, 2024

You will have to rebase and fix conflicts as we change how fastapi_api folder is organized here: #43062

airflow/api_fastapi/parameters.py Outdated Show resolved Hide resolved
airflow/api_fastapi/routes/public/dags.py Outdated Show resolved Hide resolved
@Lee-W Lee-W removed their request for review October 18, 2024 07:03
pierrejeambrun
pierrejeambrun previously approved these changes Oct 18, 2024
Copy link
Member

@pierrejeambrun pierrejeambrun left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking good

@pierrejeambrun pierrejeambrun dismissed their stale review October 18, 2024 12:48

We need to remove the extra query parameter to use the SortParam and we should be good.

@jason810496
Copy link
Contributor Author

Update:

Hi @pierrejeambrun, the group_by resolved the issue. Thanks!

Refactor: SortParm

  • Moved attr_mapping to the __init__ parameter (since SortParm is used by different endpoints).
  • Consolidated allow_attr with attr_mapping (we can use attr_mapping to validate the query parameters).
  • Added default_attr (for models with multiple primary keys, like DagTag, the original get_primary_key_column might not be the column we want to sort by. With default_attr, we can specify the default field for sorting).

@pierrejeambrun
Copy link
Member

Thanks @jason810496, Do you mind putting the refactoring part into its own PR so it does not block this one.

@jason810496
Copy link
Contributor Author

Thanks @jason810496, Do you mind putting the refactoring part into its own PR so it does not block this one.

Sure, so I should create a new branch on top of the current commit and rebase the current branch.
After the current PR is merged, I will create a PR for refactoring SortParm from the new branch.
Is that correct?

@pierrejeambrun
Copy link
Member

Yes, you can checkout another branch from here (full with refactoring). Then in this branch you can drop them (the refactoring part)

@jason810496
Copy link
Contributor Author

Resolved! I have removed the refactoring part.

@pierrejeambrun pierrejeambrun merged commit f3bd2c2 into apache:main Oct 21, 2024
51 of 52 checks passed
@pierrejeambrun
Copy link
Member

pierrejeambrun commented Oct 21, 2024

Unrelated CI failure, merging.

Thanks @jason810496

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
AIP-84 Modern Rest API area:API Airflow's REST/HTTP API legacy api Whether legacy API changes should be allowed in PR
Projects
None yet
Development

Successfully merging this pull request may close these issues.

AIP-84 | Public list tags endpoint
8 participants