Skip to content

Conversation

@potiuk
Copy link
Member

@potiuk potiuk commented Feb 6, 2023

The open-api-schema-validator 0.4.3 made RFC3339 validation of the date-time fields mandatory and this revealed problems in our test URLs - the '+' was not url-encoded and it was replaced with space - thus the dates passed were not valid RFC3339 date-time specifications.

This however revealed one more problem. The rfc3339-validator package is automatically installed by the open-api-schema-validator, but when installed, it also adds validation to date-time fields validated by the Params of ours - for example naive date-time parameters are not supported any more (but they were in the past).

This might introduce breaking changes for users who use non-valid date-time parameters - however we should consider that as a bugfix, and accidental support, because the date-time schema should expect RFC3339 formatted date time.


^ Add meaningful description above

Read the Pull Request Guidelines for more information.
In case of fundamental code changes, an Airflow Improvement Proposal (AIP) is needed.
In case of a new dependency, check compliance with the ASF 3rd Party License Policy.
In case of backwards incompatible changes please leave a note in a newsfragment file, named {pr_number}.significant.rst or {issue_number}.significant.rst, in newsfragments.

@boring-cyborg boring-cyborg bot added the area:API Airflow's REST/HTTP API label Feb 6, 2023
@potiuk
Copy link
Member Author

potiuk commented Feb 6, 2023

I think this is the only way to proceed even though it introduces potential backwards-compatibility for those for whom the validation accidentally worked before:

Example snipppet of code showing how date-time fields will be validated after installing the validator:

>>> from airflow.utils.timezone import datetime
>>> from rfc3339_validator import validate_rfc3339
>>> dtt =  datetime(2020, 1, 1)
>>> print(dtt)
2020-01-01 00:00:00+00:00
>>> validate_rfc3339(str(dtt))
False
>>> validate_rfc3339('2020-01-01T00:00:00+00:00')
True
>>> validate_rfc3339('2020-01-01T00:00:00')
False

@potiuk
Copy link
Member Author

potiuk commented Feb 6, 2023

cc: @msumit @mik-laj @ephraimbuddy - since you were mostly involved with API/Params definition. I think your input on making date-time mandatory validated by RFC3999 validator would be needed.

The https://github.com/p1c2u/openapi-schema-validator made rfc3999 mandatory, so if we want to keep up-to-date version of the validator, this is the best way to follow.

Comment on lines +3 to +4
Copy link
Contributor

@Taragolis Taragolis Feb 6, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OMG! I've broke my head when I tried to understand why on Earth tests/api_connexion/endpoints/test_dag_run_endpoint.py::TestGetDagRunsPaginationFilters::test_date_filters_gte_and_lte failed, and which component/package changed 2020-06-18T18:00:00+00:00 to 2020-06-18T18:00:00 00:00

@Taragolis
Copy link
Contributor

it also adds validation to date-time fields validated by the Params of ours - for example naive date-time parameters are not supported any more (but they were in the past).

I guess if we fix URI tests by this PR, we could use backward compatible version by attempt to fix Params: #29374

I've also found weird thing that data-time from UI DagRun send as "2020-01-02 03:04:05+00:00" param instead on "2020-01-02T03:04:05+00:00"

@potiuk
Copy link
Member Author

potiuk commented Feb 7, 2023

I guess if we fix URI tests by this PR, we could use backward compatible version by attempt to fix Params: #29374

Yes. I just realized (sorry I was tired back then) that you already attempted to fix it. Yes. I think we could combine those two PRs - and once mine gets green (I just pushed a fix) - we can add the try_fix_not_rfc3339_dt. I feel little uneasy with Params earlier supporting the non-rfc3339 compliant date param (and not working with it after any more).

You can add a fixup to my PR :) @Taragolis then (and correct the newsfragment).

@potiuk
Copy link
Member Author

potiuk commented Feb 7, 2023

also > 4 on jsonschema should be added to this one (from your PR).

@Taragolis
Copy link
Contributor

Yes. I just realized (sorry I was tired back then) that you already attempted to fix it.

I guess I will spend couple of days before realise that the problem with URI encoding 🤦
I've already checked connexion, werkzeug, marshmallow 🤣

Yes. I think we could combine those two PRs - and once mine gets green (I just pushed a fix) - we can add the try_fix_not_rfc3339_dt.

We could combine or could make it as follow up. I do not have strong opinion for that and may be need more time for additional fixes in it.
Potentially we could also drop some code which do not required anymore as follow-up, like this:

def format_datetime(value: str) -> datetime:
"""
Format datetime objects.
Datetime format parser for args since connexion doesn't parse datetimes
https://github.com/zalando/connexion/issues/476
This should only be used within connection views because it raises 400
"""
value = value.strip()
if value[-1] != "Z":
value = value.replace(" ", "+")
try:
return timezone.parse(value)
except (ParserError, TypeError) as err:
raise BadRequest("Incorrect datetime argument", detail=str(err))

also > 4 on jsonschema should be added to this one (from your PR).

Yep! This behaviour of jsonschema always confuse me - base on dependencies which install in environment behaviour of validation for date-time (and other to honest) could be different.
I checked by pipdeptree and seems like we do not use it directly only pre-commit hooks and moto use it. Anyway sooner or later we would have to install rfc3339-validator as a dependency.

@potiuk
Copy link
Member Author

potiuk commented Feb 7, 2023

We could combine or could make it as follow up. I do not have strong opinion for that and may be need more time for additional fixes in it. Potentially we could also drop some code which do not required anymore as follow-up, like this:

I think better to combine it - it will make it more cherry-pickable. I think this one is a candidate to 2.5.2 because of this 'side-effect' behaviour for jsonschema/parameters

@potiuk potiuk added this to the Airflow 2.5.2 milestone Feb 7, 2023
@potiuk
Copy link
Member Author

potiuk commented Feb 7, 2023

The tests seems to be green BTW

@Taragolis
Copy link
Contributor

I think better to combine it - it will make it more cherry-pickable. I think this one is a candidate to 2.5.2 because of this 'side-effect' behaviour for jsonschema/parameters

I will check my branch with changes from this branch In the morning or afternoon.

One thing what I think we should decide, we have two options with Param date-time:

  1. As it implemented right now in Use rfc3339 compatible datetime in Param validation #29374, convert values iso8601 -> rfc3339 and naive -> awaire
  2. Just check is it valid iso8601, in case if rfc3339 validation failed, but do not change actual values and only warn users

@Taragolis
Copy link
Contributor

@potiuk I rebased on your changes and make small changes:

Just check is it valid iso8601, in case if rfc3339 validation failed, but do not change actual values and only warn users

We could use this commit 51594ba if everything alright

@potiuk
Copy link
Member Author

potiuk commented Feb 7, 2023

Yep I will merge + rebate shortly

Copy link
Contributor

@ephraimbuddy ephraimbuddy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice.
I wonder if it's 'possible' to handle the change in the endpoint itself using something like https://github.com/apache/airflow/blob/main/airflow/api_connexion/parameters.py#L38 and avoid breaking change? The issue seems to affect only dates in the query string

@potiuk potiuk requested review from XD-DENG, ashb and kaxil as code owners February 7, 2023 12:10
@potiuk potiuk force-pushed the fix-rfc-datetime branch 2 times, most recently from 505a33e to f37d6a6 Compare February 7, 2023 12:13
@Taragolis
Copy link
Contributor

Nice.
I wonder if it's 'possible' to handle the change in the endpoint itself using something like https://github.com/apache/airflow/blob/main/airflow/api_connexion/parameters.py#L38 and avoid breaking change? The issue seems to affect only dates in the query string

I think we use transformation by this function, unfortunately it happen after string validation.
Previously it didn't failed because jsonschema doesn't validate datetime if rfc3339-validator or strict-rfc3339 (jsonschema<4) not installed.

As another workaround we could overwrite default datetime checker in jsonschema however as result every user who validate json in Airflow Context would use our checker.

Potentially there is win-win condition but I can't find it 😞

@potiuk
Copy link
Member Author

potiuk commented Feb 7, 2023

Applied your change, rebased made you co-author and updated docs/commit message @Taragolis

@potiuk
Copy link
Member Author

potiuk commented Feb 7, 2023

Yeah. I am with @Taragolis here.

I think we use transformation by this function, unfortunately it happen after string validation.
Previously it didn't failed because jsonschema doesn't validate datetime if rfc3339-validator or strict-rfc3339 (jsonschema<4) not installed.

I think we need to bite the bullet now indeed. With Params it's a different story because we have full control over definition and validation and we even had an invalid example so far :( (so users might complain on it).

But our Python/Go Clients for example would work correctly (I believe? - this is something we could possibly check with @pierrejeambrun ?) and I think for the APIs the standard compliance. The OpanAPI standard is very clear https://github.com/OAI/OpenAPI-Specification/blob/main/versions/3.0.3.md#data-types:

string | date-time | As defined by date-time - RFC3339

There are no options here. "date-time" means RFC 3339.

So what this change brings is a compliance with standards. Yes. It might cause some troubles, but this should be an easy fix for any "hand-crafted" client (And a good reason to use our Python client as well).

@Taragolis
Copy link
Contributor

And in additional, if we select data from data picker than it send wit space separator instead of T
image

@potiuk
Copy link
Member Author

potiuk commented Feb 7, 2023

We might want to fix it as well

@Taragolis
Copy link
Contributor

I guess we need to change this value, let me check

$('.datetimepicker').datetimepicker({ format: 'YYYY-MM-DD HH:mm:ssZ' });

@Taragolis
Copy link
Contributor

Yep change YYYY-MM-DD HH:mm:ssZ to YYYY-MM-DDTHH:mm:ssZ make it pick as RFC3339 datetime

image

Just interesting if change in this place is it possible that in other places something break 🤔

@pierrejeambrun
Copy link
Member

pierrejeambrun commented Feb 7, 2023

@Taragolis for the front end, I know we also have a date picker in the grid that we might want to check as well. https://github.com/apache/airflow/blob/main/airflow/www/static/js/dag/nav/FilterBar.tsx#L65

For the clients, this is something that we need to check indeed. Quickly reading through the python client code tells me that there is some iso 8601 involved for deserialization. I would have to test for both clients to confirm that this would work, and if it would raise a warning or not. (iso 8601 or rfc3339 compliant format)

@Taragolis
Copy link
Contributor

Quickly reading through the python client code tells me that there is some iso 8601 involved for deserialization.

For deserialization it is fine, rfc3339 it is a valid iso8601 format.

@potiuk
Copy link
Member Author

potiuk commented Feb 7, 2023

Shall we go Yolo and Add the T :) ?

@Taragolis
Copy link
Contributor

I can't find any side effects with other date pickers, except strange behaviour with changing timezone UTC (default) -> Local -> UTC but I found that it not related to this changes.

@potiuk
Copy link
Member Author

potiuk commented Feb 7, 2023

I checked the behviour of the grid datepicker @pierrejeambrun and it works fine without warnings.

Even if the date is "isoFormatWithoutTZ", it produces a valid RFC3339 timestamp:

image

This is a valid rfc3339.txt:

from rfc3339_validator import validate_rfc3339
validate_rfc3339('2023-02-11T21:00:00Z')
True

Because it is used like this:

(localDate: string) => moment(localDate).utc().format(),

@potiuk
Copy link
Member Author

potiuk commented Feb 7, 2023

Also tested the client API:

import datetime

import airflow_client.client
from pprint import pprint
from airflow_client.client.api import dag_run_api

#
# In case of the basic authentication below. Make sure:
#  - Airflow is configured with the basic_auth as backend:
#     auth_backend = airflow.api.auth.backend.basic_auth
#  - Make sure that the client has been generated with securitySchema Basic.

# Configure HTTP basic authorization: Basic
configuration = airflow_client.client.Configuration(
    host="http://localhost:8080/api/v1",
    username='admin',
    password='admin'
)

# Enter a context with an instance of the API client
with airflow_client.client.ApiClient(configuration) as api_client:
    # Create an instance of the API class
    api_instance = dag_run_api.DAGRunApi(api_client)

    try:
        # Get current configuration
        api_response = api_instance.get_dag_runs(dag_id='example_params_ui_tutorial',
                                                 start_date_lte=datetime.datetime.now(
            tz=datetime.timezone.utc))
        pprint(api_response)
    except airflow_client.client.ApiException as e:
        print("Exception when calling ConfigApi->get_config: %s\n" % e)

Results in no warnings and properly formatted params:

image

When we remove timezone it correctly fails:

image

And we get this as response:

image

So it looks like everything works as expected. Even if the last one (no timezone failure) might be not the same as before, this is a proper behaviour - due to ambiguity of no-timezone query

The open-api-schema-validator 0.4.3 made RFC3339 validation of
the date-time fields mandatory and this revealed problems in our
test URLs - the '+' was not url-encoded and it was replaced with
space - thus the dates passed were not valid RFC3339 date-time
specifications.

This however revealed one more problem. The RFC3339-validator package
is automatically installed by the open-api-schema-validator, but when
installed, it also adds validation to date-time fields validated by
the Params of ours - for example naive date-time parameters. We
are limiting the supported parameters now to be iso8601-compliant
for backwards compatibility but we deprecate iso8601 in favor of
RFC3339.

This might introduce breaking changes for users who use non-valid
date-time parameters in the API, however we should consider that as a
bugfix, and accidental support, because the date-time schema should
expect RFC3999-formatted date time.

Co-authored-by: Andrey Anshin <Andrey.Anshin@taragol.is>
@potiuk
Copy link
Member Author

potiuk commented Feb 7, 2023

I pushed the change with datepicker format change. I think it should be ready to go .

@potiuk
Copy link
Member Author

potiuk commented Feb 7, 2023

Static check fails (but irrelevant - I will fix it separately) and two flaky tests. Merging.

@potiuk potiuk merged commit a1faa2c into apache:main Feb 7, 2023
@potiuk potiuk deleted the fix-rfc-datetime branch February 7, 2023 21:37
@pierrejeambrun
Copy link
Member

Thanks for the checks Jarek, LGTM !

@pierrejeambrun pierrejeambrun added the type:bug-fix Changelog: Bug Fixes label Feb 27, 2023
pierrejeambrun pushed a commit that referenced this pull request Mar 8, 2023
The open-api-schema-validator 0.4.3 made RFC3339 validation of
the date-time fields mandatory and this revealed problems in our
test URLs - the '+' was not url-encoded and it was replaced with
space - thus the dates passed were not valid RFC3339 date-time
specifications.

This however revealed one more problem. The RFC3339-validator package
is automatically installed by the open-api-schema-validator, but when
installed, it also adds validation to date-time fields validated by
the Params of ours - for example naive date-time parameters. We
are limiting the supported parameters now to be iso8601-compliant
for backwards compatibility but we deprecate iso8601 in favor of
RFC3339.

This might introduce breaking changes for users who use non-valid
date-time parameters in the API, however we should consider that as a
bugfix, and accidental support, because the date-time schema should
expect RFC3999-formatted date time.

Co-authored-by: Andrey Anshin <Andrey.Anshin@taragol.is>
(cherry picked from commit a1faa2c)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

area:API Airflow's REST/HTTP API type:bug-fix Changelog: Bug Fixes

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants