Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Limit SQLAlchemy until MSSQL datetime bug is fixed #21272

Merged
merged 2 commits into from
Feb 3, 2022

Conversation

potiuk
Copy link
Member

@potiuk potiuk commented Feb 2, 2022

SQL Alchemy 1.4.10 introduces a bug where for PyODBC driver UTCDateTime
fields get wrongly converted as string and fail to be converted back to
datetime. It was supposed to be fixed in
sqlalchemy/sqlalchemy#6366 (released in
1.4.10) but apparently our case is different.

Opened sqlalchemy/sqlalchemy#7660 to track it


^ Add meaningful description above

Read the Pull Request Guidelines for more information.
In case of fundamental code change, Airflow Improvement Proposal (AIP) is needed.
In case of a new dependency, check compliance with the ASF 3rd Party License Policy.
In case of backwards incompatible changes please leave a note in UPDATING.md.

SQL Alchemy 1.4.10 introduces a bug where for PyODBC driver UTCDateTime
fields get wrongly converted as string and fail to be converted back to
datetime. It was supposed to be fixed in
sqlalchemy/sqlalchemy#6366 (released in
1.4.10) but apparently our case is different.

Opened sqlalchemy/sqlalchemy#7660 to track it
@potiuk potiuk force-pushed the limit-mssql-until-sqlalchemy-fixes-bug branch from de79b6f to b8b2750 Compare February 2, 2022 16:22
@potiuk potiuk closed this Feb 2, 2022
@potiuk potiuk reopened this Feb 2, 2022
setup.cfg Outdated Show resolved Hide resolved
Co-authored-by: Jed Cunningham <66968678+jedcunningham@users.noreply.github.com>
@potiuk
Copy link
Member Author

potiuk commented Feb 2, 2022

The story unfolds.

I tried to reproduce starting from a "minimum" script that sqlalchemy maintainers proposed. I modified it to embed our cusomizations (UTCDateTime field and utcnow function) but so far I could not reproduce it.

import pytz
import datetime as dt
import pendulum
import time

from sqlalchemy import Column
from sqlalchemy import create_engine
from sqlalchemy import DateTime
from sqlalchemy import Integer
from sqlalchemy import String
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy.orm import Session
from sqlalchemy.types import TypeDecorator

Base = declarative_base()

# UTC time zone as a tzinfo instance.
utc = pendulum.tz.timezone('UTC')

class UtcDateTime(TypeDecorator):
    """
    Almost equivalent to :class:`~sqlalchemy.types.DateTime` with
    ``timezone=True`` option, but it differs from that by:

    - Never silently take naive :class:`~datetime.datetime`, instead it
      always raise :exc:`ValueError` unless time zone aware value.
    - :class:`~datetime.datetime` value's :attr:`~datetime.datetime.tzinfo`
      is always converted to UTC.
    - Unlike SQLAlchemy's built-in :class:`~sqlalchemy.types.DateTime`,
      it never return naive :class:`~datetime.datetime`, but time zone
      aware value, even with SQLite or MySQL.
    - Always returns DateTime in UTC

    """

    impl = DateTime(timezone=True)

    def process_bind_param(self, value, dialect):
        if value is not None:
            if not isinstance(value, dt.datetime):
                raise TypeError('expected datetime.datetime, not ' + repr(value))
            elif value.tzinfo is None:
                raise ValueError('naive datetime is disallowed')
            return value.astimezone(utc)
        return None

    def process_result_value(self, value, dialect):
        """
        Processes DateTimes from the DB making sure it is always
        returning UTC. Not using timezone.convert_to_utc as that
        converts to configured TIMEZONE while the DB might be
        running with some other setting. We assume UTC datetimes
        in the database.
        """
        if value is not None:
            if value.tzinfo is None:
                value = value.replace(tzinfo=utc)
            else:
                value = value.astimezone(utc)

        return value


class Project(Base):
    """
    Simple table describing projects.
    """

    __tablename__ = "project"

    id = Column("project_id", Integer, primary_key=True)
    name = Column(String, nullable=False)
    description = Column(String, nullable=False)
    created = Column(UtcDateTime)
    modified = Column(UtcDateTime)

def utcnow() -> dt.datetime:
    """
    Get the current date and time in UTC

    :return:
    """
    result = dt.datetime.utcnow()
    result = result.replace(tzinfo=utc)

    return result

e = create_engine(
    "mssql+pyodbc://sa:Airflow123@mssql:1433/airflow?driver=ODBC+Driver+17+for+SQL+Server",
    echo="debug",
)
Base.metadata.drop_all(e)
Base.metadata.create_all(e)
session = Session(e)
p1 = Project(
        name="name",
        description="desc",
        created=utcnow() - dt.timedelta(seconds=60),
        modified=utcnow()
    )
session.add(p1)
session.commit()

time.sleep(1)
p1.created= utcnow() - dt.timedelta(seconds=60)
session.flush()

Any comments are welcome :)

The script above can be executed in Breeze with --backend mssql

@ashb
Copy link
Member

ashb commented Feb 3, 2022

If we replace the UTCDateTime class definition with

UTCDateTime = DateTime

does the error still show up?

@potiuk
Copy link
Member Author

potiuk commented Feb 3, 2022

If we replace the UTCDateTime class definition with

UTCDateTime = DateTime

does the error still show up?

That's the first thing I checked. Yes. That's why I thought initially and that's why I thought this is not "our" problem. But I could not reproduce it with neither DateTime nor or UTCDateTime in the "MVE" (minimum viable error) script from SQLAlchemy.
My hypothesis is that maybe somewhere we hard-code or force different date format than the one that the driver uses, but I have not found it yet.

@potiuk
Copy link
Member Author

potiuk commented Feb 3, 2022

I gave myself a little break on searching since yesterday to get a new perspective and I will come back today to look for more clues :). But an ideas/hypotheses/recalls would be most welcome ..

BTW. It actually also crossed my mind "Should we drop MSSQL Idea :) ?". We keep on getting various kinds of troubles with it, t's still experimental, we have not released any version that would be MSSQL compatible, so the harm there is minimal. But I think it would be a little harsh. I spoke with a user on Slack yesterday who was mislead that MsSQL is supported and our website - even if mentioned that this is experimental, does not really say "it's not supported officially uet" (which was a bit surprising https://airflow.apache.org/docs/apache-airflow/stable/installation/prerequisites.html ) - only the https://github.com/apache/airflow/blob/main/README.md#requirements is precise with tht.

@ashb
Copy link
Member

ashb commented Feb 3, 2022

I'm not sure if it's relevant (probably not) but I've just noticed this warning:

/opt/airflow/airflow/models/serialized_dag.py:120 SAWarning: TypeDecorator UtcDateTime(timezone=True) will not produce a cache key because the cache_ok attribute is not set to True. This can have significant performance implications including some performance degradations in comparison to prior SQLAlchemy versions. Set this attribute to True if this type object's state is safe to use in a cache key, or False to disable this warning. (Background on this error at: https://sqlalche.me/e/14/cprf)

@ashb
Copy link
Member

ashb commented Feb 3, 2022

@potiuk can we merge this please?

@potiuk
Copy link
Member Author

potiuk commented Feb 3, 2022

@potiuk can we merge this please?

It does not seem like it solves all problems (if you look at the errors), but yeah, I think there might partially help at least for now.

@potiuk potiuk merged commit cf9f14a into apache:main Feb 3, 2022
@potiuk potiuk deleted the limit-mssql-until-sqlalchemy-fixes-bug branch February 3, 2022 12:51
potiuk added a commit that referenced this pull request Feb 3, 2022
@potiuk
Copy link
Member Author

potiuk commented Feb 3, 2022

I also updated constraints manualy to include SQLAlchemy == 1.4.9. Hopefully it will contain the "regular" PR errors now. But I think 1.4.9 also has other MSSQL issues that might show-up instead.

@aagateuip
Copy link
Contributor

I gave myself a little break on searching since yesterday to get a new perspective and I will come back today to look for more clues :). But an ideas/hypotheses/recalls would be most welcome ..

BTW. It actually also crossed my mind "Should we drop MSSQL Idea :) ?". We keep on getting various kinds of troubles with it, t's still experimental, we have not released any version that would be MSSQL compatible, so the harm there is minimal. But I think it would be a little harsh. I spoke with a user on Slack yesterday who was mislead that MsSQL is supported and our website - even if mentioned that this is experimental, does not really say "it's not supported officially uet" (which was a bit surprising https://airflow.apache.org/docs/apache-airflow/stable/installation/prerequisites.html ) - only the https://github.com/apache/airflow/blob/main/README.md#requirements is precise with tht.

@potiuk Ran into this issue and with this workaround it worked. Thank you! When do apache/airflow:2.2.3-python3.7 docker images get updated?
Also we are planning to use MSSQL server just in case you need a vote to keep MSSQL supported. Thank you!

@potiuk
Copy link
Member Author

potiuk commented Feb 17, 2022

2.2.3 will never be updated with that change.

MSSQL is not yet supported in 2.2 at all. 2.3.0 is the first time it MAY be supported.

@jedcunningham jedcunningham added the changelog:skip Changes that should be skipped from the changelog (CI, tests, etc..) label Mar 1, 2022
@Redoubts
Copy link

Redoubts commented Jul 3, 2022

Is this still a needed thing? sqlalchemy==1.4.9 seems like an old pin.

@potiuk
Copy link
Member Author

potiuk commented Jul 4, 2022

Good you reminded me - It would be worth checking

@potiuk
Copy link
Member Author

potiuk commented Jul 4, 2022

@Redoubts #24819 - seems that more recent SQLAlchemy solved the problem. Should be merged for 2.4 I think

@potiuk
Copy link
Member Author

potiuk commented Jul 4, 2022

Or maybe even 2.3.4 if we have one

@ashb
Copy link
Member

ashb commented Jul 5, 2022

We've got a regression in 2.3.3 so we can get it in there

@ashb ashb added this to the Airflow 2.3.3 milestone Jul 5, 2022
@potiuk
Copy link
Member Author

potiuk commented Jul 5, 2022

Yep. I will cherry-pick it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
changelog:skip Changes that should be skipped from the changelog (CI, tests, etc..)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants