Upgrade to SQLAlchemy 2.0 (#197)

chouinar · web-flow · commit bcaf0ebf2a01 · 2023-09-20T15:10:53.000-04:00
## Ticket

#82

## Changes

Upgraded to SQLAlchemy 2.0; see the context section for the major changes in this version.

## Context for reviewers

SQLAlchemy 2.0 comes with an incredibly thorough migration guide: https://docs.sqlalchemy.org/en/20/changelog/migration_20.html

Thankfully 1.4 had already started nudging SQLAlchemy usage toward the new approaches, so the changes needed weren't that large.

Noteworthy changes:

* DB model definitions have been adjusted. The python types now need to be wrapped in `Mapped[...]`, and columns are now defined either with `mapped_column` or with nothing at all, in which case SQLAlchemy figures out the column type from the python type (similar to Dataclasses or Pydantic). This largely results in more minimal class definitions, especially for models with a bunch of basic columns.
* DB session/connection/engine logic has been adjusted slightly to have fewer edge cases. Here that only required some adjustments to the underlying connection setup, something I'd already done on another project to fix some connection issues. Largely this comes down to connections no longer auto-committing, so writes need to be in `with conn.begin()` blocks to function properly.
* Raw SQL strings passed to `execute` must be wrapped in `text()`.
* `DeclarativeBase` is better supported and is set up slightly differently (subclassed rather than created with the `as_declarative` decorator). The `MetaData` object no longer works globally and has to be attached to the `DeclarativeBase` subclass to work properly.
* Typing for MyPy is built into SQLAlchemy and no longer relies on libraries like SQLAlchemy-stubs.

## Testing

Since the DB hits virtually everything, I tested everything as thoroughly as possible.

Because of how the SQLAlchemy-stubs we removed work, they need to be completely deleted, otherwise MyPy will still pick them up. If you are running outside of the docker container, run `poetry install --no-root --all-extras --with dev --sync`, which will delete extra packages.

### Basics

Tests, formatting, and linting are all working; only a few fixes were required to make SQLAlchemy happy.
### SQLAlchemy warnings

When running unit tests, SQLAlchemy will output warnings for deprecated features. The only deprecated feature we were using that still worked was our `get` queries, which have been adjusted. Prior to the fix, we would see this warning:

![Screenshot 2023-09-20 at 11 00 38 AM](https://github.com/navapbc/template-application-flask/assets/46358556/7bc1d0f8-15df-4d15-b991-6c57123e70c2)

### Migrations

They still work. Adding a few new columns generates what we would expect and uses the mapping config added to the Base class.

![Screenshot 2023-09-19 at 3 36 16 PM](https://github.com/navapbc/template-application-flask/assets/46358556/3d3bd20b-618a-473f-9e44-a8e52a6739f5)

![Screenshot 2023-09-19 at 3 36 11 PM](https://github.com/navapbc/template-application-flask/assets/46358556/273f1597-67b4-460d-bb2a-44f512781c06)

### Swagger

I was able to successfully use the local swagger endpoints to create/update/read from the DB, which was populated like so:

<img width="1050" alt="Screenshot 2023-09-20 at 11 34 39 AM" src="https://github.com/navapbc/template-application-flask/assets/46358556/049c8531-8de4-4406-a3b7-9ed92d92da0e">
diff --git a/app/poetry.lock b/app/poetry.lock
diff --git a/app/pyproject.toml b/app/pyproject.toml
@@ -7,7 +7,7 @@ authors = ["Nava Engineering <engineering@navapbc.com>"]
 
 [tool.poetry.dependencies]
 python = "^3.10"
-SQLAlchemy = {extras = ["mypy"], version = "^1.4.40"}
+SQLAlchemy = {extras = ["mypy"], version = "2.0"}
 alembic = "^1.8.1"
 psycopg2-binary = "^2.9.3"
 python-dotenv = "^0.20.0"
@@ -37,6 +37,7 @@ bandit = "^1.7.4"
 pytest = "^6.0.0"
 pytest-watch = "^4.2.0"
 pytest-lazy-fixture = "^0.6.3"
+types-pyyaml = "^6.0.12.11"
 
 [build-system]
 requires = ["poetry-core>=1.0.0"]
@@ -80,8 +81,6 @@ warn_redundant_casts = true
 warn_unreachable = true
 warn_unused_ignores = true
 
-plugins = ["sqlalchemy.ext.mypy.plugin"]
-
 [tool.bandit]
 # Ignore audit logging test file since test audit logging requires a lot of operations that trigger bandit warnings
 exclude_dirs = ["./tests/src/logging/test_audit.py"]
diff --git a/app/src/adapters/db/clients/postgres_client.py b/app/src/adapters/db/clients/postgres_client.py
@@ -46,9 +46,6 @@ def get_conn() -> Any:
         return sqlalchemy.create_engine(
             "postgresql://",
             pool=conn_pool,
-            # FYI, execute many mode handles how SQLAlchemy handles doing a bunch of inserts/updates/deletes at once
-            # https://docs.sqlalchemy.org/en/14/dialects/postgresql.html#psycopg2-fast-execution-helpers
-            executemany_mode="batch",
             hide_parameters=db_config.hide_sql_parameter_logs,
             # TODO: Don't think we need this as we aren't using JSON columns, but keeping for reference
             # json_serializer=lambda o: json.dumps(o, default=pydantic.json.pydantic_encoder),
diff --git a/app/src/db/migrations/env.py b/app/src/db/migrations/env.py
@@ -30,7 +30,7 @@
 
     def include_object(
         object: sqlalchemy.schema.SchemaItem,
-        name: str,
+        name: str | None,
         type_: str,
         reflected: bool,
         compare_to: Any,
diff --git a/app/src/db/migrations/run.py b/app/src/db/migrations/run.py
@@ -1,16 +1,13 @@
 # Convenience script for running alembic migration commands through a pyscript
 # rather than the command line. This allows poetry to package and alias it for
 # running on the production docker image from any directory.
-import itertools
 import logging
 import os
-from typing import Optional
 
 import alembic.command as command
 import alembic.script as script
 import sqlalchemy
 from alembic.config import Config
-from alembic.operations.ops import MigrationScript
 from alembic.runtime import migration
 
 logger = logging.getLogger(__name__)
@@ -53,41 +50,3 @@ def have_all_migrations_run(db_engine: sqlalchemy.engine.Engine) -> None:
         logger.info(
             f"The current migration head is up to date, {current_heads} and Alembic is expecting {expected_heads}"
         )
-
-
-def check_model_parity() -> None:
-    revisions: list[MigrationScript] = []
-
-    def process_revision_directives(
-        context: migration.MigrationContext,
-        revision: Optional[str],
-        directives: list[MigrationScript],
-    ) -> None:
-        nonlocal revisions
-        revisions = list(directives)
-        # Prevent actually generating a migration
-        directives[:] = []
-
-    command.revision(
-        config=alembic_cfg,
-        autogenerate=True,
-        process_revision_directives=process_revision_directives,
-    )
-    diff = list(
-        itertools.chain.from_iterable(
-            op.as_diffs() for script in revisions for op in script.upgrade_ops_list
-        )
-    )
-
-    message = (
-        "The application models are not in sync with the migrations. You should generate "
-        "a new automigration or update your local migration file. "
-        "If there are unexpected errors you may need to merge main into your branch."
-    )
-
-    if diff:
-        for line in diff:
-            print("::error title=Missing migration::Missing migration:", line)
-
-        logger.error(message, extra={"issues": str(diff)})
-        raise Exception(message)
diff --git a/app/src/db/models/base.py b/app/src/db/models/base.py
@@ -4,10 +4,9 @@
 from typing import Any
 from uuid import UUID
 
-from sqlalchemy import TIMESTAMP, Column, MetaData, inspect
+from sqlalchemy import TIMESTAMP, MetaData, Text, inspect
 from sqlalchemy.dialects import postgresql
-from sqlalchemy.ext.declarative import as_declarative
-from sqlalchemy.orm import declarative_mixin
+from sqlalchemy.orm import DeclarativeBase, Mapped, declarative_mixin, mapped_column
 from sqlalchemy.sql.functions import now as sqlnow
 
 from src.util import datetime_util
@@ -26,10 +25,33 @@
 )
 
 
-@as_declarative(metadata=metadata)
-class Base:
+class Base(DeclarativeBase):
+    # Attach the metadata to the Base class so all tables automatically get added to the metadata
+    metadata = metadata
+
+    # Override the default type that SQLAlchemy will map python types to.
+    # This is used if you simply define a column like:
+    #
+    #   my_column: Mapped[str]
+    #
+    # If you provide a mapped_column attribute you can override these values
+    #
+    # See: https://docs.sqlalchemy.org/en/20/orm/declarative_tables.html#mapped-column-derives-the-datatype-and-nullability-from-the-mapped-annotation
+    #      for the default mappings
+    #
+    # See: https://docs.sqlalchemy.org/en/20/orm/declarative_tables.html#orm-declarative-mapped-column-type-map
+    #      for details on setting up this configuration.
+    type_annotation_map = {
+        # Always include a timezone for datetimes
+        datetime: TIMESTAMP(timezone=True),
+        # Explicitly use the Text column type for strings
+        str: Text,
+        # Always use the Postgres UUID column type
+        uuid.UUID: postgresql.UUID(as_uuid=True),
+    }
+
     def _dict(self) -> dict:
-        return {c.key: getattr(self, c.key) for c in inspect(self).mapper.column_attrs}
+        return {c.key: getattr(self, c.key) for c in inspect(self).mapper.column_attrs}  # type: ignore
 
     def for_json(self) -> dict:
         json_valid_dict = {}
@@ -46,9 +68,9 @@ def for_json(self) -> dict:
 
     def copy(self, **kwargs: dict[str, Any]) -> "Base":
         # TODO - Python 3.11 will let us make the return Self instead
-        table = self.__table__  # type: ignore
+        table = self.__table__
         non_pk_columns = [
-            k for k in table.columns.keys() if k not in table.primary_key.columns.keys()
+            k for k in table.columns.keys() if k not in table.primary_key.columns.keys()  # type: ignore
         ]
         data = {c: getattr(self, c) for c in non_pk_columns}
         data.update(kwargs)
@@ -59,10 +81,10 @@ def copy(self, **kwargs: dict[str, Any]) -> "Base":
 @declarative_mixin
 class IdMixin:
     """Mixin to add a UUID id primary key column to a model
-    https://docs.sqlalchemy.org/en/14/orm/declarative_mixins.html
+    https://docs.sqlalchemy.org/en/20/orm/declarative_mixins.html
     """
 
-    id: uuid.UUID = Column(postgresql.UUID(as_uuid=True), primary_key=True, default=uuid.uuid4)
+    id: Mapped[uuid.UUID] = mapped_column(primary_key=True, default=uuid.uuid4)
 
 
 def same_as_created_at(context: Any) -> Any:
@@ -72,18 +94,16 @@ def same_as_created_at(context: Any) -> Any:
 @declarative_mixin
 class TimestampMixin:
     """Mixin to add created_at and updated_at columns to a model
-    https://docs.sqlalchemy.org/en/14/orm/declarative_mixins.html#mixing-in-columns
+    https://docs.sqlalchemy.org/en/20/orm/declarative_mixins.html#mixing-in-columns
     """
 
-    created_at: datetime = Column(
-        TIMESTAMP(timezone=True),
+    created_at: Mapped[datetime] = mapped_column(
         nullable=False,
         default=datetime_util.utcnow,
         server_default=sqlnow(),
     )
 
-    updated_at: datetime = Column(
-        TIMESTAMP(timezone=True),
+    updated_at: Mapped[datetime] = mapped_column(
         nullable=False,
         default=same_as_created_at,
         onupdate=datetime_util.utcnow,
diff --git a/app/src/db/models/user_models.py b/app/src/db/models/user_models.py
@@ -4,9 +4,8 @@
 from typing import Optional
 from uuid import UUID
 
-from sqlalchemy import Boolean, Column, Date, Enum, ForeignKey, Text
-from sqlalchemy.dialects import postgresql
-from sqlalchemy.orm import Mapped, relationship
+from sqlalchemy import Enum, ForeignKey
+from sqlalchemy.orm import Mapped, mapped_column, relationship
 
 from src.db.models.base import Base, IdMixin, TimestampMixin
 
@@ -21,22 +20,22 @@ class RoleType(str, enum.Enum):
 class User(Base, IdMixin, TimestampMixin):
     __tablename__ = "user"
 
-    first_name: str = Column(Text, nullable=False)
-    middle_name: Optional[str] = Column(Text)
-    last_name: str = Column(Text, nullable=False)
-    phone_number: str = Column(Text, nullable=False)
-    date_of_birth: date = Column(Date, nullable=False)
-    is_active: bool = Column(Boolean, nullable=False)
+    first_name: Mapped[str]
+    middle_name: Mapped[Optional[str]]
+    last_name: Mapped[str]
+    phone_number: Mapped[str]
+    date_of_birth: Mapped[date]
+    is_active: Mapped[bool]
 
-    roles: list["Role"] = relationship(
+    roles: Mapped[list["Role"]] = relationship(
         "Role", back_populates="user", cascade="all, delete", order_by="Role.type"
     )
 
 
 class Role(Base, TimestampMixin):
     __tablename__ = "role"
-    user_id: Mapped[UUID] = Column(
-        postgresql.UUID(as_uuid=True), ForeignKey("user.id", ondelete="CASCADE"), primary_key=True
+    user_id: Mapped[UUID] = mapped_column(
+        ForeignKey("user.id", ondelete="CASCADE"), primary_key=True
     )
 
     # Set native_enum=False to use store enum values as VARCHAR/TEXT
@@ -48,6 +47,6 @@ class Role(Base, TimestampMixin):
     # not yet functional
     # (See https://github.com/sqlalchemy/alembic/issues/363)
     #
-    # https://docs.sqlalchemy.org/en/14/core/type_basics.html#sqlalchemy.types.Enum.params.native_enum
-    type: RoleType = Column(Enum(RoleType, native_enum=False), primary_key=True)
-    user: User = relationship(User, back_populates="roles")
+    # https://docs.sqlalchemy.org/en/20/core/type_basics.html#sqlalchemy.types.Enum.params.native_enum
+    type: Mapped[RoleType] = mapped_column(Enum(RoleType, native_enum=False), primary_key=True)
+    user: Mapped[User] = relationship(User, back_populates="roles")
diff --git a/app/src/services/users/get_user.py b/app/src/services/users/get_user.py
@@ -11,7 +11,7 @@
 # https://github.com/navapbc/template-application-flask/issues/52
 def get_user(db_session: Session, user_id: str) -> User:
     # TODO: move this to service and/or persistence layer
-    result = db_session.query(User).options(orm.selectinload(User.roles)).get(user_id)
+    result = db_session.get(User, user_id, options=[orm.selectinload(User.roles)])
 
     if result is None:
         # TODO move HTTP related logic out of service layer to controller layer and just return None from here
diff --git a/app/src/services/users/patch_user.py b/app/src/services/users/patch_user.py
@@ -33,7 +33,7 @@ def patch_user(
 
     with db_session.begin():
         # TODO: move this to service and/or persistence layer
-        user = db_session.query(User).options(orm.selectinload(User.roles)).get(user_id)
+        user = db_session.get(User, user_id, options=[orm.selectinload(User.roles)])
 
         if user is None:
             # TODO move HTTP related logic out of service layer to controller layer and just return None from here
diff --git a/app/tests/conftest.py b/app/tests/conftest.py
@@ -80,7 +80,8 @@ def db_client(monkeypatch_session) -> db.DBClient:
     """
 
     with db_testing.create_isolated_db(monkeypatch_session) as db_client:
-        models.metadata.create_all(bind=db_client.get_connection())
+        with db_client.get_connection() as conn, conn.begin():
+            models.metadata.create_all(bind=conn)
         yield db_client
 
 
diff --git a/app/tests/lib/db_testing.py b/app/tests/lib/db_testing.py
@@ -3,6 +3,8 @@
 import logging
 import uuid
 
+from sqlalchemy import text
+
 import src.adapters.db as db
 from src.adapters.db.clients.postgres_config import get_db_config
 
@@ -25,8 +27,10 @@ def create_isolated_db(monkeypatch) -> db.DBClient:
     db_client = db.PostgresDBClient()
     with db_client.get_connection() as conn:
         _create_schema(conn, schema_name)
+
         try:
             yield db_client
+
         finally:
             _drop_schema(conn, schema_name)
 
@@ -35,11 +39,15 @@ def _create_schema(conn: db.Connection, schema_name: str):
     """Create a database schema."""
     db_test_user = get_db_config().username
 
-    conn.execute(f"CREATE SCHEMA IF NOT EXISTS {schema_name} AUTHORIZATION {db_test_user};")
+    with conn.begin():
+        conn.execute(
+            text(f"CREATE SCHEMA IF NOT EXISTS {schema_name} AUTHORIZATION {db_test_user};")
+        )
     logger.info("create schema %s", schema_name)
 
 
 def _drop_schema(conn: db.Connection, schema_name: str):
     """Drop a database schema."""
-    conn.execute(f"DROP SCHEMA {schema_name} CASCADE;")
+    with conn.begin():
+        conn.execute(text(f"DROP SCHEMA {schema_name} CASCADE;"))
     logger.info("drop schema %s", schema_name)
diff --git a/app/tests/src/db/models/factories.py b/app/tests/src/db/models/factories.py
@@ -45,7 +45,7 @@ def get_db_session() -> db.Session:
 
 # The scopefunc ensures that the session gets cleaned up after each test
 # it implicitly calls `remove()` on the session.
-# see https://docs.sqlalchemy.org/en/14/orm/contextual.html
+# see https://docs.sqlalchemy.org/en/20/orm/contextual.html
 Session = scoped_session(lambda: get_db_session(), scopefunc=lambda: get_db_session())