Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bring ORM models and Alembic migrations in sync #16221

Merged
merged 1 commit into from
Dec 5, 2024

Conversation

mjpieters
Copy link
Contributor

@mjpieters mjpieters commented Dec 4, 2024

As I was looking into low-hanging fruit to improve prefect's type completeness, I ran into several issues with the ORM model definitions. Over time the migrations and the models have diverged somewhat, for various reasons, and this commit brings them back together again.

With this commit, you can run prefect server database revision --autogenerate in a clean checkout and alembic will generate empty migration steps. At the same time, the ORM models accurately reflect what is in the database.

Changes summary

  • Update the alembic include_object hook to support per-dialect filtering for indexes, and to avoid enum-to-varchar conversions for existing columns in SQLite.
  • Add indexes added by manual migrations into their respective ORM models, with dialect-specific information where needed.
  • Keep index names consistent across PostgreSQL and SQLite.
  • Drop an index that was accidentally applied to the wrong table in the SQLite migrations.
  • Drop an index that was created for a foreign key column now that no longer is a foreign key.
  • Correct ORM annotations for columns accidentally marked as nullable when they were converted to Mapped[...] / mapped_column()
  • Mark WorkPool.default_queue_id as a foreign key in the ORM model; it already is one in the database.
  • Drop columns that were removed from the ORM models.
  • Use Optional[...] instead of Union[..., None] in annotations.
  • Mark the Variable.value column as nullable; NULL represents None.
  • Annotate the JSONB columns to keep type completeness score lower

Changes per ORM model

PostgreSQL-only trigram indexes, defined solely in migration scripts

Each of these indexes is now attached to the ORM and alembic will only consider exporting these for the PostgreSQL migrations.

  • BlockDocument.name: trgm_ix_block_document_name
  • BlockSchema.capablities: ix_block_schema__capabilities
  • BlockType.name: trgm_ix_block_type_name
  • Deployment.name: trgm_ix_deployment_name
  • Flow.name: trgm_ix_flow_name
  • FlowRun.name: trgm_ix_flow_run_name
  • TaskRun.name: trgm_ix_task_run_name
  • WorkQueue.name: trgm_ix_work_queue_name

Artifact

  • Indexes created in migration, but against the wrong table in SQLite
    • 2023-03-24, ca9f934
      • ix_artifact__key_created_desc
      • for SQLite, it is incorrectly created against the artifact_collection table.

ArtifactCollection

  • Column nullable in PostgreSQL but not in SQLite, nor in the ORM model.
    • 2023-03-24, 027c123
      • latest_id, made NOT NULL in PostgreSQL.
  • Indexes created in migration, but against the wrong table in SQLite
    • 2023-03-24, ca9f934
      • ix_artifact__key_created_desc
      • for PG, it is correctly created against the artifact table.
  • Indexes declared in the ORM models but never migrated
    • 2023-03-24, 027c123
      • ix_artifact_collection__updated

BlockDocument

  • Indexes created in a migration but not added to the ORM models
    • 2023-10-30, d445671
      • Inconsistently named between PostgreSQL and SQLite:
        • PG: ix_block_document__block_type_name__name
        • SQLite: ix_block_document__block_type_name_name

Deployment

  • columns removed from ORM but never migrated
    • 2024-08-12, c53b00b
      • schedule
      • is_schedule_active
    • 2024-06-21, eaa7a50
      • manifest_path
    • 2022-07-25, 5784c63
      • flow_data
  • not marked as nullable in the type annotations
    • 2024-09-13, 926f7a4
      • work_queue_id

DeploymentSchedule

  • columns removed from ORM but never migrated
    • 2024-10-25, 0b62de6
      • max_active_runs
      • catchup

FlowRun

  • columns accidentally made nullable in the ORM, by switching to Mapped[...] and mapped_column() without realising that this changes the nullable setting in many cases.
    • 2024-09-13, 926f7a4
      • parent_task_run_id
  • no-longer-used FK index
    • 2023-03-24, d10c747
      • ix_flow_run__deployment_id
  • index was erroneously dropped from the SQLite schema
    • 2022-11-10, 519a2ed
      • ix_flow_run__scheduler_deployment_id_auto_scheduled_next_schedu
        • Index names are limited to 63 characters

FlowRunState

  • Column definition in ORM changed from JSONB to JSON

Run

  • columns accidentally made nullable in the ORM, by switching to Mapped[...] and mapped_column() without realising that this changes the nullable setting in many cases.
    • 2024-09-13, 926f7a4
      • expected_start_time
      • start_time
      • end_time

TaskRunState

  • Column definition in ORM changed from JSONB to JSON

Variable

  • columns set to not nullable in ORM but database has these as nullable
    • 2024-05-22, 7c37407
      • value
      • The JSONB column definition uses none_as_null=True, and tests expect None to be a valid variable value, so the ORM column was made nullable. Migrations had already made the column nullable in the database.
  • Migrated to JSONB column but ORM model uses plain JSON

Worker

  • Index created in migration but not in ORM model
    • 2023-01-09, 4a5c11e
      • ix_worker__work_pool_id_last_heartbeat_time
        • Renamed in this commit, originally created in c660a3f
  • Column marked as indexed but index was never created in a migration, almost certainly because the ix_worker__work_pool_id_last_heartbeat_time already covers this column.
    • 2022-12-10, c660a3f
      • ix_worker__last_heartbeat_time

WorkQueue

  • Index created in migration but not in ORM model
    • 2023-02-01, 839cc3e
      • ix_work_queue__work_pool_id_priority
  • Column marked as indexed but index was never created in a migration, almost certainly because the ix_work_queue__work_pool_id_priority already covers this column.

WorkPool

  • Index created in migration but not in ORM model
    • 2023-01-09, 4a5c11e
      • ix_work_pool__type
        • originally named ix_worker_pool__type but it was not present in the ORM model before this point either.
  • FK declared in the migration bun not in the ORM model
    • 2023-02-01, 839cc3e
      • FK from default_queue_id to work_queue.id

Copy link

codspeed-hq bot commented Dec 4, 2024

CodSpeed Performance Report

Merging #16221 will not alter performance

Comparing mjpieters:sync_orm_and_migrations (df3c519) with main (6308207)

Summary

✅ 3 untouched benchmarks

@mjpieters mjpieters force-pushed the sync_orm_and_migrations branch 2 times, most recently from 9edb9bd to 2fcdc40 Compare December 5, 2024 10:29
@mjpieters mjpieters marked this pull request as ready for review December 5, 2024 10:30
@mjpieters
Copy link
Contributor Author

I'll be adding a new PR that builds on this one to move orm_models.py to be fully type annotated, using the SQLAlchemy 2.0 Mapped[] / mapped_column() style.

@mjpieters mjpieters force-pushed the sync_orm_and_migrations branch 7 times, most recently from bfefcb2 to 7f7d98b Compare December 5, 2024 14:05
@desertaxle desertaxle added the development Tech debt, refactors, CI, tests, and other related work. label Dec 5, 2024
Copy link
Member

@desertaxle desertaxle left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is fantastic! Thank you @mjpieters!

In addition to my comment about the dependency cycle warning raised by SQLAlchemy, I'm seeing a number of warnings when generating a migration against a SQLite DB:

/Users/alexander/dev/mjpieters/prefect/.venv/lib/python3.12/site-packages/alembic/ddl/impl.py:817: UserWarning: autogenerate skipping metadata-specified expression-based index 'ix_artifact__key_created_desc'; dialect 'sqlite' under SQLAlchemy 2.0.36 can't reflect these indexes so they can't be compared
  util.warn(
/Users/alexander/dev/mjpieters/prefect/.venv/lib/python3.12/site-packages/alembic/ddl/impl.py:817: UserWarning: autogenerate skipping metadata-specified expression-based index 'ix_flow_run__coalesce_start_time_expected_start_time_asc'; dialect 'sqlite' under SQLAlchemy 2.0.36 can't reflect these indexes so they can't be compared
  util.warn(
/Users/alexander/dev/mjpieters/prefect/.venv/lib/python3.12/site-packages/alembic/ddl/impl.py:817: UserWarning: autogenerate skipping metadata-specified expression-based index 'ix_flow_run__coalesce_start_time_expected_start_time_desc'; dialect 'sqlite' under SQLAlchemy 2.0.36 can't reflect these indexes so they can't be compared
  util.warn(
09:22:00.832 | WARNING | alembic.ddl.impl - Generating approximate signature for index Index('ix_flow_run__end_time_desc', <sqlalchemy.sql.elements.UnaryExpression object at 0x106f74650>). The dialect implementation should either skip expression indexes or provide a custom implementation.
09:22:00.832 | WARNING | alembic.ddl.impl - Generating approximate signature for index Index('ix_flow_run__expected_start_time_desc', <sqlalchemy.sql.elements.UnaryExpression object at 0x106f74530>). The dialect implementation should either skip expression indexes or provide a custom implementation.
09:22:00.833 | WARNING | alembic.ddl.impl - Generating approximate signature for index Index('ix_flow_run__next_scheduled_start_time_asc', <sqlalchemy.sql.elements.UnaryExpression object at 0x106f74ef0>). The dialect implementation should either skip expression indexes or provide a custom implementation.
09:22:00.836 | WARNING | alembic.ddl.impl - Generating approximate signature for index Index('uq_flow_run_state__flow_run_id_timestamp_desc', Column('flow_run_id', UUID(), ForeignKey('flow_run.id'), table=<flow_run_state>, nullable=False), <sqlalchemy.sql.elements.UnaryExpression object at 0x106ef1f10>, unique=True). The dialect implementation should either skip expression indexes or provide a custom implementation.
09:22:00.843 | WARNING | alembic.ddl.impl - Generating approximate signature for index Index('ix_task_run__end_time_desc', <sqlalchemy.sql.elements.UnaryExpression object at 0x1070200b0>). The dialect implementation should either skip expression indexes or provide a custom implementation.
09:22:00.844 | WARNING | alembic.ddl.impl - Generating approximate signature for index Index('ix_task_run__expected_start_time_desc', <sqlalchemy.sql.elements.UnaryExpression object at 0x106f77e60>). The dialect implementation should either skip expression indexes or provide a custom implementation.
09:22:00.844 | WARNING | alembic.ddl.impl - Generating approximate signature for index Index('ix_task_run__next_scheduled_start_time_asc', <sqlalchemy.sql.elements.UnaryExpression object at 0x106f77dd0>). The dialect implementation should either skip expression indexes or provide a custom implementation.
09:22:00.847 | WARNING | alembic.ddl.impl - Generating approximate signature for index Index('uq_task_run_state__task_run_id_timestamp_desc', Column('task_run_id', UUID(), ForeignKey('task_run.id'), table=<task_run_state>, nullable=False), <sqlalchemy.sql.elements.UnaryExpression object at 0x106ef3560>, unique=True). The dialect implementation should either skip expression indexes or provide a custom implementation.
09:22:00.847 | WARNING | alembic.ddl.impl - Generating approximate signature for index Index('ix_task_run_state_cache__cache_key_created_desc', Column('cache_key', String(), table=<task_run_state_cache>, nullable=False), <sqlalchemy.sql.elements.UnaryExpression object at 0x106f4a1b0>). The dialect implementation should either skip expression indexes or provide a custom implementation.

It looks like you added support for checking ddl_if in src/prefect/server/database/_migrations/env.py, so are these indexes that we should add .ddl_if(dialect="postgresql") to?

src/prefect/server/database/orm_models.py Outdated Show resolved Hide resolved
@mjpieters mjpieters force-pushed the sync_orm_and_migrations branch from 7f7d98b to 24c96bc Compare December 5, 2024 16:19
@mjpieters
Copy link
Contributor Author

It looks like you added support for checking ddl_if in src/prefect/server/database/_migrations/env.py, so are these indexes that we should add .ddl_if(dialect="postgresql") to?

I think we'd have to then also define a .ddl_if(dialect="sqlite") version. I'll take a look.

@mjpieters
Copy link
Contributor Author

Right, the issue is the combination of strings and sa.desc(). I'm replacing these with @declared_attr.directive methods so the Index() objects can be created referencing the column instances; e.g.:

    @declared_attr.directive
    def __table_args__(cls) -> Iterable[sa.Index]:
        return (
            sa.Index(
                "ix_artifact__key",
                cls.key,
            ),
            sa.Index(
                "ix_artifact__key_created_desc",
                cls.key,
                cls.created.desc(),
                postgresql_include=[
                    "id",
                    "updated",
                    "type",
                    "task_run_id",
                    "flow_run_id",
                ],
            ),
        )

@olearycrew
Copy link
Contributor

Thanks for these contributions @mjpieters !

@mjpieters mjpieters force-pushed the sync_orm_and_migrations branch from 24c96bc to dde21b3 Compare December 5, 2024 16:58
@mjpieters
Copy link
Contributor Author

There are two autogenerate skipping metadata-specified expression-based index warnings left, and we can't avoid those. That's because these warnings are generated before the include_object() hook is called and so we can't use ddl_if() to filter the indexes out.

The only alternative would be to not specify the indexes in the ORM models module, and then tell alembic to not drop these indexes when generating a migration. I think that would be worse off, as the indexes then just drop out of sight and people will forget to manage their migrations properly.

@mjpieters mjpieters requested a review from desertaxle December 5, 2024 17:22
@mjpieters mjpieters force-pushed the sync_orm_and_migrations branch from dde21b3 to 4031606 Compare December 5, 2024 17:28
- Update the alembic `include_object` hook to support per-dialect
  filtering for indexes, and to avoid enum-to-varchar conversions
  for existing columns in SQLite.
- Add indexes added by manual migrations into their respective ORM
  models, with dialect-specific information where needed.
- Keep index names consistent across PostgreSQL and SQLite.
- Drop an index that was accidentally applied to the wrong table in
  the SQLite migrations.
- Drop an index that was created for a foreign key column now that it
  no longer is a foreign key.
- Correct ORM annotations for columns accidentally marked as nullable
  when they were converted to `Mapped[...]` / `mapped_column()`
- Mark `WorkPool.default_queue_id` as a foreign key in the ORM model;
  it already is one in the database.
- Drop columns that were removed from the ORM models.
- Use `Optional[...]` instead of `Union[..., None]` in annotations.
- Mark the Variable.value column as nullable; NULL represents None.
@mjpieters mjpieters force-pushed the sync_orm_and_migrations branch from 4031606 to df3c519 Compare December 5, 2024 17:32
@mjpieters
Copy link
Contributor Author

mjpieters commented Dec 5, 2024

I have no idea why the tests/test_flows.py::TestFlowTimeouts::test_timeout_does_not_wait_for_completion_for_sync_flows test failed in the CI pipeline; I can't reproduce that locally.

Given that my other PR has the same changes plus type annotation changes only, the test must be flakey.

Copy link
Member

@desertaxle desertaxle left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Amazing! Thank you!

@desertaxle desertaxle merged commit b3b185c into PrefectHQ:main Dec 5, 2024
37 checks passed
@mjpieters mjpieters deleted the sync_orm_and_migrations branch December 5, 2024 18:14
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
development Tech debt, refactors, CI, tests, and other related work.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants