Bring ORM models and Alembic migrations in sync #16221

mjpieters · 2024-12-04T20:27:39Z

As I was looking into low-hanging fruit to improve prefect's type completeness, I ran into several issues with the ORM model definitions. Over time the migrations and the models have diverged somewhat, for various reasons, and this commit brings them back together again.

With this commit, you can run prefect server database revision --autogenerate in a clean checkout and alembic will generate empty migration steps. At the same time, the ORM models accurately reflect what is in the database.

Changes summary

Update the alembic include_object hook to support per-dialect filtering for indexes, and to avoid enum-to-varchar conversions for existing columns in SQLite.
Add indexes added by manual migrations into their respective ORM models, with dialect-specific information where needed.
Keep index names consistent across PostgreSQL and SQLite.
Drop an index that was accidentally applied to the wrong table in the SQLite migrations.
Drop an index that was created for a foreign key column now that no longer is a foreign key.
Correct ORM annotations for columns accidentally marked as nullable when they were converted to Mapped[...] / mapped_column()
Mark WorkPool.default_queue_id as a foreign key in the ORM model; it already is one in the database.
Drop columns that were removed from the ORM models.
Use Optional[...] instead of Union[..., None] in annotations.
Mark the Variable.value column as nullable; NULL represents None.
Annotate the JSONB columns to keep type completeness score lower

Changes per ORM model

PostgreSQL-only trigram indexes, defined solely in migration scripts

Each of these indexes is now attached to the ORM and alembic will only consider exporting these for the PostgreSQL migrations.

BlockDocument.name: trgm_ix_block_document_name
BlockSchema.capablities: ix_block_schema__capabilities
BlockType.name: trgm_ix_block_type_name
Deployment.name: trgm_ix_deployment_name
Flow.name: trgm_ix_flow_name
FlowRun.name: trgm_ix_flow_run_name
TaskRun.name: trgm_ix_task_run_name
WorkQueue.name: trgm_ix_work_queue_name

`Artifact`

Indexes created in migration, but against the wrong table in SQLite
- 2023-03-24, ca9f934
  - ix_artifact__key_created_desc
  - for SQLite, it is incorrectly created against the artifact_collection table.

`ArtifactCollection`

Column nullable in PostgreSQL but not in SQLite, nor in the ORM model.
- 2023-03-24, 027c123
  - latest_id, made NOT NULL in PostgreSQL.
Indexes created in migration, but against the wrong table in SQLite
- 2023-03-24, ca9f934
  - ix_artifact__key_created_desc
  - for PG, it is correctly created against the artifact table.
Indexes declared in the ORM models but never migrated
- 2023-03-24, 027c123
  - ix_artifact_collection__updated

`BlockDocument`

Indexes created in a migration but not added to the ORM models
- 2023-10-30, d445671
  - Inconsistently named between PostgreSQL and SQLite:
    - PG: ix_block_document__block_type_name__name
    - SQLite: ix_block_document__block_type_name_name

`Deployment`

columns removed from ORM but never migrated
- 2024-08-12, c53b00b
  - schedule
  - is_schedule_active
- 2024-06-21, eaa7a50
  - manifest_path
- 2022-07-25, 5784c63
  - flow_data
not marked as nullable in the type annotations
- 2024-09-13, 926f7a4
  - work_queue_id

`DeploymentSchedule`

columns removed from ORM but never migrated
- 2024-10-25, 0b62de6
  - max_active_runs
  - catchup

`FlowRun`

columns accidentally made nullable in the ORM, by switching to Mapped[...] and mapped_column() without realising that this changes the nullable setting in many cases.
- 2024-09-13, 926f7a4
  - parent_task_run_id
no-longer-used FK index
- 2023-03-24, d10c747
  - ix_flow_run__deployment_id
index was erroneously dropped from the SQLite schema
- 2022-11-10, 519a2ed
  - ix_flow_run__scheduler_deployment_id_auto_scheduled_next_schedu
    - Index names are limited to 63 characters

`FlowRunState`

Column definition in ORM changed from JSONB to JSON
- 2022-09-21, d742d79
  - data

`Run`

columns accidentally made nullable in the ORM, by switching to Mapped[...] and mapped_column() without realising that this changes the nullable setting in many cases.
- 2024-09-13, 926f7a4
  - expected_start_time
  - start_time
  - end_time

`TaskRunState`

Column definition in ORM changed from JSONB to JSON
- 2022-09-21, d742d79
  - data

`Variable`

columns set to not nullable in ORM but database has these as nullable
- 2024-05-22, 7c37407
  - value
  - The JSONB column definition uses none_as_null=True, and tests expect None to be a valid variable value, so the ORM column was made nullable. Migrations had already made the column nullable in the database.
Migrated to JSONB column but ORM model uses plain JSON
- 2024-05-22, 7c37407
  - value

`Worker`

Index created in migration but not in ORM model
- 2023-01-09, 4a5c11e
  - ix_worker__work_pool_id_last_heartbeat_time
    - Renamed in this commit, originally created in c660a3f

Column marked as indexed but index was never created in a migration, almost certainly because the ix_worker__work_pool_id_last_heartbeat_time already covers this column.
- 2022-12-10, c660a3f
  - ix_worker__last_heartbeat_time

`WorkQueue`

Index created in migration but not in ORM model
- 2023-02-01, 839cc3e
  - ix_work_queue__work_pool_id_priority

Column marked as indexed but index was never created in a migration, almost certainly because the ix_work_queue__work_pool_id_priority already covers this column.
- 2023-02-01, 839cc3e
  - priority

`WorkPool`

Index created in migration but not in ORM model
- 2023-01-09, 4a5c11e
  - ix_work_pool__type
    - originally named ix_worker_pool__type but it was not present in the ORM model before this point either.
FK declared in the migration bun not in the ORM model
- 2023-02-01, 839cc3e
  - FK from default_queue_id to work_queue.id

codspeed-hq · 2024-12-04T20:34:53Z

CodSpeed Performance Report

Merging #16221 will not alter performance

_{Comparing mjpieters:sync_orm_and_migrations (df3c519) with main (6308207)}

Summary

✅ 3 untouched benchmarks

mjpieters · 2024-12-05T10:30:55Z

I'll be adding a new PR that builds on this one to move orm_models.py to be fully type annotated, using the SQLAlchemy 2.0 Mapped[] / mapped_column() style.

desertaxle

This is fantastic! Thank you @mjpieters!

In addition to my comment about the dependency cycle warning raised by SQLAlchemy, I'm seeing a number of warnings when generating a migration against a SQLite DB:

/Users/alexander/dev/mjpieters/prefect/.venv/lib/python3.12/site-packages/alembic/ddl/impl.py:817: UserWarning: autogenerate skipping metadata-specified expression-based index 'ix_artifact__key_created_desc'; dialect 'sqlite' under SQLAlchemy 2.0.36 can't reflect these indexes so they can't be compared
  util.warn(
/Users/alexander/dev/mjpieters/prefect/.venv/lib/python3.12/site-packages/alembic/ddl/impl.py:817: UserWarning: autogenerate skipping metadata-specified expression-based index 'ix_flow_run__coalesce_start_time_expected_start_time_asc'; dialect 'sqlite' under SQLAlchemy 2.0.36 can't reflect these indexes so they can't be compared
  util.warn(
/Users/alexander/dev/mjpieters/prefect/.venv/lib/python3.12/site-packages/alembic/ddl/impl.py:817: UserWarning: autogenerate skipping metadata-specified expression-based index 'ix_flow_run__coalesce_start_time_expected_start_time_desc'; dialect 'sqlite' under SQLAlchemy 2.0.36 can't reflect these indexes so they can't be compared
  util.warn(
09:22:00.832 | WARNING | alembic.ddl.impl - Generating approximate signature for index Index('ix_flow_run__end_time_desc', <sqlalchemy.sql.elements.UnaryExpression object at 0x106f74650>). The dialect implementation should either skip expression indexes or provide a custom implementation.
09:22:00.832 | WARNING | alembic.ddl.impl - Generating approximate signature for index Index('ix_flow_run__expected_start_time_desc', <sqlalchemy.sql.elements.UnaryExpression object at 0x106f74530>). The dialect implementation should either skip expression indexes or provide a custom implementation.
09:22:00.833 | WARNING | alembic.ddl.impl - Generating approximate signature for index Index('ix_flow_run__next_scheduled_start_time_asc', <sqlalchemy.sql.elements.UnaryExpression object at 0x106f74ef0>). The dialect implementation should either skip expression indexes or provide a custom implementation.
09:22:00.836 | WARNING | alembic.ddl.impl - Generating approximate signature for index Index('uq_flow_run_state__flow_run_id_timestamp_desc', Column('flow_run_id', UUID(), ForeignKey('flow_run.id'), table=<flow_run_state>, nullable=False), <sqlalchemy.sql.elements.UnaryExpression object at 0x106ef1f10>, unique=True). The dialect implementation should either skip expression indexes or provide a custom implementation.
09:22:00.843 | WARNING | alembic.ddl.impl - Generating approximate signature for index Index('ix_task_run__end_time_desc', <sqlalchemy.sql.elements.UnaryExpression object at 0x1070200b0>). The dialect implementation should either skip expression indexes or provide a custom implementation.
09:22:00.844 | WARNING | alembic.ddl.impl - Generating approximate signature for index Index('ix_task_run__expected_start_time_desc', <sqlalchemy.sql.elements.UnaryExpression object at 0x106f77e60>). The dialect implementation should either skip expression indexes or provide a custom implementation.
09:22:00.844 | WARNING | alembic.ddl.impl - Generating approximate signature for index Index('ix_task_run__next_scheduled_start_time_asc', <sqlalchemy.sql.elements.UnaryExpression object at 0x106f77dd0>). The dialect implementation should either skip expression indexes or provide a custom implementation.
09:22:00.847 | WARNING | alembic.ddl.impl - Generating approximate signature for index Index('uq_task_run_state__task_run_id_timestamp_desc', Column('task_run_id', UUID(), ForeignKey('task_run.id'), table=<task_run_state>, nullable=False), <sqlalchemy.sql.elements.UnaryExpression object at 0x106ef3560>, unique=True). The dialect implementation should either skip expression indexes or provide a custom implementation.
09:22:00.847 | WARNING | alembic.ddl.impl - Generating approximate signature for index Index('ix_task_run_state_cache__cache_key_created_desc', Column('cache_key', String(), table=<task_run_state_cache>, nullable=False), <sqlalchemy.sql.elements.UnaryExpression object at 0x106f4a1b0>). The dialect implementation should either skip expression indexes or provide a custom implementation.

It looks like you added support for checking ddl_if in src/prefect/server/database/_migrations/env.py, so are these indexes that we should add .ddl_if(dialect="postgresql") to?

src/prefect/server/database/orm_models.py

mjpieters · 2024-12-05T16:20:07Z

It looks like you added support for checking ddl_if in src/prefect/server/database/_migrations/env.py, so are these indexes that we should add .ddl_if(dialect="postgresql") to?

I think we'd have to then also define a .ddl_if(dialect="sqlite") version. I'll take a look.

mjpieters · 2024-12-05T16:28:12Z

Right, the issue is the combination of strings and sa.desc(). I'm replacing these with @declared_attr.directive methods so the Index() objects can be created referencing the column instances; e.g.:

    @declared_attr.directive
    def __table_args__(cls) -> Iterable[sa.Index]:
        return (
            sa.Index(
                "ix_artifact__key",
                cls.key,
            ),
            sa.Index(
                "ix_artifact__key_created_desc",
                cls.key,
                cls.created.desc(),
                postgresql_include=[
                    "id",
                    "updated",
                    "type",
                    "task_run_id",
                    "flow_run_id",
                ],
            ),
        )

olearycrew · 2024-12-05T16:54:12Z

Thanks for these contributions @mjpieters !

mjpieters · 2024-12-05T17:03:26Z

There are two autogenerate skipping metadata-specified expression-based index warnings left, and we can't avoid those. That's because these warnings are generated before the include_object() hook is called and so we can't use ddl_if() to filter the indexes out.

The only alternative would be to not specify the indexes in the ORM models module, and then tell alembic to not drop these indexes when generating a migration. I think that would be worse off, as the indexes then just drop out of sight and people will forget to manage their migrations properly.

- Update the alembic `include_object` hook to support per-dialect filtering for indexes, and to avoid enum-to-varchar conversions for existing columns in SQLite. - Add indexes added by manual migrations into their respective ORM models, with dialect-specific information where needed. - Keep index names consistent across PostgreSQL and SQLite. - Drop an index that was accidentally applied to the wrong table in the SQLite migrations. - Drop an index that was created for a foreign key column now that it no longer is a foreign key. - Correct ORM annotations for columns accidentally marked as nullable when they were converted to `Mapped[...]` / `mapped_column()` - Mark `WorkPool.default_queue_id` as a foreign key in the ORM model; it already is one in the database. - Drop columns that were removed from the ORM models. - Use `Optional[...]` instead of `Union[..., None]` in annotations. - Mark the Variable.value column as nullable; NULL represents None.

mjpieters · 2024-12-05T17:49:18Z

I have no idea why the tests/test_flows.py::TestFlowTimeouts::test_timeout_does_not_wait_for_completion_for_sync_flows test failed in the CI pipeline; I can't reproduce that locally.

Given that my other PR has the same changes plus type annotation changes only, the test must be flakey.

desertaxle

Amazing! Thank you!

mjpieters force-pushed the sync_orm_and_migrations branch 2 times, most recently from 9edb9bd to 2fcdc40 Compare December 5, 2024 10:29

mjpieters marked this pull request as ready for review December 5, 2024 10:30

mjpieters requested review from cicdw, desertaxle and zzstoatzz as code owners December 5, 2024 10:30

mjpieters force-pushed the sync_orm_and_migrations branch 7 times, most recently from bfefcb2 to 7f7d98b Compare December 5, 2024 14:05

mjpieters mentioned this pull request Dec 5, 2024

Use SQLAlchemy 2.0 style for ORM models, make typing complete #16230

Merged

desertaxle added the development Tech debt, refactors, CI, tests, and other related work. label Dec 5, 2024

desertaxle requested changes Dec 5, 2024

View reviewed changes

src/prefect/server/database/orm_models.py Outdated Show resolved Hide resolved

mjpieters force-pushed the sync_orm_and_migrations branch from 7f7d98b to 24c96bc Compare December 5, 2024 16:19

mjpieters force-pushed the sync_orm_and_migrations branch from 24c96bc to dde21b3 Compare December 5, 2024 16:58

mjpieters requested a review from desertaxle December 5, 2024 17:22

mjpieters force-pushed the sync_orm_and_migrations branch from dde21b3 to 4031606 Compare December 5, 2024 17:28

mjpieters force-pushed the sync_orm_and_migrations branch from 4031606 to df3c519 Compare December 5, 2024 17:32

desertaxle approved these changes Dec 5, 2024

View reviewed changes

desertaxle merged commit b3b185c into PrefectHQ:main Dec 5, 2024
37 checks passed

mjpieters deleted the sync_orm_and_migrations branch December 5, 2024 18:14

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Bring ORM models and Alembic migrations in sync #16221

Bring ORM models and Alembic migrations in sync #16221

mjpieters commented Dec 4, 2024 •

edited

Loading

codspeed-hq bot commented Dec 4, 2024 •

edited

Loading

mjpieters commented Dec 5, 2024

desertaxle left a comment

mjpieters commented Dec 5, 2024

mjpieters commented Dec 5, 2024

olearycrew commented Dec 5, 2024

mjpieters commented Dec 5, 2024

mjpieters commented Dec 5, 2024 •

edited

Loading

desertaxle left a comment

Bring ORM models and Alembic migrations in sync #16221

Bring ORM models and Alembic migrations in sync #16221

Conversation

mjpieters commented Dec 4, 2024 • edited Loading

Changes summary

Changes per ORM model

PostgreSQL-only trigram indexes, defined solely in migration scripts

Artifact

ArtifactCollection

BlockDocument

Deployment

DeploymentSchedule

FlowRun

FlowRunState

Run

TaskRunState

Variable

Worker

WorkQueue

WorkPool

codspeed-hq bot commented Dec 4, 2024 • edited Loading

CodSpeed Performance Report

Merging #16221 will not alter performance

Summary

mjpieters commented Dec 5, 2024

desertaxle left a comment

Choose a reason for hiding this comment

mjpieters commented Dec 5, 2024

mjpieters commented Dec 5, 2024

olearycrew commented Dec 5, 2024

mjpieters commented Dec 5, 2024

mjpieters commented Dec 5, 2024 • edited Loading

desertaxle left a comment

Choose a reason for hiding this comment

mjpieters commented Dec 4, 2024 •

edited

Loading

`Artifact`

`ArtifactCollection`

`BlockDocument`

`Deployment`

`DeploymentSchedule`

`FlowRun`

`FlowRunState`

`Run`

`TaskRunState`

`Variable`

`Worker`

`WorkQueue`

`WorkPool`

codspeed-hq bot commented Dec 4, 2024 •

edited

Loading

mjpieters commented Dec 5, 2024 •

edited

Loading