Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: remove JSON cast when querying archived workflows #13777

Merged
merged 1 commit into from
Oct 18, 2024

Conversation

MasonM
Copy link
Contributor

@MasonM MasonM commented Oct 17, 2024

Motivation

With PostgreSQL, the argo_archived_workflows.workflow column has been of type json ever since 8a1e611, which was released as v2.5.0. Therefore, the ::json casts do nothing, and prevent users from improving performance by migrating to JSONB using the following query:

alter table argo_archived_workflows alter column workflow set data type jsonb using workflow::jsonb

Without the changes in this PR, running the above will massively slow down the queries, because casting JSONB to JSON is expensive. With the changes, I'm measuring a ~80% performance boost (see below). Other users have reported similar improvements.

The only downside to migrating to JSONB is it can take a long time if you've got a ton of workflows (~72s on my local DB with 100,000 workflows). I entered #13779 with the migration, but I'm entering this separately so it can hopefully go out in 3.6.0. That way, we can tell users to run the above query if they run into performance issues, without having to do a patch release with the migration.

Modifications

Replace (workflow::json) with just workflow

Verification

See #13779

With PostgreSQL, the `argo_archived_workflows.workflow` column has been
of type `json` ever since 8a1e611,
which was released as v2.5.0. Therefore, the `::json` casts do nothing,
and prevent users from improving performance by migrating to JSONB using the following query:
```sql
alter table argo_archived_workflows alter column workflow set data type jsonb using workflow::jsonb
```

Without the changes in this PR, running the above will massively slow
down the queries, because casting JSONB to JSON is expensive. With the
changes in this PR, it improves performance by ~80%, which I determined
by running the benchmarks added in
argoproj#13767 against a DB with
100,000 randomly generated workflows generated by
argoproj#13715:
```
$ benchstat postgres_before_10000_workflows.txt postgres_after_10000_workflows.txt
goos: linux
goarch: amd64
pkg: github.com/argoproj/argo-workflows/v3/test/e2e
cpu: 12th Gen Intel(R) Core(TM) i5-12400
                                                     │ postgres_before_10000_workflows.txt │  postgres_after_10000_workflows.txt   │
                                                     │               sec/op                │    sec/op     vs base                 │
WorkflowArchive/ListWorkflows-12                                             185.81m ± ∞ ¹   25.06m ± ∞ ¹        ~ (p=1.000 n=1) ²
WorkflowArchive/ListWorkflows_with_label_selector-12                         186.35m ± ∞ ¹   25.99m ± ∞ ¹        ~ (p=1.000 n=1) ²
WorkflowArchive/CountWorkflows-12                                             42.39m ± ∞ ¹   11.78m ± ∞ ¹        ~ (p=1.000 n=1) ²
geomean                                                                       113.6m         19.72m        -82.64%
¹ need >= 6 samples for confidence interval at level 0.95
² need >= 4 samples to detect a difference at alpha level 0.05
```

The only downside to migrating to JSONB is it can take a long time
if you've got a ton of workflows (~72s on my local DB with 100,000
workflows). I'll enter a separate PR for the migration, but I'm entering
this change separately so it can hopefully go out in 3.6.0.

Signed-off-by: Mason Malone <651224+MasonM@users.noreply.github.com>
@MasonM MasonM marked this pull request as ready for review October 17, 2024 18:27
Copy link
Member

@terrytangyuan terrytangyuan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jiachengxu Would you mind reviewing this?

@MasonM
Copy link
Contributor Author

MasonM commented Oct 17, 2024

Here's the PR for the migration, which builds on top of this PR: #13779

@agilgur5 agilgur5 changed the title fix: remove JSON cast when querying archived workflows fix!: remove JSON cast when querying archived workflows Oct 17, 2024
@agilgur5 agilgur5 changed the title fix!: remove JSON cast when querying archived workflows fix: remove JSON cast when querying archived workflows Oct 17, 2024
Copy link
Member

@jiachengxu jiachengxu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@jessesuen jessesuen merged commit e09d2f1 into argoproj:main Oct 18, 2024
34 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants