indexer: configure PG work mem for perf #17853

Merged
merged 1 commit into from
May 22, 2024

@gegaowp gegaowp commented May 21, 2024

Description

This fixes the slow objects_snapshot update query.
In the past we had fewer object mutations; with the recent increase in object mutations, the objects_snapshot table has started to fall behind on mainnet.

Test plan

Perf comparison with and without a larger work_mem, run on a separate cloned DB. Baseline first:

Insert on objects_snapshot  (cost=1787923.79..1810332.86 rows=0 width=0) (actual time=1007038.109..1007038.110 rows=0 loops=1)
   Conflict Resolution: UPDATE
   Conflict Arbiter Indexes: objects_snapshot_pkey
   Tuples Inserted: 0
   Conflicting Tuples: 341814
   ->  Subquery Scan on subquery  (cost=1787923.79..1810332.86 rows=3448 width=797) (actual time=15796.923..20093.768 rows=341814 loops=1)
         Filter: (subquery.rn = 1)
         ->  WindowAgg  (cost=1787923.79..1801713.99 rows=689510 width=805) (actual time=15796.921..19698.640 rows=341814 loops=1)
               Run Condition: (row_number() OVER (?) <= 1)
               ->  Sort  (cost=1787923.79..1789647.56 rows=689510 width=797) (actual time=15796.907..17168.209 rows=667901 loops=1)
                     Sort Key: objects_history.object_id, objects_history.object_version DESC
                     Sort Method: external merge  Disk: 352584kB
                     ->  Index Scan using objects_history_partition_403_checkpoint_sequence_number_ob_idx on objects_history_partition_403 objects_history  (cost=0.57..1235565.83 rows=689510 width=797) (actual time=6.255..14802.642 rows=667901 loops=1)
                           Index Cond: ((checkpoint_sequence_number >= 34475349) AND (checkpoint_sequence_number < 34475949))
 Planning Time: 0.261 ms
 Execution Time: 1007093.194 ms

With work_mem set to 16GB:

 Insert on objects_snapshot  (cost=1093107.47..1111767.31 rows=0 width=0) (actual time=19278.181..19278.184 rows=0 loops=1)
   Conflict Resolution: UPDATE
   Conflict Arbiter Indexes: objects_snapshot_pkey
   Tuples Inserted: 0
   Conflicting Tuples: 281860
   ->  Subquery Scan on subquery  (cost=1093107.47..1111767.31 rows=2871 width=797) (actual time=599.297..1225.728 rows=281860 loops=1)
         Filter: (subquery.rn = 1)
         ->  WindowAgg  (cost=1093107.47..1104590.45 rows=574149 width=805) (actual time=599.295..1155.662 rows=281860 loops=1)
               Run Condition: (row_number() OVER (?) <= 1)
               ->  Sort  (cost=1093107.47..1094542.84 rows=574149 width=797) (actual time=599.257..783.334 rows=564334 loops=1)
                     Sort Key: objects_history.object_id, objects_history.object_version DESC
                     Sort Method: quicksort  Memory: 342007kB
                     ->  Index Scan using objects_history_partition_403_checkpoint_sequence_number_ob_idx on objects_history_partition_403 objects_history  (cost=0.57..1038187.06 rows=574149 width=797) (actual time=0.026..120.766 rows=564334 loops=1)
                           Index Cond: ((checkpoint_sequence_number >= 34476549) AND (checkpoint_sequence_number < 34477149))
 Planning Time: 3.929 ms
 Execution Time: 19297.406 ms
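
To put numbers on the two plans above, here is a minimal sketch that pulls the `Execution Time` and `Sort Method` lines out of each plan and computes the ratio between the runs. The helper names (`execution_time_ms`, `sort_method`, `speedup`) and the `BEFORE`/`AFTER` constants are hypothetical and are not part of the PR; the constants hold only the relevant lines copied from the plans. The ratio is indicative only, since the two runs cover different checkpoint ranges and row counts, and in both plans the Subquery Scan finishes early (at about 20.1 s and 1.2 s respectively), so most of the wall time falls in the Insert node itself.

```rust
// Relevant lines copied from the two EXPLAIN ANALYZE outputs above.
const BEFORE: &str = "\
Sort Method: external merge  Disk: 352584kB
 Planning Time: 0.261 ms
 Execution Time: 1007093.194 ms";

const AFTER: &str = "\
Sort Method: quicksort  Memory: 342007kB
 Planning Time: 3.929 ms
 Execution Time: 19297.406 ms";

/// Extracts the number from a line such as " Execution Time: 19297.406 ms".
fn execution_time_ms(plan: &str) -> Option<f64> {
    plan.lines()
        .find_map(|line| line.trim().strip_prefix("Execution Time:"))
        .and_then(|rest| rest.trim().strip_suffix("ms"))
        .and_then(|num| num.trim().parse::<f64>().ok())
}

/// Returns the text after "Sort Method:", which shows whether the sort
/// spilled to disk (external merge) or stayed in memory (quicksort).
fn sort_method(plan: &str) -> Option<&str> {
    plan.lines()
        .find_map(|line| line.trim().strip_prefix("Sort Method:"))
        .map(str::trim)
}

/// Ratio of the first plan's execution time to the second's.
fn speedup(before: &str, after: &str) -> Option<f64> {
    Some(execution_time_ms(before)? / execution_time_ms(after)?)
}
```

Calling `speedup(BEFORE, AFTER)` gives a ratio of roughly 52, and `sort_method` shows the sort moving from an on-disk external merge (352584kB) to an in-memory quicksort (342007kB).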

Local run to make sure that objects_snapshot can be updated with the new code.


Release notes

Check each box that your changes affect. If none of the boxes relate to your changes, release notes aren't required.

For each box you select, include information after the relevant heading that describes the impact of your changes that a user might notice and any actions they must take to implement updates.

  • Protocol:
  • Nodes (Validators and Full nodes):
  • Indexer:
  • JSON-RPC:
  • GraphQL:
  • CLI:
  • Rust SDK:

@emmazzz emmazzz left a comment

nice!

@gegaowp gegaowp requested review from emmazzz and sadhansood May 21, 2024 20:19
@gegaowp gegaowp merged commit ce27447 into MystenLabs:main May 22, 2024
44 of 47 checks passed
gegaowp added a commit to gegaowp/sui that referenced this pull request May 22, 2024
gegaowp added a commit to gegaowp/sui that referenced this pull request May 22, 2024
gegaowp added a commit that referenced this pull request May 23, 2024
## Description 

see the original PR

## Test plan 

see test plan of the original pr

.unwrap();
let pg_work_mem_query_string = format!("SET work_mem = '{}GB'", work_mem_gb);
let pg_work_mem_query = pg_work_mem_query_string.as_str();
transactional_blocking_with_retry!(
@wlmyng wlmyng May 30, 2024

I don't think this actually works, since we're setting work_mem in a separate transaction? Only asking because I'm looking at the most recent objects_snapshot update and it's been running for more than 20 minutes now.
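
One way to address the concern raised in this comment, sketched under assumptions: in PostgreSQL, `SET LOCAL` applies a setting only until the end of the current transaction, so issuing it inside the same transaction as the update guarantees that the update sees the larger `work_mem`. The function below only builds the statement list; `statements_with_local_work_mem` is a hypothetical name, the query passed in is a placeholder, and how the statements would be executed (for example through the `transactional_blocking_with_retry!` macro in the excerpt above) is not shown in this thread and is not assumed here.

```rust
/// Builds the statements for a single transaction that raises `work_mem`
/// for that transaction only and then runs `query`.
/// `SET LOCAL` is standard PostgreSQL; the function itself is a hypothetical
/// illustration and is not taken from the PR.
fn statements_with_local_work_mem(work_mem_gb: u32, query: &str) -> Vec<String> {
    vec![
        "BEGIN".to_string(),
        // Scoped to this transaction; reverts at COMMIT or ROLLBACK.
        format!("SET LOCAL work_mem = '{}GB'", work_mem_gb),
        // Drop surrounding whitespace and a trailing semicolon, if any.
        query.trim().trim_end_matches(';').to_string(),
        "COMMIT".to_string(),
    ]
}
```

Whether the merged code actually runs `SET work_mem` on a different connection than the update cannot be determined from the excerpt; the sketch only shows a form where that question does not arise.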
