indexer-alt: pruner task #20217

Merged
merged 2 commits into amnn/idx-config from amnn/idx-pruner
Nov 13, 2024

@amnn amnn commented Nov 11, 2024

Description

Add the task that actually deletes data, based on the reader low watermark.
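
As a rough sketch of what "based on the reader low watermark" means here: the pruner only deletes checkpoints strictly below the reader's lower bound, and works through them in bounded chunks. The names `pruner_hi`, `reader_lo` and `max_chunk`, and the helper itself, are assumptions for illustration, not the code in this PR.

```rust
/// Hypothetical helper: split the prunable range `[pruner_hi, reader_lo)` into
/// half-open chunks of at most `max_chunk` checkpoints each.
///
/// `pruner_hi` is the first checkpoint that has not been pruned yet, and
/// `reader_lo` is the lowest checkpoint readers may still query, so nothing at
/// or above `reader_lo` is ever returned.
fn prune_chunks(pruner_hi: u64, reader_lo: u64, max_chunk: u64) -> Vec<(u64, u64)> {
    assert!(max_chunk > 0, "chunk size must be positive");

    let mut chunks = Vec::new();
    let mut from = pruner_hi;
    while from < reader_lo {
        // Clamp the chunk so it never crosses the reader low watermark.
        let to = from.saturating_add(max_chunk).min(reader_lo);
        chunks.push((from, to));
        from = to;
    }
    chunks
}
```

If the pruner has already caught up (`pruner_hi >= reader_lo`) the result is empty and there is nothing to delete on that tick.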

Also (in a separate commit) fixes an issue where the "loud watermark update" logic was too chatty when the indexer was running faster than the network rate (i.e. during backfill).

Test plan

Run the indexer and note the following:

  • Metrics related to deleted rows by the pruner (from localhost:9184/metrics)
  • The contents of the watermarks table.
sui$ cargo run -p sui-indexer-alt --release --                                   \
  --database-url "postgres://postgres:postgrespw@localhost:5432/sui_indexer_alt" \
  indexer --remote-store-url https://checkpoints.mainnet.sui.io/                 \
  --last-checkpoint 10000                                                        \
  --consistent-range 100 --consistent-pruning-interval 10                        \
  --pipeline sum_obj_types --pipeline wal_obj_types

Also tested running the indexer for an extended period (1M checkpoints over roughly half an hour in local testing) and noted how the pruner behaves when configured as it would be in production (roughly one hour of consistent range, a 5 minute pruning interval, and a 2 minute pruning delay):

  • Many rows accumulated during backfill -- by the end of the 1M checkpoints, the pruner had only pruned up to somewhere between checkpoint 500K and checkpoint 700K, depending on the pipeline. This should not be an issue under normal operation, where the indexer will run for long enough for pruning to stabilise at the tip of the network (and the recommended practice would be to start from a formal snapshot, so that pruning only needs to run from that point forward).
  • Because the reader watermark task and the pruner task use the same interval, it can take up to two ticks of that interval for the pruner to act on a change to its upper bound -- again, this should be okay, as the pruner's interval should be at least an order of magnitude smaller than its retention period.
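
The timing argument in the last bullet can be written down as a small check. This is only a sketch: the function names are made up for illustration, the "order of magnitude" rule is taken to mean a factor of 10, and the pruning delay is left out.

```rust
use std::time::Duration;

/// Worst-case time before the pruner acts on a new upper bound, assuming the
/// reader watermark task and the pruner task tick on the same interval and the
/// pruner may tick just before the reader watermark is updated.
fn worst_case_pruner_lag(interval: Duration) -> Duration {
    interval * 2
}

/// Whether the interval is at least an order of magnitude (10x) smaller than
/// the retention period.
fn interval_is_comfortable(interval: Duration, retention: Duration) -> bool {
    interval * 10 <= retention
}
```

With a 5 minute interval and roughly one hour of retention, the worst-case lag is 10 minutes and the 10x condition holds (50 minutes against 60).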

Stack


Release notes

Check each box that your changes affect. If none of the boxes relate to your changes, release notes aren't required.

For each box you select, include information after the relevant heading that describes the impact of your changes that a user might notice and any actions they must take to implement updates.

  • Protocol:
  • Nodes (Validators and Full nodes):
  • Indexer:
  • JSON-RPC:
  • GraphQL:
  • CLI:
  • Rust SDK:
  • REST API:

@amnn amnn self-assigned this Nov 11, 2024

## Description

The loud watermark update logic always bumped the previous update
watermark by the same interval. When the indexer was running well ahead
of the loud watermark update rate, this would cause many updates to be
issued successively.

This change makes it so that the next update is always
`LOUD_WATERMARK_UPDATE_INTERVAL` away from the last loud update.
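
A minimal sketch of the difference, assuming the interval is measured in checkpoints and using a made-up value of 50 (the real constant and the surrounding committer code are not reproduced here):

```rust
/// Assumed value for illustration only; the real constant lives in the indexer.
const LOUD_WATERMARK_UPDATE_INTERVAL: u64 = 50;

/// Old policy: after a loud update, bump the threshold by a fixed step from
/// its previous value. Returns how many of the given watermarks log loudly.
fn count_loud_old(watermarks: &[u64]) -> usize {
    let mut next_loud = LOUD_WATERMARK_UPDATE_INTERVAL;
    let mut loud = 0;
    for &watermark in watermarks {
        if watermark >= next_loud {
            loud += 1;
            // The threshold can stay far behind a fast-moving watermark.
            next_loud += LOUD_WATERMARK_UPDATE_INTERVAL;
        }
    }
    loud
}

/// New policy: after a loud update, the next threshold is one full interval
/// past the watermark that was just logged.
fn count_loud_new(watermarks: &[u64]) -> usize {
    let mut next_loud = LOUD_WATERMARK_UPDATE_INTERVAL;
    let mut loud = 0;
    for &watermark in watermarks {
        if watermark >= next_loud {
            loud += 1;
            next_loud = watermark + LOUD_WATERMARK_UPDATE_INTERVAL;
        }
    }
    loud
}
```

For a pipeline whose watermark jumps to 1000 and then advances by 10 per write, the old policy logs loudly on all six writes in `[1000, 1010, 1020, 1030, 1040, 1050]`, while the new policy logs twice (at 1000 and at 1050).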

## Test plan

Run the indexer. Previously, updates -- particularly from summary tables
that gather changes up and write them out in big batches -- would come
in bursts; this behaviour is no longer apparent after the change:

```
sui$ cargo run -p sui-indexer-alt --release --                                   \
  --database-url "postgres://postgres:postgrespw@localhost:5432/sui_indexer_alt" \
  indexer --remote-store-url https://checkpoints.mainnet.sui.io                  \
  --last-checkpoint 1000000 --consistent-range 3600
```
@amnn amnn merged commit 89eaa96 into amnn/idx-config Nov 13, 2024
14 checks passed
@amnn amnn deleted the amnn/idx-pruner branch November 13, 2024 13:59
amnn added a commit that referenced this pull request Nov 13, 2024
## Description

This field turns out not to be used by the new pruner implementation,
because it is entirely based on checkpoints.

## Test plan

CI

## Stack 

- #20149 
- #20150 
- #20166 
- #20216 
- #20217 

---

## Release notes

Check each box that your changes affect. If none of the boxes relate to
your changes, release notes aren't required.

For each box you select, include information after the relevant heading
that describes the impact of your changes that a user might notice and
any actions they must take to implement updates.

- [ ] Protocol: 
- [ ] Nodes (Validators and Full nodes): 
- [ ] Indexer: 
- [ ] JSON-RPC: 
- [ ] GraphQL: 
- [ ] CLI: 
- [ ] Rust SDK:
- [ ] REST API: