test: wait_for_upload_queue_empty no longer works after #8550

This problem was uncovered when tests and benches started doing graceful shutdowns (#8655). The symptom looks like "infinite layer flushes" that continue even after the test has ended.

Most likely this is fallout from #8550, which made the frozen layer flush loop call `RemoteTimelineClient::wait_completion` after each flush. That silently broke `wait_for_upload_queue_empty`, which is now much more likely to observe an empty queue while the next frozen layer is still being written out.
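To illustrate the suspected failure mode, here is a minimal toy model in Python. It is not pageserver code: `ToyTimeline`, `flush_one`, and `observe_between_flushes` are hypothetical names, and the model only assumes what is described above, namely that the flush loop drains the upload queue after every single flush.

```python
from collections import deque


class ToyTimeline:
    """Hypothetical stand-in for a timeline; not the pageserver's real types."""

    def __init__(self, frozen_layers):
        self.frozen_layers = deque(frozen_layers)  # layers still only in memory
        self.upload_queue = deque()                # scheduled, unfinished uploads
        self.uploaded = []                         # uploads that have completed

    def flush_one(self):
        """Write one frozen layer out and schedule its upload."""
        self.upload_queue.append(self.frozen_layers.popleft())

    def wait_completion(self):
        """Drain the upload queue, as the flush loop is assumed to do after each flush."""
        while self.upload_queue:
            self.uploaded.append(self.upload_queue.popleft())

    def upload_queue_empty(self):
        return not self.upload_queue


def observe_between_flushes(timeline):
    """Record what a queue-empty check would see after each flush iteration.

    Each entry is (queue_is_empty, frozen_layers_still_pending).
    """
    observations = []
    while timeline.frozen_layers:
        timeline.flush_one()
        timeline.wait_completion()
        observations.append(
            (timeline.upload_queue_empty(), len(timeline.frozen_layers))
        )
    return observations
```

In this model, a check that only looks at the upload queue reports "empty" after every iteration, including the ones where frozen layers are still waiting to be flushed, so an empty queue says nothing about whether all data has reached remote storage.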

As such, we cannot use `wait_for_upload_queue_empty` anymore. It should be replaced with something (a "proper checkpoint") that takes an Lsn and waits for:
- all in-memory layers to be flushed and uploaded, together with the `index_part.json` uploads
- additionally, a `remote_consistent_lsn` increase over any lsn gap

It should then be used with `flush_ep_to_pageserver` to get an Lsn (or do we need the lsn range?).
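The waiting half of such a checkpoint could be sketched roughly as below. This is an assumption-laden sketch, not the eventual helper: `wait_for_remote_consistent_lsn` and `parse_lsn` are hypothetical names, the caller supplies a `get_remote_consistent_lsn` callable (how it reads the value from the pageserver is left open), and LSNs are assumed to be either integers or strings in the usual `hi/lo` hexadecimal form. It does not trigger the flush itself.

```python
import time
from typing import Callable


def parse_lsn(text: str) -> int:
    """Convert an LSN in 'hi/lo' hexadecimal form (e.g. '0/16B3748') to an int."""
    hi, lo = text.split("/")
    return (int(hi, 16) << 32) | int(lo, 16)


def wait_for_remote_consistent_lsn(
    get_remote_consistent_lsn: Callable[[], int],  # hypothetical: supplied by the test
    target_lsn: int,
    timeout: float = 30.0,
    interval: float = 0.5,
) -> int:
    """Poll until remote_consistent_lsn has reached target_lsn.

    Returns the observed LSN on success and raises TimeoutError otherwise.
    Unlike a queue-empty check, this cannot succeed in the window between
    two frozen layer flushes, because it compares against a concrete LSN.
    """
    deadline = time.monotonic() + timeout
    while True:
        current = get_remote_consistent_lsn()
        if current >= target_lsn:
            return current
        if time.monotonic() >= deadline:
            raise TimeoutError(
                f"remote_consistent_lsn {current:#x} did not reach {target_lsn:#x}"
            )
        time.sleep(interval)
```

A test would pass the Lsn obtained from `flush_ep_to_pageserver` as `target_lsn`. Whether a single Lsn is enough, or an lsn range is needed, remains the open question raised above.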

Completion criteria:
- `wait_for_upload_queue_empty` no longer exists
- the regress suite converges on `flush_ep_to_pageserver` and the new thing from above
    - we need to make it handle lsn gaps, so that after we've received the `last_record_lsn` we can flush frozen layers, which will advance the lsn over the gap

Slack thread ref: <https://neondb.slack.com/archives/C060CNA47S9/p1723564732056149?thread_ts=1723559868.379279&cid=C060CNA47S9>

---

> This might actually be fallout from #8550. That made the wait_for_upload_queue_empty check fail.
> 
> We should really get rid of all `wait_for_upload_queue_empty` and instead have a checkpointing mode where we provide the lsn (for example, received from `flush_ep_to_pageserver`) and checkpoint waits until remote_consistent_lsn is at that (uploads have completed). 

 _Originally posted by @koivunej in [#8712](https://github.com/neondatabase/neon/issues/8712#issuecomment-2286515065)_

Adding the bug label so we will triage this.


Issue #8715, opened by koivunej on Aug 13, 2024
