Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

storage controller: read from database in validate API #8784

Merged
merged 10 commits into from
Sep 4, 2024

Conversation

jcsp
Copy link
Collaborator

@jcsp jcsp commented Aug 21, 2024

Problem

The initial implementation of the validate API treats the in-memory generations as authoritative.

  • This is true when only one storage controller is running, but if a rogue controller was running that hadn't been shut down properly, and some pageserver requests were routed to that bad controller, it could incorrectly return valid=true for stale generations.
  • The generation in the main in-memory map gets out of date while a live migration is in flight, and if the origin location for the migration tries to do some deletions even though it is in AttachedStale (for example because it had already started compaction), these might be wrongly validated + executed.

Summary of changes

  • Continue to do the in-memory check: if this returns valid=false it is sufficient to reject requests.
  • When valid=true, do an additional read from the database to confirm the generation is fresh.
  • Revise behavior for validation on missing shards: this used to always return valid=true as a convenience for deletions and shard splits, so that pageservers weren't prevented from completing any enqueued deletions for these shards after they're gone. However, this becomes unsafe when we consider split brain scenarios. We could reinstate this in future if we wanted to store some tombstones for deleted shards.
  • Update test_scrubber_physical_gc to cope with the behavioral change: they must now explicitly flush the deletion queue before splits, to avoid tripping up on deletions that are enqueued at the time of the split (these tests assert "scrubber deletes nothing", which check fails if the split leaves behind some remote objects that are legitimately GC'able)
  • Add test_storage_controller_validate_during_migration, which uses failpoints to create a situation where incorrect generation validation during a live migration could result in a corruption

The rate of validate calls for tenants is pretty low: it happens as a consequence deletions from GC and compaction, which are both concurrency-limited on the pageserver side.

Checklist before requesting a review

  • I have performed a self-review of my code.
  • If it is a core feature, I have added thorough tests.
  • Do we need to implement analytics? if so did you add the relevant metrics to the dashboard?
  • If this PR requires public announcement, mark it with /release-notes label and add several sentences in this section.

Checklist before merging

  • Do not forget to reformat commit message to not include the above checklist

@jcsp jcsp added t/bug Issue Type: Bug c/storage/controller Component: Storage Controller labels Aug 21, 2024
Copy link

github-actions bot commented Aug 21, 2024

3829 tests run: 3719 passed, 0 failed, 110 skipped (full report)


Flaky tests (4)

Postgres 16

Postgres 15

Postgres 14

Code coverage* (full report)

  • functions: 32.3% (7429 of 23006 functions)
  • lines: 50.5% (60175 of 119155 lines)

* collected from Rust tests only


The comment gets automatically updated with the latest test results
dc462a8 at 2024-09-04T12:04:17.575Z :recycle:

@jcsp jcsp requested a review from VladLazar September 2, 2024 10:14
@jcsp jcsp marked this pull request as ready for review September 2, 2024 10:14
@jcsp jcsp requested a review from a team as a code owner September 2, 2024 10:14
@jcsp jcsp requested a review from problame September 2, 2024 14:01
@jcsp jcsp self-assigned this Sep 2, 2024
Copy link
Contributor

@problame problame left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While reviewing this PR, I was wondering about the legitimacy of omitting tenants from the response, and found that storage_controller doesn't have an OpenAPI yaml. Maybe we should have that? The only docs for /validate behavior is in the RFC, and by reading the code that uses it in deletion_queue

storage_controller/src/service.rs Outdated Show resolved Hide resolved
storage_controller/src/persistence.rs Outdated Show resolved Hide resolved
test_runner/regress/test_storage_controller.py Outdated Show resolved Hide resolved
@jcsp jcsp requested a review from problame September 3, 2024 14:18
Copy link
Contributor

@problame problame left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Batching of 32 means a call with 10k tenants will be ~313 queries; conservatively 100ms per query yields 31 seconds of validate call duration. Would our client timeouts trip over this?

Looking at how we construct the client, I guess we get whathever the default reqwest timeout is:

let mut client = reqwest::ClientBuilder::new();
if let Some(jwt) = &conf.control_plane_api_token {
let mut headers = reqwest::header::HeaderMap::new();
headers.insert(
"Authorization",
format!("Bearer {}", jwt.get_contents()).parse().unwrap(),
);
client = client.default_headers(headers);
}
Some(Self {
http_client: client.build().expect("Failed to construct HTTP client"),
base_url: url,
node_id: conf.id,
cancel: cancel.clone(),
})
. And the default is "no timeout". So, I guess we're good on this.

storage_controller/src/persistence.rs Show resolved Hide resolved
storage_controller/src/service.rs Show resolved Hide resolved
@jcsp
Copy link
Collaborator Author

jcsp commented Sep 4, 2024

I'm going to merge this today to get the critical fix in, and then do a followup that chunks up validation requests from clients -- right now we pretty much assume they don't get into a situation where the deletion queue really needs to validate 10k tenants at once, but we should avoid letting that hang around as a "load bearing assumption".

@jcsp jcsp enabled auto-merge (squash) September 4, 2024 14:00
@jcsp jcsp merged commit 1a9b54f into main Sep 4, 2024
68 checks passed
@jcsp jcsp deleted the jcsp/storcon-validate-db branch September 4, 2024 14:00
jcsp added a commit that referenced this pull request Sep 26, 2024
…8997)

## Problem

- In #8784, the validate
controller API is modified to check generations directly in the
database. It batches tenants into separate queries to avoid generating a
huge statement, but
- While updating this, I realized that "control_plane_client" is a kind
of confusing name for the client code now that it primarily talks to the
storage controller (the case of talking to the control plane will go
away in a few months).

## Summary of changes

- Big rename to "ControllerUpcallClient" -- this reflects the storage
controller's api naming, where the paths used by the pageserver are in
`/upcall/`
- When sending validate requests, break them up into chunks so that we
avoid possible edge cases of generating any HTTP requests that require
database I/O across many thousands of tenants.

This PR mixes a functional change with a refactor, but the commits are
cleanly separated -- only the last commit is a functional change.

---------

Co-authored-by: Christian Schwarz <christian@neon.tech>
bayandin pushed a commit that referenced this pull request Sep 26, 2024
…8997)

## Problem

- In #8784, the validate
controller API is modified to check generations directly in the
database. It batches tenants into separate queries to avoid generating a
huge statement, but
- While updating this, I realized that "control_plane_client" is a kind
of confusing name for the client code now that it primarily talks to the
storage controller (the case of talking to the control plane will go
away in a few months).

## Summary of changes

- Big rename to "ControllerUpcallClient" -- this reflects the storage
controller's api naming, where the paths used by the pageserver are in
`/upcall/`
- When sending validate requests, break them up into chunks so that we
avoid possible edge cases of generating any HTTP requests that require
database I/O across many thousands of tenants.

This PR mixes a functional change with a refactor, but the commits are
cleanly separated -- only the last commit is a functional change.

---------

Co-authored-by: Christian Schwarz <christian@neon.tech>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
c/storage/controller Component: Storage Controller t/bug Issue Type: Bug
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants