-
Notifications
You must be signed in to change notification settings - Fork 50
[tests] Disable synchronous writeback on test databases #8275
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
My two cents: this is being inserted in test databases, where we are running in a single-node setup, with no expectation of real durability. I think the lack of fsync in this context "really should be fine" -- and since it's a call in test-only code of our wrapper, it isn't risking leaking into our production db. However, it's certainly a scary-sounding cluster setting. I welcome feedback, and am interested in fielding concerns w.r.t. "is this okay", before we merge it. (That said, it does cause a notable improvement in database-related tests) |
/// | ||
/// This is not a recommended operation in production, but it can | ||
/// drastically improve the performance of test databases. | ||
pub async fn disable_synchronization(&self) -> Result<(), anyhow::Error> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't have a strong opinion here but this logically seems fine to me. Would consider naming it something like debug_disable_synchronization
or test_only_disable_synchronization
to make it clear that production shouldn't use this code path.
Do we have a token type somewhere indicating we're in a non-production context? Passing that token type in here can also act as a decent proof.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I could change the name -- but this entire function is already in the test-utils
crate.
Looking for who depends on this crate:
$ cargo tree --workspace --edges no-dev --depth 1 --invert omicron-test-utils
omicron-test-utils v0.1.0 (/home/smklein/repos/oxide/omicron/test-utils)
├── ch-dev v0.1.0 (/home/smklein/repos/oxide/omicron/dev-tools/ch-dev)
├── crdb-seed v0.1.0 (/home/smklein/repos/oxide/omicron/dev-tools/crdb-seed)
├── db-dev v0.1.0 (/home/smklein/repos/oxide/omicron/dev-tools/db-dev)
├── end-to-end-tests v0.1.0 (/home/smklein/repos/oxide/omicron/end-to-end-tests)
├── gateway-test-utils v0.1.0 (/home/smklein/repos/oxide/omicron/gateway-test-utils)
├── nexus-db-queries v0.1.0 (/home/smklein/repos/oxide/omicron/nexus/db-queries)
├── nexus-test-utils v0.1.0 (/home/smklein/repos/oxide/omicron/nexus/test-utils)
├── omicron-dev-lib v0.1.0 (/home/smklein/repos/oxide/omicron/dev-tools/omicron-dev-lib)
└── oximeter-test-utils v0.1.0 (/home/smklein/repos/oxide/omicron/oximeter/test-utils)
The only "non-test/dev" crate depending on omicron-test-utils
is nexus-db-queries
, and that crate only has the dependency with the testing
feature flag:
# only enabled during tests or via the `testing` feature
omicron-test-utils = { workspace = true, optional = true }
[features]
# Enable to export `TestDatabase`
testing = ["omicron-test-utils"]
I really only need it for the other usage in test-utils
- I could make it pub(crate)
, but then we're still hoping omicron_test_utils::setup_database
is only called by test code.
I think that's a reasonable assumption??
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah in that case I think it makes sense.
I don't think this is a bad thing to do since we're confident it won't leak to production. But just as another option: we could think of it as the responsibility of the runtime environment to configure this appropriately. Production is already configured to be durable. We could have Helios runners set the |
@davepacheco interesting idea -- could local test runs benefit from this approach too? |
FWIW, we do this automatically already for environments using the
|
Most definitely. In general, I set TMPDIR to a ZFS dataset with |
Good call. @smklein, do you see any improvement with this change in test time in those runners? That would be surprising, I assume. |
Ah I meant something that's auto-configured -- you can definitely muck around with TMPDIR like that, but I think most people would just use an out-of-the-box config. (Maybe some day nextest can gain temp dir management, but that's not today :) ) FWIW on Linux, my |
It's really difficult to say, variance on the |
(sorry this is getting tangential)
I see what you're asking for. That seems hard to do without taking over a lot more responsibility for managing the user's environment (e.g., creating a dataset for them). I wouldn't mind an xtask or something that set that up and used it if it were present.
I believe we found on illumos that ZFS with sync=disabled was notably faster than tmpfs, at least under some conditions. I'm not sure about other systems. |
I think i'd be more okay with this on illumos - the usage of ZFS seems pretty universal - but do we want the build to be so opinionated for Linux / Mac on filesystem usage? I think one of my struggles is "these cross-OS filesystem options might not be equivalent" - I know ext4 has a |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think on balance getting this to work out of the box is very high value -- system-level changes can do equivalent things here, but they're inherently more fragile than just instructing cockroach to call fsync less.
Avoids synchronously writing back data to the test database, as a performance optimization.
As mentioned in comments, this is not a safe operation for production databases, but seems like a reasonable choice for testing.
This matches the settings used by https://github.com/cockroachlabs-field/cockroachdb-single-node/
Comparison vs main: