Skip to content

[tests] Disable synchronous writeback on test databases #8275

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Jun 10, 2025

Conversation

smklein
Copy link
Collaborator

@smklein smklein commented Jun 4, 2025

Avoids synchronously writing back data to the test database, as a performance optimization.

As mentioned in comments, this is not a safe operation for production databases, but seems like a reasonable choice for testing.

This matches the settings used by https://github.com/cockroachlabs-field/cockroachdb-single-node/

Comparison vs main:

  • build-and-test (ubuntu-22.04): 59-61 minutes -> This PR: 51 minutes

@smklein smklein marked this pull request as ready for review June 6, 2025 19:53
@smklein smklein requested review from sunshowers and jgallagher June 6, 2025 19:54
@smklein
Copy link
Collaborator Author

smklein commented Jun 6, 2025

My two cents: this is being inserted in test databases, where we are running in a single-node setup, with no expectation of real durability. I think the lack of fsync in this context "really should be fine" -- and since it's a call in test-only code of our wrapper, it isn't risking leaking into our production db.

However, it's certainly a scary-sounding cluster setting. I welcome feedback, and am interested in fielding concerns w.r.t. "is this okay", before we merge it.

(That said, it does cause a notable improvement in database-related tests)

///
/// This is not a recommended operation in production, but it can
/// drastically improve the performance of test databases.
pub async fn disable_synchronization(&self) -> Result<(), anyhow::Error> {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't have a strong opinion here but this logically seems fine to me. Would consider naming it something like debug_disable_synchronization or test_only_disable_synchronization to make it clear that production shouldn't use this code path.

Do we have a token type somewhere indicating we're in a non-production context? Passing that token type in here can also act as a decent proof.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I could change the name -- but this entire function is already in the test-utils crate.

Looking for who depends on this crate:

 $ cargo tree --workspace --edges no-dev --depth 1 --invert omicron-test-utils
omicron-test-utils v0.1.0 (/home/smklein/repos/oxide/omicron/test-utils)
├── ch-dev v0.1.0 (/home/smklein/repos/oxide/omicron/dev-tools/ch-dev)
├── crdb-seed v0.1.0 (/home/smklein/repos/oxide/omicron/dev-tools/crdb-seed)
├── db-dev v0.1.0 (/home/smklein/repos/oxide/omicron/dev-tools/db-dev)
├── end-to-end-tests v0.1.0 (/home/smklein/repos/oxide/omicron/end-to-end-tests)
├── gateway-test-utils v0.1.0 (/home/smklein/repos/oxide/omicron/gateway-test-utils)
├── nexus-db-queries v0.1.0 (/home/smklein/repos/oxide/omicron/nexus/db-queries)
├── nexus-test-utils v0.1.0 (/home/smklein/repos/oxide/omicron/nexus/test-utils)
├── omicron-dev-lib v0.1.0 (/home/smklein/repos/oxide/omicron/dev-tools/omicron-dev-lib)
└── oximeter-test-utils v0.1.0 (/home/smklein/repos/oxide/omicron/oximeter/test-utils)

The only "non-test/dev" crate depending on omicron-test-utils is nexus-db-queries, and that crate only has the dependency with the testing feature flag:

  # only enabled during tests or via the `testing` feature
  omicron-test-utils = { workspace = true, optional = true }
  
  [features]
  # Enable to export `TestDatabase`
  testing = ["omicron-test-utils"]

I really only need it for the other usage in test-utils - I could make it pub(crate), but then we're still hoping omicron_test_utils::setup_database is only called by test code.

I think that's a reasonable assumption??

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah in that case I think it makes sense.

@davepacheco
Copy link
Collaborator

I don't think this is a bad thing to do since we're confident it won't leak to production. But just as another option: we could think of it as the responsibility of the runtime environment to configure this appropriately. Production is already configured to be durable. We could have Helios runners set the sync=disabled ZFS property. I'm not sure what the analogous thing would be for Linux. This would have the advantage of improving I/O latency for other synchronous writes, not just CockroachDB's.

@sunshowers
Copy link
Contributor

@davepacheco interesting idea -- could local test runs benefit from this approach too?

@jclulow
Copy link
Collaborator

jclulow commented Jun 6, 2025

we could think of it as the responsibility of the runtime environment to configure this appropriately. ... We could have Helios runners set the sync=disabled ZFS property.

FWIW, we do this automatically already for environments using the helios-2.0* targets:

 $ buildomat job run -n sync -t helios-2.0 -c 'zfs list -o name,sync'
job 01JX3HQZ6F2XAS6BYMP76PM7GJ submitted
polling for job output...
STATE CHANGE:  -> queued
STATE CHANGE: queued -> running
|=| job assigned to worker 01JX3HR0G6M457WRW69RM3GHZ1 [factory gimlet-EVT22200007-propolis, EVT22200007/29880] (queued for 33 s)
|T| starting task 0: "default"
NAME                     SYNC
rpool                disabled
rpool/ROOT           disabled
rpool/ROOT/5c73e4a3  disabled
rpool/dump           disabled
rpool/home           disabled
rpool/input          disabled
rpool/swap           disabled
|T| process exited: duration 29 ms, exit code 0
|W| found 0 output files
STATE CHANGE: running -> completed

@davepacheco
Copy link
Collaborator

@davepacheco interesting idea -- could local test runs benefit from this approach too?

Most definitely. In general, I set TMPDIR to a ZFS dataset with sync=disabled so that builds, etc. also benefit from this. Obviously, this is super dangerous if you don't know what you're doing but I generally assume stuff that's going to a temp location doesn't need to be crash-safe.

@davepacheco
Copy link
Collaborator

we could think of it as the responsibility of the runtime environment to configure this appropriately. ... We could have Helios runners set the sync=disabled ZFS property.

FWIW, we do this automatically already for environments using the helios-2.0* targets:

 $ buildomat job run -n sync -t helios-2.0 -c 'zfs list -o name,sync'
job 01JX3HQZ6F2XAS6BYMP76PM7GJ submitted
polling for job output...
STATE CHANGE:  -> queued
STATE CHANGE: queued -> running
|=| job assigned to worker 01JX3HR0G6M457WRW69RM3GHZ1 [factory gimlet-EVT22200007-propolis, EVT22200007/29880] (queued for 33 s)
|T| starting task 0: "default"
NAME                     SYNC
rpool                disabled
rpool/ROOT           disabled
rpool/ROOT/5c73e4a3  disabled
rpool/dump           disabled
rpool/home           disabled
rpool/input          disabled
rpool/swap           disabled
|T| process exited: duration 29 ms, exit code 0
|W| found 0 output files
STATE CHANGE: running -> completed

Good call. @smklein, do you see any improvement with this change in test time in those runners? That would be surprising, I assume.

@sunshowers
Copy link
Contributor

Most definitely. In general, I set TMPDIR to a ZFS dataset with sync=disabled so that builds, etc. also benefit from this. Obviously, this is super dangerous if you don't know what you're doing but I generally assume stuff that's going to a temp location doesn't need to be crash-safe.

Ah I meant something that's auto-configured -- you can definitely muck around with TMPDIR like that, but I think most people would just use an out-of-the-box config. (Maybe some day nextest can gain temp dir management, but that's not today :) )

FWIW on Linux, my /tmp is on a tmpfs. Not sure how it is on Mac.

@smklein
Copy link
Collaborator Author

smklein commented Jun 6, 2025

we could think of it as the responsibility of the runtime environment to configure this appropriately. ... We could have Helios runners set the sync=disabled ZFS property.

FWIW, we do this automatically already for environments using the helios-2.0* targets:

 $ buildomat job run -n sync -t helios-2.0 -c 'zfs list -o name,sync'
job 01JX3HQZ6F2XAS6BYMP76PM7GJ submitted
polling for job output...
STATE CHANGE:  -> queued
STATE CHANGE: queued -> running
|=| job assigned to worker 01JX3HR0G6M457WRW69RM3GHZ1 [factory gimlet-EVT22200007-propolis, EVT22200007/29880] (queued for 33 s)
|T| starting task 0: "default"
NAME                     SYNC
rpool                disabled
rpool/ROOT           disabled
rpool/ROOT/5c73e4a3  disabled
rpool/dump           disabled
rpool/home           disabled
rpool/input          disabled
rpool/swap           disabled
|T| process exited: duration 29 ms, exit code 0
|W| found 0 output files
STATE CHANGE: running -> completed

Good call. @smklein, do you see any improvement with this change in test time in those runners? That would be surprising, I assume.

It's really difficult to say, variance on the gimlet-EVT22200007-propolis Helios environment is extremely high - I'm seeing test times in the range of 2800 - 3600 seconds. Sample size of one, but the CI job for this PR looks like it took 3230 seconds on that runner.

@davepacheco
Copy link
Collaborator

(sorry this is getting tangential)

Most definitely. In general, I set TMPDIR to a ZFS dataset with sync=disabled so that builds, etc. also benefit from this. Obviously, this is super dangerous if you don't know what you're doing but I generally assume stuff that's going to a temp location doesn't need to be crash-safe.

Ah I meant something that's auto-configured -- you can definitely muck around with TMPDIR like that, but I think most people would just use an out-of-the-box config. (Maybe some day nextest can gain temp dir management, but that's not today :) )

I see what you're asking for. That seems hard to do without taking over a lot more responsibility for managing the user's environment (e.g., creating a dataset for them). I wouldn't mind an xtask or something that set that up and used it if it were present.

FWIW on Linux, my /tmp is on a tmpfs. Not sure how it is on Mac.

I believe we found on illumos that ZFS with sync=disabled was notably faster than tmpfs, at least under some conditions. I'm not sure about other systems.

@smklein
Copy link
Collaborator Author

smklein commented Jun 6, 2025

I see what you're asking for. That seems hard to do without taking over a lot more responsibility for managing the user's environment (e.g., creating a dataset for them). I wouldn't mind an xtask or something that set that up and used it if it were present.

I believe we found on illumos that ZFS with sync=disabled was notably faster than tmpfs, at least under some conditions. I'm not sure about other systems.

I think i'd be more okay with this on illumos - the usage of ZFS seems pretty universal - but do we want the build to be so opinionated for Linux / Mac on filesystem usage?

I think one of my struggles is "these cross-OS filesystem options might not be equivalent" - I know ext4 has a nobarrier option, and can e.g. disable journaling (https://www.kernel.org/doc/html/v5.0/admin-guide/ext4.html) but I think that if we went this path, and left CRDB as-is, it'll still be issuing a large number of fsyncs to the underlying storage. It seems fairly filesystem-dependent whether or not all those fsyncs will be ignored, or whether they'll cause a bottleneck in test performance.

Copy link
Contributor

@sunshowers sunshowers left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think on balance getting this to work out of the box is very high value -- system-level changes can do equivalent things here, but they're inherently more fragile than just instructing cockroach to call fsync less.

@smklein smklein merged commit 21baa14 into main Jun 10, 2025
16 checks passed
@smklein smklein deleted the disable-fsync-test-db branch June 10, 2025 16:54
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants