
Conversation

ntBre (Contributor) commented Aug 11, 2025

Summary

To take advantage of the new diagnostics, we need to update our caching model to include all of the information supported by ruff_db's diagnostic type. Instead of trying to serialize all of this information, Micha suggested simply not caching files with diagnostics, like we already do for files with syntax errors. This PR is an attempt at that approach.
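
Roughly, the write side of the cache ends up looking like the sketch below (not the actual ruff code; the types and names here are simplified stand-ins):

```rust
use std::collections::HashMap;
use std::path::PathBuf;

// Simplified stand-in for the real cache: it only records *that* a file was
// linted cleanly, so nothing about individual diagnostics is ever serialized.
#[derive(Default)]
struct Cache {
    linted: HashMap<PathBuf, u64>, // path -> cache key
}

impl Cache {
    fn set_linted(&mut self, path: PathBuf, key: u64) {
        self.linted.insert(path, key);
    }
}

// Mirror the existing syntax-error behavior: files with any diagnostics are
// simply skipped and will be re-linted from scratch on the next run.
fn update_cache(cache: &mut Cache, path: PathBuf, key: u64, diagnostics: &[String], has_syntax_error: bool) {
    if !has_syntax_error && diagnostics.is_empty() {
        cache.set_linted(path, key);
    }
}

fn main() {
    let mut cache = Cache::default();
    update_cache(&mut cache, PathBuf::from("clean.py"), 1, &[], false);
    update_cache(&mut cache, PathBuf::from("dirty.py"), 2, &["E501 line too long".to_string()], false);
    assert!(cache.linted.contains_key(&PathBuf::from("clean.py")));
    assert!(!cache.linted.contains_key(&PathBuf::from("dirty.py")));
}
```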

This has the added benefit of trimming down our Rule derives since this was the last place the FromStr/strum_macros::EnumString implementation was used, as well as the (de)serialization macros and CacheKey.

Test Plan

Existing tests, with their input updated not to include a diagnostic, plus a new test showing that files with lint diagnostics are not cached.

Benchmarks

In addition to tests, we wanted to check that this doesn't degrade performance too much. I posted part of this new analysis in #18198 (comment), but I'll duplicate it here. In short, there's not much difference between main and this branch for projects with few diagnostics (home-assistant, airflow), as expected. The difference for projects with many diagnostics (cpython) is quite a bit bigger (~300 ms vs ~220 ms), but most projects that run ruff regularly are likely to have very few diagnostics, so this may not be a problem in practice.

I guess GitHub isn't really rendering this as I intended, but the extra separator line is meant to separate the benchmarks on main (above the line) from this branch (below the line).

| Command                                                       | Mean [ms] | Min [ms] | Max [ms] |
|:--------------------------------------------------------------|----------:|---------:|---------:|
| `ruff check cpython --no-cache --isolated --exit-zero`        |     322.0 |    317.5 |    326.2 |
| `ruff check cpython --isolated --exit-zero`                   |     217.3 |    209.8 |    237.9 |
| `ruff check home-assistant --no-cache --isolated --exit-zero` |     279.5 |    277.0 |    283.6 |
| `ruff check home-assistant --isolated --exit-zero`            |      37.2 |     35.7 |     40.6 |
| `ruff check airflow --no-cache --isolated --exit-zero`        |     133.1 |    130.4 |    146.4 |
| `ruff check airflow --isolated --exit-zero`                   |      34.7 |     32.9 |     41.6 |
|:--------------------------------------------------------------|----------:|---------:|---------:|
| `ruff check cpython --no-cache --isolated --exit-zero`        |     330.1 |    324.5 |    333.6 |
| `ruff check cpython --isolated --exit-zero`                   |     309.2 |    306.1 |    314.7 |
| `ruff check home-assistant --no-cache --isolated --exit-zero` |     288.6 |    279.4 |    302.3 |
| `ruff check home-assistant --isolated --exit-zero`            |      39.8 |     36.9 |     42.4 |
| `ruff check airflow --no-cache --isolated --exit-zero`        |     134.5 |    131.3 |    140.6 |
| `ruff check airflow --isolated --exit-zero`                   |      39.1 |     37.2 |     44.3 |

I had Claude adapt one of the scripts from the hyperfine repo to make this plot, so it's not quite perfect, but maybe it's still useful. The table is probably more reliable for close comparisons. I'll put more details about the benchmarks below for the sake of future reproducibility.

[plot: hyperfine benchmark comparison of the baseline (main) and nocache (this branch) runs]
Benchmark details

The versions of each project:

  • CPython: 6322edd260e8cad4b09636e05ddfb794a96a0451, the 3.10 branch from the contributing docs
  • home-assistant: 5585376b406f099fb29a970b160877b57e5efcb0
  • airflow: 29a1cb0cfde9d99b1774571688ed86cb60123896

The last two are just the main branches at the time I cloned the repos.

I don't think our Ruff config should be applied since I used --isolated, but these repos are cloned into my copy of Ruff at crates/ruff_linter/resources/test. I also trimmed the ./target/release/ prefix from each of the commands in the table, but they are release-mode builds of Ruff.

And here's the script with the hyperfine invocation:

#!/bin/bash

cargo build --release --bin ruff

# git clone --depth 1 https://github.com/home-assistant/core crates/ruff_linter/resources/test/home-assistant
# git clone --depth 1 https://github.com/apache/airflow crates/ruff_linter/resources/test/airflow

bin=./target/release/ruff
resources=./crates/ruff_linter/resources/test
cpython=$resources/cpython
home_assistant=$resources/home-assistant
airflow=$resources/airflow

base=${1:-bench}

hyperfine --warmup 10 --export-json $base.json --export-markdown $base.md \
		  "$bin check $cpython --no-cache --isolated --exit-zero" \
		  "$bin check $cpython --isolated --exit-zero" \
		  "$bin check $home_assistant --no-cache --isolated --exit-zero" \
		  "$bin check $home_assistant --isolated --exit-zero" \
		  "$bin check $airflow --no-cache --isolated --exit-zero" \
		  "$bin check $airflow --isolated --exit-zero"

I ran this once on main (baseline in the graph, top half of the table) and once on this branch (nocache and bottom of the table).

github-actions bot (Contributor) commented Aug 11, 2025

ruff-ecosystem results

Linter (stable)

✅ ecosystem check detected no linter changes.

Linter (preview)

✅ ecosystem check detected no linter changes.

ntBre marked this pull request as ready for review August 11, 2025 17:45
ntBre requested a review from MichaReiser August 11, 2025 17:45
MichaReiser (Member) commented:

I don't mind the regression for CPython too much. It has a vast number of diagnostics, which should be uncommon. What I'm more surprised by is the regression for home-assistant and airflow. Do they have any diagnostics, is it just noise, or do you know where the regression is coming from?

pub(super) source: String,
/// Notebook index if this file is a Jupyter Notebook.
#[bincode(with_serde)]
pub(super) notebook_index: Option<NotebookIndex>,
MichaReiser (Member) commented Aug 11, 2025:

What's preventing us from removing the Notebook index too? (in which case FileCacheData only needs to store whether a file had any violations)

ntBre (Contributor, Author) replied:

Ah of course, thanks. It kind of looked used, but obviously we won't actually look up a notebook in this map if the file has no diagnostics. I replaced LintCacheData with a linted field to mirror the formatted field.

I also updated lint_path (and lint_stdin) to avoid constructing a NotebookIndex that wouldn't be cached (e2bc85e). Otherwise the same_results cache tests were failing with differences in the NotebookIndex field. At first I just fixed this in the test, but it seemed like a small optimization in the real code anyway.
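
For reference, the resulting cache entry is roughly the shape below (a sketch; the exact fields in ruff's cache module differ, but the point is that the serialized source and notebook data are gone):

```rust
// Sketch only: the lint side of the cache entry shrinks to a single flag,
// mirroring the existing `formatted` flag.
#[allow(dead_code)]
struct FileCacheData {
    /// Whether the file has been formatted.
    formatted: bool,
    /// Whether the file was linted and produced no diagnostics. Files with
    /// diagnostics never get an entry, so no diagnostic data needs storing.
    linted: bool,
}
```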

ntBre (Contributor, Author) commented Aug 11, 2025:

> I don't mind the regression for CPython too much. It has a vast number of diagnostics, which should be uncommon. What I'm more surprised by is the regression for home-assistant and airflow. Do they have any diagnostics, is it just noise, or do you know where the regression is coming from?

I think it's mostly noise (you can see CPython without a cache was 8 ms slower in the second run), but they both have some diagnostics too. home-assistant has 21 diagnostics, and airflow has 143, compared to 15,577 in cpython.

I'll take another look at the notebook index too.

MichaReiser (Member) left a comment:

I think there's some more simplification that can be done here.

Comment on lines 330 to 331
/// Convert the file cache into an empty `Diagnostics`, if it has been linted.
pub(crate) fn to_diagnostics(&self) -> Option<Diagnostics> {
MichaReiser (Member) commented:

This feels a bit overkill, given that it always returns an empty Diagnostics or None. I think I'd change this to a bool which should also allow simplifying the call site, e.g. the following can be removed entirely:

// `FixMode::Generate` and `FixMode::Diff` rely on side-effects (writing to disk,
// and writing the diff to stdout, respectively). If a file has diagnostics, we
// need to avoid reading from and writing to the cache in these modes.

if let Some((cache, relative_path, key)) = caching {
// We don't cache parsing errors.
if !has_error {
// We don't cache errors.
MichaReiser (Member) commented:

Suggested change
// We don't cache errors.
// We don't cache files with diagnostics.

I think this allows us to remove result.has_syntax_errors altogether. It's only used in the benchmarking code, but we can just get that information from the parsed module instead.

ntBre (Contributor, Author) replied:

I think the current assertion in the benchmark is a bit stronger than what we get from the parsed module because it also includes semantic syntax errors, but I'm not sure that was intentional. The comment on the assertion just says "parse errors," so it's probably fine to assert only on those.

ntBre (Contributor, Author) added:

Do we need some kind of assertion on the benchmark result to make sure it doesn't get optimized out? If we assert on the parsed module, result becomes unused.

We can at least trim down the LinterResult methods to has_no_syntax_errors and has_invalid_syntax. Those are both only used in one place:

// Assert that file contains no parse errors
assert!(!result.has_syntax_errors());

(after removing the ! and using has_no_syntax_errors)

if result.has_invalid_syntax() {
// If there's a syntax error, then there won't be any fixes to apply.
return Ok(Fixes::default());
}

MichaReiser (Member) replied:

> The comment on the assertion just says "parse errors," so it's probably fine to assert only on those.

Yeah, I think that's fine. The reason I added the assertion originally was that we used to download the test files from GitHub and I once copied the wrong link (not the raw.github link but the regular github.com link). The result was that we downloaded and benchmarked the HTML files instead of the Python files... whoops :)

That's why this assertion is there

MichaReiser (Member) added:

I think you could return LintResult and criterion will automatically wrap it in a hint::black_box? (and also drop it)
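
For context, criterion does black-box (and then drop) whatever the benchmark closure returns, so something like the sketch below would keep the work from being optimized away without a separate assertion (the lint function here is a hypothetical placeholder, not ruff's actual benchmark harness):

```rust
use criterion::{criterion_group, criterion_main, Criterion};

// Hypothetical placeholder for "parse and lint a file, returning something
// derived from the result".
fn lint_once() -> usize {
    42
}

fn bench_linter(c: &mut Criterion) {
    c.bench_function("linter/all-rules", |b| {
        // `Bencher::iter` passes the closure's return value through
        // `black_box` before dropping it, so returning the result is enough.
        b.iter(lint_once);
    });
}

criterion_group!(benches, bench_linter);
criterion_main!(benches);
```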

transformed.as_ipy_notebook().map(Notebook::index).cloned(),
),
);
cache.set_linted(relative_path.to_owned(), &key);
MichaReiser (Member) commented:

I think this is fine, given that this is pre-existing code. But which code path sets linted to false if a file has diagnostics (how do we clear the state?)

ntBre (Contributor, Author) replied:

Ah, I think you're right. I guess this is different from formatted since that always gets set to true whether the file was reformatted or already formatted. I reworked this section a bit to set linted to diagnostics.is_empty() && use_fixes where use_fixes is the result of the match above this.

Comment on lines 345 to 347
// Avoid constructing a map when there are no diagnostics. The index is only used for rendering
// diagnostics anyway. This mirrors our caching behavior, which does not preserve an empty index
// either.
MichaReiser (Member) commented:

This feels a bit subtle, and it might not be evident to callers that notebook_indexes is empty if there are no diagnostics. We should document this constraint in Diagnostics or consider moving it into a constructor within Diagnostics.
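
One way the constructor idea could look (a sketch only, with stand-in types; the real Diagnostics fields and signature may differ):

```rust
use std::collections::HashMap;

struct Diagnostic;
struct NotebookIndex;

struct Diagnostics {
    messages: Vec<Diagnostic>,
    // Invariant: empty whenever `messages` is empty, since the index is only
    // needed to render diagnostics.
    notebook_indexes: HashMap<String, NotebookIndex>,
}

impl Diagnostics {
    fn new(messages: Vec<Diagnostic>, path: String, notebook_index: Option<NotebookIndex>) -> Self {
        // Centralizing the rule here means callers can't accidentally build a
        // `Diagnostics` that violates the "no diagnostics, no index" constraint.
        let notebook_indexes = match notebook_index {
            Some(index) if !messages.is_empty() => HashMap::from([(path, index)]),
            _ => HashMap::new(),
        };
        Self { messages, notebook_indexes }
    }
}
```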

ntBre (Contributor, Author) replied:

We could also just do this filtering step in the test (or adjust the assertion not to check the notebook index), if you prefer. This didn't show any difference in the benchmarks, so it's not really helpful as an optimization, especially if it's confusing.

ntBre (Contributor, Author) added:

I did this in 13d7cf9, but I'm happy to revert and document/update this code instead if you prefer.

ntBre (Contributor, Author) added:

I'm realizing now that we probably also need the docs since it still affects Diagnostics created from the cache...

ntBre added 7 commits August 12, 2025 09:44. Excerpts from the commit messages:

- in the previous commit, this check was replaced by `diagnostics.is_empty()` because any syntax errors will already have been converted to diagnostics. here, we revert the assertion in the benchmark to use only parse errors, as the comment still said, and inline the other local uses of the other helper methods. `has_invalid_syntax` is still used in the LSP, where we don't have access to the parsed module.
- like the benchmarks, I think the important check here is just parse errors, not necessarily the other types of syntax errors that `has_syntax_errors` was checking
MichaReiser (Member) left a comment:

Nice. I think the perf regressions aren't unreasonable, and it simplifies the implementation quite a lot.

ntBre added the `internal` label (An internal refactor or improvement) Aug 12, 2025
ntBre (Contributor, Author) commented Aug 12, 2025:

internal might not be the right label here, but I don't really want it in the release notes. I certainly hope it doesn't cause any user-noticeable slowdown.

ntBre merged commit 79c949f into main Aug 12, 2025
36 checks passed
ntBre deleted the brent/dont-cache-lint-diagnostics branch August 12, 2025 19:28
dcreager added a commit that referenced this pull request Aug 12, 2025
* main:
  Don't cache files with diagnostics (#19869)
  [ty] support recursive type aliases (#19805)
  [ty] Remove unsafe `salsa::Update` implementations in `tuple.rs` (#19880)
  [ty] Function argument inlay hints (#19269)
  [ty] Remove Salsa interning for `TypedDictType` (#19879)
  Fix `lint.future-annotations` link (#19876)
  [ty] Reduce memory usage of `TupleSpec` and `TupleType` (#19872)
  [ty] Track heap usage of salsa structs (#19790)
  Update salsa to pull in tracked struct changes (#19843)
  [ty] simplify CycleDetector::visit signature (#19873)
  [ty] use interior mutability in type visitors (#19871)
  [ty] Fix tool name is None when no ty path is given in ty_benchmark (#19870)
  [ty] Remove `Type::Tuple` (#19669)
  [ty] Short circuit `ReachabilityConstraints::analyze_single` for dynamic types (#19867)
dcreager added a commit that referenced this pull request Aug 13, 2025
…aints

* dcreager/inferrable: (65 commits)
  this was right after all
  mark typevars inferrable as we go
  fix tests
  fix inference of constrained typevars
  [ty] Add precise inference for indexing, slicing and unpacking `NamedTuple` instances (#19560)
  Add rule code to GitLab description (#19896)
  [ty] render docstrings in hover (#19882)
  [ty] simplify return type of place_from_declarations (#19884)
  [ty] Various minor cleanups to tuple internals (#19891)
  [ty] Improve `sys.version_info` special casing (#19894)
  Don't cache files with diagnostics (#19869)
  [ty] support recursive type aliases (#19805)
  [ty] Remove unsafe `salsa::Update` implementations in `tuple.rs` (#19880)
  [ty] Function argument inlay hints (#19269)
  [ty] Remove Salsa interning for `TypedDictType` (#19879)
  Fix `lint.future-annotations` link (#19876)
  [ty] Reduce memory usage of `TupleSpec` and `TupleType` (#19872)
  [ty] Track heap usage of salsa structs (#19790)
  Update salsa to pull in tracked struct changes (#19843)
  [ty] simplify CycleDetector::visit signature (#19873)
  ...
Labels

`internal`: An internal refactor or improvement

3 participants