
Conversation

ntBre (Contributor) commented Aug 11, 2025

Summary

To take advantage of the new diagnostics, we need to update our caching model to include all of the information supported by ruff_db's diagnostic type. Instead of trying to serialize all of this information, Micha suggested simply not caching files with diagnostics, like we already do for files with syntax errors. This PR is an attempt at that approach.
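
Roughly, the write side of the cache ends up looking like the sketch below (not the actual ruff code; the types and names here are simplified stand-ins):

```rust
use std::collections::HashMap;
use std::path::PathBuf;

// Simplified stand-in for the real cache: it only records *that* a file was
// linted cleanly, so nothing about individual diagnostics is ever serialized.
#[derive(Default)]
struct Cache {
    linted: HashMap<PathBuf, u64>, // path -> cache key
}

impl Cache {
    fn set_linted(&mut self, path: PathBuf, key: u64) {
        self.linted.insert(path, key);
    }
}

// Mirror the existing syntax-error behavior: files with any diagnostics are
// simply skipped and will be re-linted from scratch on the next run.
fn update_cache(cache: &mut Cache, path: PathBuf, key: u64, diagnostics: &[String], has_syntax_error: bool) {
    if !has_syntax_error && diagnostics.is_empty() {
        cache.set_linted(path, key);
    }
}

fn main() {
    let mut cache = Cache::default();
    update_cache(&mut cache, PathBuf::from("clean.py"), 1, &[], false);
    update_cache(&mut cache, PathBuf::from("dirty.py"), 2, &["E501 line too long".to_string()], false);
    assert!(cache.linted.contains_key(&PathBuf::from("clean.py")));
    assert!(!cache.linted.contains_key(&PathBuf::from("dirty.py")));
}
```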

This has the added benefit of trimming down our Rule derives since this was the last place the FromStr/strum_macros::EnumString implementation was used, as well as the (de)serialization macros and CacheKey.

Test Plan

Existing tests, with their input updated not to include a diagnostic, plus a new test showing that files with lint diagnostics are not cached.

Benchmarks

In addition to tests, we wanted to check that this doesn't degrade performance too much. I posted part of this new analysis in #18198 (comment), but I'll duplicate it here. In short, there's not much difference between main and this branch for projects with few diagnostics (home-assistant, airflow), as expected. The difference for projects with many diagnostics (cpython) is quite a bit bigger (~300 ms vs ~220 ms), but most projects that run ruff regularly are likely to have very few diagnostics, so this may not be a problem in practice.

I guess GitHub isn't really rendering this as I intended, but the extra separator line is meant to separate the benchmarks on main (above the line) from this branch (below the line).

| Command                                                       | Mean [ms] | Min [ms] | Max [ms] |
|:--------------------------------------------------------------|----------:|---------:|---------:|
| `ruff check cpython --no-cache --isolated --exit-zero`        |     322.0 |    317.5 |    326.2 |
| `ruff check cpython --isolated --exit-zero`                   |     217.3 |    209.8 |    237.9 |
| `ruff check home-assistant --no-cache --isolated --exit-zero` |     279.5 |    277.0 |    283.6 |
| `ruff check home-assistant --isolated --exit-zero`            |      37.2 |     35.7 |     40.6 |
| `ruff check airflow --no-cache --isolated --exit-zero`        |     133.1 |    130.4 |    146.4 |
| `ruff check airflow --isolated --exit-zero`                   |      34.7 |     32.9 |     41.6 |
|:--------------------------------------------------------------|----------:|---------:|---------:|
| `ruff check cpython --no-cache --isolated --exit-zero`        |     330.1 |    324.5 |    333.6 |
| `ruff check cpython --isolated --exit-zero`                   |     309.2 |    306.1 |    314.7 |
| `ruff check home-assistant --no-cache --isolated --exit-zero` |     288.6 |    279.4 |    302.3 |
| `ruff check home-assistant --isolated --exit-zero`            |      39.8 |     36.9 |     42.4 |
| `ruff check airflow --no-cache --isolated --exit-zero`        |     134.5 |    131.3 |    140.6 |
| `ruff check airflow --isolated --exit-zero`                   |      39.1 |     37.2 |     44.3 |

I had Claude adapt one of the scripts from the hyperfine repo to make this plot, so it's not quite perfect, but maybe it's still useful. The table is probably more reliable for close comparisons. I'll put more details about the benchmarks below for the sake of future reproducibility.

[plot: hyperfine benchmark comparison of the baseline (main) and nocache (this branch) runs]
Benchmark details

The versions of each project:

  • CPython: 6322edd260e8cad4b09636e05ddfb794a96a0451, the 3.10 branch from the contributing docs
  • home-assistant: 5585376b406f099fb29a970b160877b57e5efcb0
  • airflow: 29a1cb0cfde9d99b1774571688ed86cb60123896

The last two are just the main branches at the time I cloned the repos.

I don't think our Ruff config should be applied since I used --isolated, but these repos are cloned into my copy of Ruff at crates/ruff_linter/resources/test. I also trimmed the ./target/release/ prefix from each of the commands in the table, but they are release-mode builds of Ruff.

And here's the script with the hyperfine invocation:

#!/bin/bash

cargo build --release --bin ruff

# git clone --depth 1 https://github.com/home-assistant/core crates/ruff_linter/resources/test/home-assistant
# git clone --depth 1 https://github.com/apache/airflow crates/ruff_linter/resources/test/airflow

bin=./target/release/ruff
resources=./crates/ruff_linter/resources/test
cpython=$resources/cpython
home_assistant=$resources/home-assistant
airflow=$resources/airflow

base=${1:-bench}

hyperfine --warmup 10 --export-json $base.json --export-markdown $base.md \
		  "$bin check $cpython --no-cache --isolated --exit-zero" \
		  "$bin check $cpython --isolated --exit-zero" \
		  "$bin check $home_assistant --no-cache --isolated --exit-zero" \
		  "$bin check $home_assistant --isolated --exit-zero" \
		  "$bin check $airflow --no-cache --isolated --exit-zero" \
		  "$bin check $airflow --isolated --exit-zero"

I ran this once on main (baseline in the graph, top half of the table) and once on this branch (nocache and bottom of the table).

github-actions bot (Contributor) commented Aug 11, 2025

ruff-ecosystem results

Linter (stable)

✅ ecosystem check detected no linter changes.

Linter (preview)

✅ ecosystem check detected no linter changes.

ntBre marked this pull request as ready for review August 11, 2025 17:45
ntBre requested a review from MichaReiser August 11, 2025 17:45
MichaReiser (Member) commented:

I don't mind the regression for CPython too much. It has a vast number of diagnostics, which should be uncommon. What I'm more surprised by is the regression for home-assistant and airflow. Do they have any diagnostics, is it just noise, or do you know where the regression is coming from?

pub(super) source: String,
/// Notebook index if this file is a Jupyter Notebook.
#[bincode(with_serde)]
pub(super) notebook_index: Option<NotebookIndex>,
MichaReiser (Member) commented Aug 11, 2025:

What's preventing us from removing the Notebook index too? (in which case FileCacheData only needs to store whether a file had any violations)

ntBre (Contributor, Author) replied:

Ah of course, thanks. It kind of looked used, but obviously we won't actually look up a notebook in this map if the file has no diagnostics. I replaced LintCacheData with a linted field to mirror the formatted field.

I also updated lint_path (and lint_stdin) to avoid constructing a NotebookIndex that wouldn't be cached (e2bc85e). Otherwise the same_results cache tests were failing with differences in the NotebookIndex field. At first I just fixed this in the test, but it seemed like a small optimization in the real code anyway.
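
For reference, the resulting cache entry is roughly the shape below (a sketch; the exact fields in ruff's cache module differ, but the point is that the serialized source and notebook data are gone):

```rust
// Sketch only: the lint side of the cache entry shrinks to a single flag,
// mirroring the existing `formatted` flag.
#[allow(dead_code)]
struct FileCacheData {
    /// Whether the file has been formatted.
    formatted: bool,
    /// Whether the file was linted and produced no diagnostics. Files with
    /// diagnostics never get an entry, so no diagnostic data needs storing.
    linted: bool,
}
```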

ntBre (Contributor, Author) commented Aug 11, 2025:

> I don't mind the regression for CPython too much. It has a vast number of diagnostics, which should be uncommon. What I'm more surprised by is the regression for home-assistant and airflow. Do they have any diagnostics, is it just noise, or do you know where the regression is coming from?

I think it's mostly noise (you can see CPython without a cache was 8 ms slower in the second run), but they both have some diagnostics too. home-assistant has 21 diagnostics, and airflow has 143, compared to 15,577 in cpython.

I'll take another look at the notebook index too.

MichaReiser (Member) left a comment:

I think there's some more simplification that can be done here.

Comment on lines 330 to 331
/// Convert the file cache into an empty `Diagnostics`, if it has been linted.
pub(crate) fn to_diagnostics(&self) -> Option<Diagnostics> {
MichaReiser (Member) commented:

This feels a bit overkill, given that it always returns an empty Diagnostics or None. I think I'd change this to a bool which should also allow simplifying the call site, e.g. the following can be removed entirely:

// `FixMode::Generate` and `FixMode::Diff` rely on side-effects (writing to disk,
// and writing the diff to stdout, respectively). If a file has diagnostics, we
// need to avoid reading from and writing to the cache in these modes.

if let Some((cache, relative_path, key)) = caching {
// We don't cache parsing errors.
if !has_error {
// We don't cache errors.
MichaReiser (Member) commented:

Suggested change
// We don't cache errors.
// We don't cache files with diagnostics.

I think this allows us to remove result.has_syntax_errors altogether. It's only used in the benchmarking code, but we can just get that information from the parsed module instead.

ntBre (Contributor, Author) replied:

I think the current assertion in the benchmark is a bit stronger than what we get from the parsed module because it also includes semantic syntax errors, but I'm not sure that was intentional. The comment on the assertion just says "parse errors," so it's probably fine to assert only on those.

ntBre (Contributor, Author) added:

Do we need some kind of assertion on the benchmark result to make sure it doesn't get optimized out? If we assert on the parsed module, result becomes unused.

We can at least trim down the LinterResult methods to has_no_syntax_errors and has_invalid_syntax. Those are both only used in one place:

// Assert that file contains no parse errors
assert!(!result.has_syntax_errors());

(after removing the ! and using has_no_syntax_errors)

if result.has_invalid_syntax() {
// If there's a syntax error, then there won't be any fixes to apply.
return Ok(Fixes::default());
}

MichaReiser (Member) replied:

> The comment on the assertion just says "parse errors," so it's probably fine to assert only on those.

Yeah, I think that's fine. The reason I added the assertion originally was that we used to download the test files from GitHub and I once copied the wrong link (not the raw.github link but the regular github.com link). The result was that we downloaded and benchmarked the HTML files instead of the Python files... whoops :)

That's why this assertion is there

MichaReiser (Member) added:

I think you could return LintResult and criterion will automatically wrap it in a hint::black_box? (and also drop it)
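
For context, criterion does black-box (and then drop) whatever the benchmark closure returns, so something like the sketch below would keep the work from being optimized away without a separate assertion (the lint function here is a hypothetical placeholder, not ruff's actual benchmark harness):

```rust
use criterion::{criterion_group, criterion_main, Criterion};

// Hypothetical placeholder for "parse and lint a file, returning something
// derived from the result".
fn lint_once() -> usize {
    42
}

fn bench_linter(c: &mut Criterion) {
    c.bench_function("linter/all-rules", |b| {
        // `Bencher::iter` passes the closure's return value through
        // `black_box` before dropping it, so returning the result is enough.
        b.iter(lint_once);
    });
}

criterion_group!(benches, bench_linter);
criterion_main!(benches);
```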

transformed.as_ipy_notebook().map(Notebook::index).cloned(),
),
);
cache.set_linted(relative_path.to_owned(), &key);
MichaReiser (Member) commented:

I think this is fine, given that this is pre-existing code. But which code path sets linted to false if a file has diagnostics (how do we clear the state?)

ntBre (Contributor, Author) replied:

Ah, I think you're right. I guess this is different from formatted since that always gets set to true whether the file was reformatted or already formatted. I reworked this section a bit to set linted to diagnostics.is_empty() && use_fixes where use_fixes is the result of the match above this.

Comment on lines 345 to 347
// Avoid constructing a map when there are no diagnostics. The index is only used for rendering
// diagnostics anyway. This mirrors our caching behavior, which does not preserve an empty index
// either.
MichaReiser (Member) commented:

This feels a bit subtle, and it might not be evident to callers that notebook_indexes is empty if there are no diagnostics. We should document this constraint in Diagnostics or consider moving it into a constructor within Diagnostics.
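
One way the constructor idea could look (a sketch only, with stand-in types; the real Diagnostics fields and signature may differ):

```rust
use std::collections::HashMap;

struct Diagnostic;
struct NotebookIndex;

struct Diagnostics {
    messages: Vec<Diagnostic>,
    // Invariant: empty whenever `messages` is empty, since the index is only
    // needed to render diagnostics.
    notebook_indexes: HashMap<String, NotebookIndex>,
}

impl Diagnostics {
    fn new(messages: Vec<Diagnostic>, path: String, notebook_index: Option<NotebookIndex>) -> Self {
        // Centralizing the rule here means callers can't accidentally build a
        // `Diagnostics` that violates the "no diagnostics, no index" constraint.
        let notebook_indexes = match notebook_index {
            Some(index) if !messages.is_empty() => HashMap::from([(path, index)]),
            _ => HashMap::new(),
        };
        Self { messages, notebook_indexes }
    }
}
```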

ntBre (Contributor, Author) replied:

We could also just do this filtering step in the test (or adjust the assertion not to check the notebook index), if you prefer. This didn't show any difference in the benchmarks, so it's not really helpful as an optimization, especially if it's confusing.

ntBre (Contributor, Author) added:

I did this in 13d7cf9, but I'm happy to revert and document/update this code instead if you prefer.

ntBre (Contributor, Author) added:

I'm realizing now that we probably also need the docs since it still affects Diagnostics created from the cache...

ntBre added 7 commits August 12, 2025 09:44. Excerpts from the commit messages:

- in the previous commit, this check was replaced by `diagnostics.is_empty()` because any syntax errors will already have been converted to diagnostics. here, we revert the assertion in the benchmark to use only parse errors, as the comment still said, and inline the other local uses of the other helper methods. `has_invalid_syntax` is still used in the LSP, where we don't have access to the parsed module.
- like the benchmarks, I think the important check here is just parse errors, not necessarily the other types of syntax errors that `has_syntax_errors` was checking
MichaReiser (Member) left a comment:

Nice. I think the perf regressions aren't unreasonable, and it simplifies the implementation quite a lot.

ntBre added the `internal` label (An internal refactor or improvement) Aug 12, 2025
ntBre (Contributor, Author) commented Aug 12, 2025:

internal might not be the right label here, but I don't really want it in the release notes. I certainly hope it doesn't cause any user-noticeable slowdown.

ntBre merged commit 79c949f into main Aug 12, 2025
36 checks passed
ntBre deleted the brent/dont-cache-lint-diagnostics branch August 12, 2025 19:28
dcreager added a commit that referenced this pull request Aug 12, 2025
* main:
  Don't cache files with diagnostics (#19869)
  [ty] support recursive type aliases (#19805)
  [ty] Remove unsafe `salsa::Update` implementations in `tuple.rs` (#19880)
  [ty] Function argument inlay hints (#19269)
  [ty] Remove Salsa interning for `TypedDictType` (#19879)
  Fix `lint.future-annotations` link (#19876)
  [ty] Reduce memory usage of `TupleSpec` and `TupleType` (#19872)
  [ty] Track heap usage of salsa structs (#19790)
  Update salsa to pull in tracked struct changes (#19843)
  [ty] simplify CycleDetector::visit signature (#19873)
  [ty] use interior mutability in type visitors (#19871)
  [ty] Fix tool name is None when no ty path is given in ty_benchmark (#19870)
  [ty] Remove `Type::Tuple` (#19669)
  [ty] Short circuit `ReachabilityConstraints::analyze_single` for dynamic types (#19867)
dcreager added a commit that referenced this pull request Aug 13, 2025
…aints

* dcreager/inferrable: (65 commits)
  this was right after all
  mark typevars inferrable as we go
  fix tests
  fix inference of constrained typevars
  [ty] Add precise inference for indexing, slicing and unpacking `NamedTuple` instances (#19560)
  Add rule code to GitLab description (#19896)
  [ty] render docstrings in hover (#19882)
  [ty] simplify return type of place_from_declarations (#19884)
  [ty] Various minor cleanups to tuple internals (#19891)
  [ty] Improve `sys.version_info` special casing (#19894)
  Don't cache files with diagnostics (#19869)
  [ty] support recursive type aliases (#19805)
  [ty] Remove unsafe `salsa::Update` implementations in `tuple.rs` (#19880)
  [ty] Function argument inlay hints (#19269)
  [ty] Remove Salsa interning for `TypedDictType` (#19879)
  Fix `lint.future-annotations` link (#19876)
  [ty] Reduce memory usage of `TupleSpec` and `TupleType` (#19872)
  [ty] Track heap usage of salsa structs (#19790)
  Update salsa to pull in tracked struct changes (#19843)
  [ty] simplify CycleDetector::visit signature (#19873)
  ...
Labels

`internal`: An internal refactor or improvement

3 participants