-
Notifications
You must be signed in to change notification settings - Fork 1.6k
Don't cache files with diagnostics #19869
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Summary -- To take advantage of the new diagnostics, we need to update our caching model to include all of the information supported by `ruff_db`'s diagnostic type. Instead of trying to serialize all of this information, Micha suggested simply not caching files with diagnostics, like we already do for files with syntax errors. This PR is a prototype of that approach. Test Plan -- Existing tests, with their input updated not to include a diagnostic, plus a new test showing that files with lint diagnostics are not cached. Benchmarks -- In addition to tests, we wanted to check that this doesn't degrade performance too much. I posted part of this new analysis in #18198 (comment), but I'll duplicate it here. In short, there's not much difference between `main` and this branch for projects with few diagnostics (`home-assistant`, `airflow`), as expected. And the difference for projects with many diagnostics (`cpython`) isn't that big either. As discussed in #18198, printing the large number of diagnostics starts to dominate the CPython timing anyway. | Command | Mean [ms] | Min [ms] | Max [ms] | |:--------------------------------------------------------------|----------:|---------:|---------:| | `ruff check cpython --no-cache --isolated --exit-zero` | 322.0 | 317.5 | 326.2 | | `ruff check cpython --isolated --exit-zero` | 217.3 | 209.8 | 237.9 | | `ruff check home-assistant --no-cache --isolated --exit-zero` | 279.5 | 277.0 | 283.6 | | `ruff check home-assistant --isolated --exit-zero` | 37.2 | 35.7 | 40.6 | | `ruff check airflow --no-cache --isolated --exit-zero` | 133.1 | 130.4 | 146.4 | | `ruff check airflow --isolated --exit-zero` | 34.7 | 32.9 | 41.6 | |:--------------------------------------------------------------|----------:|---------:|---------:| | `ruff check cpython --no-cache --isolated --exit-zero` | 330.1 | 324.5 | 333.6 | | `ruff check cpython --isolated --exit-zero` | 309.2 | 306.1 | 314.7 | | `ruff check home-assistant --no-cache --isolated --exit-zero` | 288.6 | 279.4 | 302.3 | | `ruff check home-assistant --isolated --exit-zero` | 39.8 | 36.9 | 42.4 | | `ruff check airflow --no-cache --isolated --exit-zero` | 134.5 | 131.3 | 140.6 | | `ruff check airflow --isolated --exit-zero` | 39.1 | 37.2 | 44.3 | I had Claude adapt one of the scripts from the hyperfine repo to make this plot, so it's not quite perfect, but maybe it's still useful. The table is probably more reliable for close comparisons. I'll put more details about the benchmarks below for the sake of future reproducibility. <plot> <details>
|
|
I don't mind the regression for CPython too much. It has a vast amount of diagnostics which should be uncommon. What I'm more surprised by is the regression for homeassitant and airflow. Do they have any diagnostics, is it just noise, or do you know where the regression is comming from? |
crates/ruff/src/cache.rs
Outdated
| pub(super) source: String, | ||
| /// Notebook index if this file is a Jupyter Notebook. | ||
| #[bincode(with_serde)] | ||
| pub(super) notebook_index: Option<NotebookIndex>, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What's preventing us from removing the Notebook index too? (in which case FileCacheData only needs to store whether a file had any violations)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah of course, thanks. It kind of looked used, but obviously we won't actually look up a notebook in this map if the file has no diagnostics. I replaced LintCacheData with a linted field to mirror the formatted field.
I also updated lint_path (and lint_stdin) to avoid constructing a NotebookIndex that wouldn't be cached (e2bc85e). Otherwise the same_results cache tests were failing with differences in the NotebookIndex field. At first I just fixed this in the test, but it seemed like a small optimization in the real code anyway.
I think it's mostly noise (you can see CPython without a cache was 8 ms slower in the second run), but they both have some diagnostics too. I'll take another look at the notebook index too. |
but drop part of the comment since these diagnostics aren't cached anyway
MichaReiser
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think there's some more simplification that can be done here.
crates/ruff/src/cache.rs
Outdated
| /// Convert the file cache into an empty `Diagnostics`, if it has been linted. | ||
| pub(crate) fn to_diagnostics(&self) -> Option<Diagnostics> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This feels a bit overkill, given that it always returns an empty Diagnostics or None. I think I'd change this to a bool which should also allow simplifying the call site, e.g. the following can be removed entirely:
// `FixMode::Generate` and `FixMode::Diff` rely on side-effects (writing to disk,
// and writing the diff to stdout, respectively). If a file has diagnostics, we
// need to avoid reading from and writing to the cache in these modes.
crates/ruff/src/diagnostics.rs
Outdated
| if let Some((cache, relative_path, key)) = caching { | ||
| // We don't cache parsing errors. | ||
| if !has_error { | ||
| // We don't cache errors. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| // We don't cache errors. | |
| // We don't cache files with diagnostics. |
I think this allows us to now remove result.has_syntax_errors all together. It's only used in the benchmarking code but we can just get that information from the parsed module instead.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the current assertion in the benchmark is a bit stronger than what we get from the parsed module because it also includes semantic syntax errors, but I'm not sure that was intentional. The comment on the assertion just says "parse errors," so it's probably fine to assert only on those.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need some kind of assertion on the benchmark result to make sure it doesn't get optimized out? If we assert on the parsed module, result becomes unused.
We can at least trim down the LinterResult methods to has_no_syntax_errors and has_invalid_syntax. Those are both only used in one place:
ruff/crates/ruff_benchmark/benches/linter.rs
Lines 91 to 92 in ad28b80
| // Assert that file contains no parse errors | |
| assert!(!result.has_syntax_errors()); |
(after removing the ! and using has_no_syntax_errors)
ruff/crates/ruff_server/src/fix.rs
Lines 78 to 81 in ad28b80
| if result.has_invalid_syntax() { | |
| // If there's a syntax error, then there won't be any fixes to apply. | |
| return Ok(Fixes::default()); | |
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The comment on the assertion just says "parse errors," so it's probably fine to assert only on those.
Yeah, I think that's fine. The reason I added the assertion originally because we used to download the test files from Github and I once copied the wrong link (not the raw.github but the regular github.com link). The result was that we downloaded and benchmarked the HTML files instead of the Python files... whoops :)
That's why this assertion is there
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you could return LintResult and criterion will automatically wrap it in a hint::black_box? (and also drop it)
crates/ruff/src/diagnostics.rs
Outdated
| transformed.as_ipy_notebook().map(Notebook::index).cloned(), | ||
| ), | ||
| ); | ||
| cache.set_linted(relative_path.to_owned(), &key); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is fine, given that this is pre-existing code. But which code path sets linted to false if a file has diagnostics (how do we clear the state?)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, I think you're right. I guess this is different from formatted since that always gets set to true whether the file was reformatted or already formatted. I reworked this section a bit to set linted to diagnostics.is_empty() && use_fixes where use_fixes is the result of the match above this.
crates/ruff/src/diagnostics.rs
Outdated
| // Avoid constructing a map when there are no diagnostics. The index is only used for rendering | ||
| // diagnostics anyway. This mirrors our caching behavior, which does not preserve an empty index | ||
| // either. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This feels a bit subtle and it might not be evident for callers that notebook_indexes is empty if there are no diagnostics. We should document this constraint in Diagnostics or consider moving it into a constructor within Diagnostics.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We could also just do this filtering step in the test (or adjust the assertion not to check the notebook index), if you prefer. This didn't show any difference in the benchmarks, so it's not really helpful as an optimization, especially if it's confusing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I did this in 13d7cf9 but happy to revert and document/update this code instead if you prefer it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm realizing now that we probably also need the docs since it still affects Diagnostics created from the cache...
in the previous commit, this check was replaced by `diagnostics.is_empty()` because any syntax errors will already have been converted to diagnostics. here, we revert the assertion in the benchmark to use only parse errors, as the comment still said and inline the other local uses of the other helper methods. `has_invalid_syntax` is still used in the LSP, where we don't have access to the parsed module.
like the benchmarks, I think the important check here is just parse errors, not necessarily the other types of syntax errors that `has_syntax_errors` was checking
MichaReiser
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice. I think the perf regressions aren't unreasonable and it simplifies the implementation quiet a lot
|
|
* main: Don't cache files with diagnostics (#19869) [ty] support recursive type aliases (#19805) [ty] Remove unsafe `salsa::Update` implementations in `tuple.rs` (#19880) [ty] Function argument inlay hints (#19269) [ty] Remove Salsa interning for `TypedDictType` (#19879) Fix `lint.future-annotations` link (#19876) [ty] Reduce memory usage of `TupleSpec` and `TupleType` (#19872) [ty] Track heap usage of salsa structs (#19790) Update salsa to pull in tracked struct changes (#19843) [ty] simplify CycleDetector::visit signature (#19873) [ty] use interior mutability in type visitors (#19871) [ty] Fix tool name is None when no ty path is given in ty_benchmark (#19870) [ty] Remove `Type::Tuple` (#19669) [ty] Short circuit `ReachabilityConstraints::analyze_single` for dynamic types (#19867)
…aints * dcreager/inferrable: (65 commits) this was right after all mark typevars inferrable as we go fix tests fix inference of constrained typevars [ty] Add precise inference for indexing, slicing and unpacking `NamedTuple` instances (#19560) Add rule code to GitLab description (#19896) [ty] render docstrings in hover (#19882) [ty] simplify return type of place_from_declarations (#19884) [ty] Various minor cleanups to tuple internals (#19891) [ty] Improve `sys.version_info` special casing (#19894) Don't cache files with diagnostics (#19869) [ty] support recursive type aliases (#19805) [ty] Remove unsafe `salsa::Update` implementations in `tuple.rs` (#19880) [ty] Function argument inlay hints (#19269) [ty] Remove Salsa interning for `TypedDictType` (#19879) Fix `lint.future-annotations` link (#19876) [ty] Reduce memory usage of `TupleSpec` and `TupleType` (#19872) [ty] Track heap usage of salsa structs (#19790) Update salsa to pull in tracked struct changes (#19843) [ty] simplify CycleDetector::visit signature (#19873) ...
Summary
To take advantage of the new diagnostics, we need to update our caching model to include all of the information supported by
ruff_db's diagnostic type. Instead of trying to serialize all of this information, Micha suggested simply not caching files with diagnostics, like we already do for files with syntax errors. This PR is an attempt at that approach.This has the added benefit of trimming down our
Rulederives since this was the last place theFromStr/strum_macros::EnumStringimplementation was used, as well as the (de)serialization macros andCacheKey.Test Plan
Existing tests, with their input updated not to include a diagnostic, plus a new test showing that files with lint diagnostics are not cached.
Benchmarks
In addition to tests, we wanted to check that this doesn't degrade performance too much. I posted part of this new analysis in #18198 (comment), but I'll duplicate it here. In short, there's not much difference between
mainand this branch for projects with few diagnostics (home-assistant,airflow), as expected. The difference for projects with many diagnostics (cpython) is quite a bit bigger (~300 ms vs ~220 ms), but most projects that run ruff regularly are likely to have very few diagnostics, so this may not be a problem practically.I guess GitHub isn't really rendering this as I intended, but the extra separator line is meant to separate the benchmarks on
main(above the line) from this branch (below the line).ruff check cpython --no-cache --isolated --exit-zeroruff check cpython --isolated --exit-zeroruff check home-assistant --no-cache --isolated --exit-zeroruff check home-assistant --isolated --exit-zeroruff check airflow --no-cache --isolated --exit-zeroruff check airflow --isolated --exit-zeroruff check cpython --no-cache --isolated --exit-zeroruff check cpython --isolated --exit-zeroruff check home-assistant --no-cache --isolated --exit-zeroruff check home-assistant --isolated --exit-zeroruff check airflow --no-cache --isolated --exit-zeroruff check airflow --isolated --exit-zeroI had Claude adapt one of the scripts from the hyperfine repo to make this plot, so it's not quite perfect, but maybe it's still useful. The table is probably more reliable for close comparisons. I'll put more details about the benchmarks below for the sake of future reproducibility.
Benchmark details
The versions of each project:
home-assistant: 5585376b406f099fb29a970b160877b57e5efcb0airflow: 29a1cb0cfde9d99b1774571688ed86cb60123896The last two are just the main branches at the time I cloned the repos.
I don't think our Ruff config should be applied since I used
--isolated, but these are cloned into my copy of Ruff atcrates/ruff_linter/resources/test, and I trimmed the./target/release/prefix from each of the commands, but these are builds of Ruff in release mode.And here's the script with the
hyperfineinvocation:I ran this once on
main(baselinein the graph, top half of the table) and once on this branch (nocacheand bottom of the table).