Splitting up the `ruff_linter` crate for faster compile times #1820

not-my-profile · 2023-01-12T16:13:27Z

@squiddy commented on #1547 with:

Slightly related: Any thoughts on splitting up the code into multiple crates? From my limited understanding this is one way to go for faster compiles. Multiple crates can be compiled in parallel. I did some profiling the other day and a release build takes quite some time because everything is one crate currently.

Crates can only be compiled in parallel if they don't depend on each other. Matklad has a nice blog post "Fast Rust Builds".
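To make the parallelism point concrete, here is a minimal sketch that groups a crate graph into build "waves". It is only an illustration of the constraint, not how Cargo schedules work internally. The function name `build_waves` and the input map are hypothetical, and every dependency is assumed to also appear as a key in the map.

```rust
use std::collections::BTreeMap;

/// Groups crates into "waves": every crate in a wave depends only on crates
/// from earlier waves, so the crates within one wave could build in parallel.
/// `deps` maps each crate name to the crates it depends on (hypothetical input;
/// every dependency must also be a key). Returns `None` if there is a cycle.
fn build_waves(deps: &BTreeMap<&str, Vec<&str>>) -> Option<Vec<Vec<String>>> {
    let mut done: Vec<&str> = Vec::new();
    let mut waves = Vec::new();
    while done.len() < deps.len() {
        // A crate is ready once all of its dependencies have been built.
        let ready: Vec<&str> = deps
            .iter()
            .filter(|(name, ds)| !done.contains(*name) && ds.iter().all(|d| done.contains(d)))
            .map(|(name, _)| *name)
            .collect();
        if ready.is_empty() {
            return None; // nothing can make progress: the graph has a cycle
        }
        done.extend(&ready);
        waves.push(ready.iter().map(|s| s.to_string()).collect());
    }
    Some(waves)
}
```

A linear chain such as `ast -> ruff -> ruff_cli` yields one crate per wave, which is why splitting off a crate that sits at the end of the chain does not shorten the total build.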

I just opened PR #1816 to split the CLI off into a separate ruff_cli crate. Since ruff_cli depends on ruff, this doesn't really change the total build time (especially because ruff_cli compiles very quickly, except for the bin generation).

These graphs were generated via cargo build --release --timings; see the timing before the split and the timing after the split.

I think the question is if/how we could further split up the ruff library. Currently the rule implementations depend on ast::Checker and ast::Checker depends on the rule implementations. I am not sure what we can do about that.

With my PR #1816, a first-time cargo build -p ruff now builds ~100 fewer dependencies than a first-time cargo build (since it doesn't build the CLI), so this might still be neat for people who just want to contribute a lint and test that it works.


colin99d · 2023-01-12T16:27:39Z

I think another benefit of this could be if we want to have certain features not always included for the pip install. For example, #1628 wants to add spell checking which adds the typos dependency. Since we are adding a dependency and code that will only be used in one place, I feel like this is a great feature to keep in a separate crate.

not-my-profile · 2023-01-12T16:38:16Z

We can easily introduce an optional dependency on typos via a feature flag. This doesn't require a separate crate and I don't see any benefit to using a separate crate as opposed to a feature flag for something like that.
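As a rough sketch of the feature-flag approach, the optional code can be gated with `cfg` attributes inside the existing crate. The feature name `spellcheck` and the function below are hypothetical; the sketch assumes Cargo.toml declares that feature and ties the optional dependency to it.

```rust
// Assumption: Cargo.toml declares an optional feature named `spellcheck`
// (hypothetical name) that enables the extra dependency.

/// Compiled only when the crate is built with `--features spellcheck`.
#[cfg(feature = "spellcheck")]
pub fn enabled_optional_checks() -> Vec<&'static str> {
    // A real implementation would call into the optional dependency here.
    vec!["spellcheck"]
}

/// Fallback for default builds, which never compile the optional code.
#[cfg(not(feature = "spellcheck"))]
pub fn enabled_optional_checks() -> Vec<&'static str> {
    Vec::new()
}
```

Callers use `enabled_optional_checks()` the same way in both configurations, so the gate stays local to one module.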

charliermarsh · 2023-01-20T23:48:59Z

Dev compile times are really getting to me 😅 Have they regressed at all lately? Perhaps we haven't been tracking closely enough to answer that.

MichaReiser · 2023-01-21T13:06:45Z

One benefit of separate crates is that they avoid re-compiling unchanged code, because Rust skips a crate when neither its dependencies nor its own modules have changed.
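A small sketch of that rule, under the simplifying assumption that a crate is rebuilt exactly when it or one of its (transitive) dependencies changed. The function name `crates_to_rebuild` and the graph are hypothetical; real fingerprinting in Cargo is more detailed than this.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Returns the crates that must be rebuilt after `changed` is edited:
/// the crate itself plus everything that depends on it, directly or transitively.
/// `deps` maps each crate name to the crates it depends on (hypothetical input).
fn crates_to_rebuild(deps: &BTreeMap<&str, Vec<&str>>, changed: &str) -> BTreeSet<String> {
    let mut dirty = BTreeSet::new();
    dirty.insert(changed.to_string());
    // Keep marking dependents dirty until a full pass adds nothing new.
    loop {
        let before = dirty.len();
        for (name, ds) in deps {
            if ds.iter().any(|d| dirty.contains(*d)) {
                dirty.insert(name.to_string());
            }
        }
        if dirty.len() == before {
            return dirty;
        }
    }
}
```

In a single large crate every edit dirties everything; with a split, an edit to a leaf crate that nothing else depends on leaves the other crates untouched.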

charliermarsh · 2023-02-03T21:51:30Z

As an initial proposal to tear down, would something like this work?

One crate per linter (like eradicate).
A crate for core, linter-agnostic definitions (like Violation, Diagnostic, etc.). This could include a trait to outline the Checker API, which the linter crates could depend on (but not its implementation).
A crate that depends on all of those, and includes Checker -- so it ties together the linters, and the linter-agnostic definitions.
A couple one-off crates that can exist as standalone libraries, like the current source crate that contains Generator, Locator, etc.
(Not sure what to do with Settings and the per-linter settings structs.)
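A minimal sketch of how the trait-based layering in this proposal could look, with modules standing in for the separate crates. All names here (`lint_core`, `CheckerApi`, `eradicate_rules`, the fields of `Diagnostic`) are hypothetical stand-ins, not ruff's actual API, and the rule is a toy.

```rust
// Stand-in for the "core" crate: linter-agnostic definitions only.
mod lint_core {
    #[derive(Debug, PartialEq)]
    pub struct Diagnostic {
        pub code: &'static str,
        pub line: usize, // zero-based line index
    }

    /// The slice of the checker that rule implementations may rely on.
    pub trait CheckerApi {
        fn source_line(&self, line: usize) -> Option<&str>;
        fn report(&mut self, diagnostic: Diagnostic);
    }
}

// Stand-in for one linter crate: depends on `lint_core` only.
mod eradicate_rules {
    use super::lint_core::{CheckerApi, Diagnostic};

    /// Toy rule: flag lines that look like a commented-out `print` call.
    pub fn commented_out_code(checker: &mut impl CheckerApi, line: usize) {
        let flagged = checker
            .source_line(line)
            .is_some_and(|text| text.trim_start().starts_with("# print("));
        if flagged {
            checker.report(Diagnostic { code: "ERA001", line });
        }
    }
}

// Stand-in for the top-level crate: owns the concrete Checker and ties
// the linter crates and the core definitions together.
mod linter {
    use super::lint_core::{CheckerApi, Diagnostic};

    pub struct Checker<'a> {
        pub lines: Vec<&'a str>,
        pub diagnostics: Vec<Diagnostic>,
    }

    impl CheckerApi for Checker<'_> {
        fn source_line(&self, line: usize) -> Option<&str> {
            self.lines.get(line).copied()
        }
        fn report(&mut self, diagnostic: Diagnostic) {
            self.diagnostics.push(diagnostic);
        }
    }

    pub fn check(source: &str) -> Vec<Diagnostic> {
        let mut checker = Checker { lines: source.lines().collect(), diagnostics: Vec::new() };
        for line in 0..checker.lines.len() {
            super::eradicate_rules::commented_out_code(&mut checker, line);
        }
        checker.diagnostics
    }
}
```

The dependency arrows only point one way here: `eradicate_rules` never names `Checker`, so the rule module and the checker module no longer depend on each other.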

cnpryer · 2023-02-04T01:36:10Z

We can easily introduce an optional dependency on typos via a feature flag. This doesn't require a separate crate and I don't see any benefit to using a separate crate as opposed to a feature flag for something like that.

This was my first thought. Has anyone started profiling some of these ideas? I don't think splitting ruff into more crates and introducing features are mutually exclusive, and it's probably a much bigger undertaking to do the former.

not-my-profile · 2023-02-04T06:07:44Z

Dev compile times are really getting to me

cargo test -p ruff is much faster than cargo test since it doesn't compile the ruff_cli. We could consider dropping ruff_cli from default-members in Cargo.toml to speed up cargo test when adding rule implementations ... but then cargo run wouldn't work anymore and you'd have to use cargo run -p ruff_cli (and the benefit would only apply when you just run cargo test but not cargo run but I still think it could be worth it).

One crate per linter

I don't think we want to structure our code by linter at all. I think we rather want to group our AST rules by which AST node they target.

In general I think we should first improve the structure of our code within the ruff crate before thinking about how we can split it up.

Has anyone started profiling some of these ideas?

Reducing such dependencies won't do much for our dev compile time since these dependencies only need to be compiled once.

charliermarsh · 2023-02-04T13:16:28Z

I think we rather want to group our AST rules by which AST node they target.

I can think of reasons we wouldn't want to do this, but it feels unproductive to debate it right now. Most important is getting the codebase into a state at which we can start breaking it down :) which requires at least (1) changing the project structure (e.g., #2088 or similar) and (2) disentangling the mutual dependencies between rule implementations and Checker et al, plus other work I'm sure.

colin99d · 2023-02-04T13:19:42Z

Will how we build this also determine how we will let people "hook" into ruff and build their own modules into our linter, like flake8 has? Also, once you guys decide on a plan feel free to assign me some refactoring work!

MichaReiser · 2023-02-22T15:00:02Z

One thing that seems somewhat straightforward to achieve is to split out the ast, source_code, cst and docstring modules into a ruff_python_syntax crate (assuming I understood their purpose correctly and it makes sense to group them in such a crate).

Open questions

source_code depends on vendor: Move vendor to ruff_python_syntax or introduce new ruff_vendor crate?
stylist depends on leading_quote helper -> Move/extract
A few ast helpers depend on Checker. Move to ruff and/or implement as extension traits on Checker.

I've probably overlooked a few dependencies but it seems less hard than others (e.g. extracting settings is.... challenging)
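For the last open question, an extension trait could look roughly like the sketch below. `Checker` here is a hypothetical stand-in with a made-up field, and `CheckerAstExt` / `is_builtin` are illustrative names only; the point is that the helper lives beside `Checker` instead of inside the extracted syntax crate.

```rust
// Hypothetical stand-in for the real `Checker`; only the shape matters here.
pub struct Checker {
    pub builtins: Vec<String>,
}

/// Extension trait: defined next to `Checker` (i.e. in the `ruff` crate), so the
/// extracted syntax crate does not need to know about `Checker` at all.
pub trait CheckerAstExt {
    fn is_builtin(&self, name: &str) -> bool;
}

impl CheckerAstExt for Checker {
    fn is_builtin(&self, name: &str) -> bool {
        self.builtins.iter().any(|b| b == name)
    }
}
```

Call sites change from a free function taking the checker as an argument to a method call such as `checker.is_builtin("print")`, once the trait is in scope.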

charliermarsh · 2023-03-02T04:54:41Z

I am starting to hack on a few of these problems in #3298, mostly to see what breaks and what problems we run into. (Not ready for review but feedback welcome, etc.)

@MichaReiser

This PR productionizes @MichaReiser's suggestion in #1820 (comment), by creating a separate crate for the `ast` module (`rust_python_ast`). This will enable us to further split up the `ruff` crate, as we'll be able to create (e.g.) separate sub-linter crates that have access to these common AST utilities. This was mostly a straightforward copy (with adjustments to module imports), as the few dependencies that _did_ require modifications were handled in #3366, #3367, and #3368.

This was referenced Jan 12, 2023

Introduce a Rust module for all lint implementations #1547

Closed

Split off ruff_cli crate from ruff library #1816

Merged

charliermarsh added the internal label (An internal refactor or improvement) Jan 12, 2023

charliermarsh mentioned this issue Feb 4, 2023

Rename the misnamed pylint messages #2559

Merged

charliermarsh mentioned this issue Feb 23, 2023

Move RustPython vendored and helper code into its own crate #3171

Merged

charliermarsh mentioned this issue Mar 6, 2023

Create a rust_python_ast crate #3370

Merged

zanieb mentioned this issue Oct 30, 2023

Improve CI build and test times #8288

Open

konstin changed the title ~~Splitting up the ruff crate for faster compile times~~ Splitting up the ruff_linter crate for faster compile times Nov 8, 2023
