Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Splitting up the ruff_linter crate for faster compile times #1820

Open
not-my-profile opened this issue Jan 12, 2023 · 11 comments
Open

Splitting up the ruff_linter crate for faster compile times #1820

not-my-profile opened this issue Jan 12, 2023 · 11 comments
Labels
internal An internal refactor or improvement

Comments

@not-my-profile
Copy link
Contributor

not-my-profile commented Jan 12, 2023

@squiddy commented on #1547 with:

Slightly related: Any thoughts on splitting up the code into multiple crates? From my limited understanding this is one way to go for faster compiles. Multiple crates can be compiled in parallel. I did some profiling the other day and a release build takes quite some time because everything is one crate currently.

Crates can only be compiled in parallel if they don't depend on each other. Matklad has a nice blog post "Fast Rust Builds".

I just opened the PR #1816 to split off the CLI into a separate ruff_cli crate, since ruff_cli depends on ruff this doesn't really change the total build time (especially because ruff_cli can be compiled very quickly ... except the bin generation).

image

image

These graph were generated via cargo build --release --timings, see the timing before the split and the timing after the split.

I think the question is if/how we could further split up the ruff library. Currently the rule implementations depend on ast::Checker and ast::Checker depends on the rule implementations. I am not sure what we can do about that.

With my PR #1816 cargo build -p ruff for the first time now builds ~100 dependencies fewer than cargo build for the first time (since it doesn't build the CLI), so this might still be neat for people who just want to contribute a lint and test that it works.

@colin99d
Copy link
Contributor

colin99d commented Jan 12, 2023

I think another benefit of this could be if we want to have certain features not always included for the pip install. For example, #1628 wants to add spell checking which adds the typos dependency. Since we are adding a dependency and code that will only be used in one place, I feel like this is a great feature to keep in a separate crate.

@not-my-profile
Copy link
Contributor Author

not-my-profile commented Jan 12, 2023

We can easily introduce an optional dependency on typos via a feature flag. This doesn't require a separate crate and I don't see any benefit to using a separate crate as opposed to a feature flag for something like that.

@charliermarsh charliermarsh added the internal An internal refactor or improvement label Jan 12, 2023
@charliermarsh
Copy link
Member

Dev compile times are really getting to me 😅 Have they regressed at all lately? Perhaps we haven't been tracking closely enough to answer that.

@MichaReiser
Copy link
Member

One benefit of separate crates is that it avoids re-compiling unchanged code because Rust skips crates where neither its dependencies nor modules have changed.

@charliermarsh
Copy link
Member

As an initial proposal to tear down, would something like this work?

  • One crate per linter (like eradicate).
  • A crate for core, linter-agnostic definitions (like Violation, Diagnostic, etc.). This could include a trait to outline the Checker API, which the linter crates could depend on (but not its implementation).
  • A crate that depends on all of those, and includes Checker -- so it ties together the linters, and the linter-agnostic definitions.
  • A couple one-off crates that can exist as standalone libraries, like the current source crate that contains Generator, Locator, etc.
  • (Not sure what to do with Settings and the per-linter settings structs.)

@cnpryer
Copy link
Contributor

cnpryer commented Feb 4, 2023

We can easily introduce an optional dependency on typos via a feature flag. This doesn't require a separate crate and I don't see any benefit to using a separate crate as opposed to a feature flag for something like that.

This was my first thought. Has anyone started profiling some of these ideas? I don't think splitting ruff into more crates and introducing features are mutually exclusive, and it's probably a much bigger undertaking to do the former.

@not-my-profile
Copy link
Contributor Author

not-my-profile commented Feb 4, 2023

Dev compile times are really getting to me

cargo test -p ruff is much faster than cargo test since it doesn't compile the ruff_cli. We could consider dropping ruff_cli from default-members in Cargo.toml to speed up cargo test when adding rule implementations ... but then cargo run wouldn't work anymore and you'd have to use cargo run -p ruff_cli (and the benefit would only apply when you just run cargo test but not cargo run but I still think it could be worth it).

One crate per linter

I don't think we want to structure our code by linter at all. I think we rather want to group our AST rules by which AST node they target.

In general I think we should firstly improve the structure of our code within the ruff crate before thinking about how we can split it up.

Has anyone started profiling some of these ideas?

Reducing such dependencies won't do much for our dev compile time since these dependencies only need to be compiled once.

@charliermarsh
Copy link
Member

I think we rather want to group our AST rules by which AST node they target.

I can think of reasons we wouldn't want to do this, but it feels unproductive to debate it right now. Most important is getting the codebase into a state at which we can start breaking it down :) which requires at least (1) changing the project structure (e.g., #2088 or similar) and (2) disentangling the mutual dependencies between rule implementations and Checker et al, plus other work I'm sure.

@colin99d
Copy link
Contributor

colin99d commented Feb 4, 2023

Will how we build this also determine how we will let people "hook" into ruff and build their own modules into our linter, like flake8 has? Also, once you guys decide on a plan feel free to assign me some refactoring work!

@MichaReiser
Copy link
Member

MichaReiser commented Feb 22, 2023

One thing that seems somewhat straightforward to achieve is to split out the ast, source_code, cst and docstring modules into a ruff_python_syntax crate (assuming I understood their purpose correctly and it make sense to group them in such crates).

Open questions

  • source_code depends on vendor: Move vendor to ruff_python_syntax or introduce new ruff_vendor crate?
  • stylist depends on leading_quote helper -> Move/extract
  • A few ast helpers depend on Checker. Move to ruff and/or implement as extenion traits on Checker.

I've probably overlooked a few dependencies but it seems less hard than others (e.g. extracting settings is.... challenging)

@charliermarsh
Copy link
Member

I am starting to hack on a few of these problems in #3298, mostly to see what breaks and what problems we run into. (Not ready for review but feedback welcome, etc.)

charliermarsh added a commit that referenced this issue Mar 7, 2023
This PR productionizes @MichaReiser's suggestion in #1820 (comment), by creating a separate crate for the `ast` module (`rust_python_ast`). This will enable us to further split up the `ruff` crate, as we'll be able to create (e.g.) separate sub-linter crates that have access to these common AST utilities.

This was mostly a straightforward copy (with adjustments to module imports), as the few dependencies that _did_ require modifications were handled in #3366, #3367, and #3368.
@konstin konstin changed the title Splitting up the ruff crate for faster compile times Splitting up the ruff_linter crate for faster compile times Nov 8, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
internal An internal refactor or improvement
Projects
None yet
Development

No branches or pull requests

5 participants