Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[move-analyzer] Cache user program #20621

Merged
merged 7 commits into from
Dec 18, 2024
Merged

[move-analyzer] Cache user program #20621

merged 7 commits into from
Dec 18, 2024

Conversation

awelc
Copy link
Contributor

@awelc awelc commented Dec 13, 2024

Description

This PR builds upon the compiler changes supporting partial compilation of of user code (skipping function bodies when compiling specific files designated by the user) to deliver the performance improvements described in the PR containing the compiler changes (reducing compilation time in the IDE for the Deepbook package from over 1s to less than 200ms).

In this PR, in addition to caching dependencies, we also cache user code, fully compile only modified files, and merge result of this compilation with the cached data

Test plan

All tests must pass. Also tested manually to verify correctness and performance improvements

Copy link

vercel bot commented Dec 13, 2024

The latest updates on your projects. Learn more about Vercel for Git ↗︎

Name Status Preview Comments Updated (UTC)
sui-docs ✅ Ready (Inspect) Visit Preview 💬 Add feedback Dec 17, 2024 9:58pm
3 Skipped Deployments
Name Status Preview Comments Updated (UTC)
multisig-toolkit ⬜️ Ignored (Inspect) Visit Preview Dec 17, 2024 9:58pm
sui-kiosk ⬜️ Ignored (Inspect) Visit Preview Dec 17, 2024 9:58pm
sui-typescript-docs ⬜️ Ignored (Inspect) Visit Preview Dec 17, 2024 9:58pm

/// Compiled user program
program: CompiledProgram,
/// Source files
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Somewhat unrelated, but I realized we don't need both

@@ -381,6 +381,7 @@ fn initial_symbols(
pkg_deps.clone(),
ide_files_root.clone(),
project_path.as_path(),
None,
LintLevel::None,
)?;

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we add a test with some precompiled info just for sanity checking?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need a full compilation to compute all symbols and this is the first and only one that's done per test suite (as files do not change). What I did, though, is to add another partial compilation right after this one and use its results to run tests, but it lengthens the time to execute the test suite (not to unreasonable level, though)

// if one of the paths for a package is not a Move file, have aonly one BTreeSet
// None entry. This will be later passed to symbolicator as Some(Vec<PathBuf>) or
// None, to indicate if incremental symbolication is possible or not.
let root_dir = root_dir_opt.unwrap();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Small nit: possibly swap this for a let-else

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good idea! Thanks!

@@ -1406,10 +1413,10 @@ impl SymbolicatorRunner {
}
};
if let Some(all_starting_paths) = all_starting_paths_opt {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am really confused about the naming of the variables in this code here:

  • Why Some(root_dir) when that is a set?
  • What does all_starting_paths mean?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry about that... The first one was totally my bad when changing the original code in the the pre-requisite PR (forgot to rename this following the change of what is passed in the mutex). The second one (all_starting_paths) was introduced there as well to distinguish from starting_path as we were previously only tracking a single (modified file).

I not changed them both to simply starting_paths

// None entry. This will be later passed to symbolicator as Some(Vec<PathBuf>) or
// None, to indicate if incremental symbolication is possible or not.
let root_dir = root_dir_opt.unwrap();
if let Some(mut modified_files) = pkgs_to_analyze.remove(&root_dir) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a reason to remove and insert vs simply using entry with the appropriate std::mem::replace?

Copy link
Contributor Author

@awelc awelc Dec 16, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No reason other than me failing to thing Rust-way sometimes... Rewritten!

let root_dir = root_dir_opt.unwrap();
if let Some(mut modified_files) = pkgs_to_analyze.remove(&root_dir) {
// we already seen this package
if modified_files.len() != 1 || modified_files.first().is_some() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is this testing for and why not just !modified_files.is_empty()?

Copy link
Contributor Author

@awelc awelc Dec 16, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It could be !modified_files.is_empty(). The intention was to represent a situation when modified_files has only one entry and this entry is None, but since an entry cannot be None if other entries exist, checking of non-emptiness is totally fine as well. In either case, with the simplifying edit, no longer needed

pkgs_to_analyze.insert(root_dir, modified_files);
} else {
// reset to a single None entry
pkgs_to_analyze.insert(root_dir, BTreeSet::from_iter(vec![None]));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you explain this? If we have a bunch of modified move files and one non-move file, why not at least consider the move files?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the second look, I overdid it as there is no way that there can be non-Move file paths in this function (though they can exist as input to get_symbols). I now simply collect the file paths

}
} // else we already seen this package with a non-Move-file path so nothing to do
} else {
// first time we see this package
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems like this logic could be combined with the above logic if we used entry

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Logic changed so no longer needed

return mdef.loc.file_hash() == *fhash;
}
}
false
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No strong preference, but could write as:

file_hashes.get(fpath)
    .and_then(|fhash| match &pkg_def.def {
        P::Definition::Module(mdef) => Some((mdef, fhash)),
        _ => None,
    })
    .map(|(mdef, fhash)| mdef.loc.file_hash() == *fhash)
    .unwrap_or(false)
    ```

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know it's more Rust-y but with extra transformation via match and map not sure if it's more readable. If you don't have a strong preference, I think I'd leave it as is

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, since I decided to follow the other suggestion and rewrite the other similar function, let's rewrite this one as well for consistency

Comment on lines 1828 to 1831
if let Some(fhash) = file_hashes.get(fpath) {
return mdef.loc.file_hash() == *fhash;
}
false
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
if let Some(fhash) = file_hashes.get(fpath) {
return mdef.loc.file_hash() == *fhash;
}
false
file_hashes.get(fpath)
.map(|fhash| mdef.loc.file_hash() == *fhash)
.unwrap_or(false)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! An arguably even more Rust-y one would be:

file_hashes
  .get(fpath)
  .map_or(false, |fhash| mdef.loc.file_hash() == *fhash)

Copy link
Contributor

@cgswords cgswords left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM modulo the code in new for SymbolicatorRunner, which is opaque and very hard to follow. There are several bits that do unexpected things. Refactoring that massive constructor into smaller functions that do clear things would help quite a bit, and make the code that much easier to maintain going forward.

@awelc
Copy link
Contributor Author

awelc commented Dec 16, 2024

LGTM modulo the code in new for SymbolicatorRunner, which is opaque and very hard to follow. There are several bits that do unexpected things. Refactoring that massive constructor into smaller functions that do clear things would help quite a bit, and make the code that much easier to maintain going forward.

Moved aggregation of package paths into a separate function which was the part that seemed to make the most sense to refactor

@awelc awelc force-pushed the aw/ide-user-code-cache branch from 02a3129 to 40ac4e2 Compare December 17, 2024 21:56
@awelc awelc temporarily deployed to sui-typescript-aws-kms-test-env December 17, 2024 21:56 — with GitHub Actions Inactive
@awelc awelc merged commit ac03618 into main Dec 18, 2024
52 checks passed
@awelc awelc deleted the aw/ide-user-code-cache branch December 18, 2024 08:09
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants