Skip to content

[SR-12851] System-wide cache of SwiftPM dependencies #2985

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 37 commits into from
Dec 3, 2020

Conversation

tgymnich
Copy link
Contributor

@tgymnich tgymnich commented Oct 17, 2020

I reworked my approach from PR #2835 to not only operate on GitRepository but on any kind of repository from within Repository Manager.

SR-12851

This PR requires the Lock Improvements in swift-tools-support-core to be merged.

@tgymnich tgymnich force-pushed the SR-12851-repository-cache-2 branch from 7bd4512 to e2c5fc3 Compare October 22, 2020 22:32
@tgymnich tgymnich marked this pull request as ready for review October 22, 2020 22:37
@tgymnich tgymnich force-pushed the SR-12851-repository-cache-2 branch from 3f5a8c6 to 40a74e5 Compare October 22, 2020 22:50
@neonichu
Copy link
Contributor

@swift-ci please smoke test

@neonichu
Copy link
Contributor

Great, thank you! I have yet to actually review the code, but did try out a SwiftPM with your diff for a bit myself and it seems to be working nicely.

In terms of speedup, I'm seeing a fresh swift package resolve of SwiftPM itself go down from ~25s to ~15s when using the cache. Copying from the cache takes up quite a bit of time here, but that's expected since we are going with a very conservative approach.

@tgymnich
Copy link
Contributor Author

tgymnich commented Oct 23, 2020

The tests are also using the local cache. We might need another change to swift-tools-support to modify SwiftPMProduct.SwiftPackage.execute(_ args: [String], packagePath: AbsolutePath? = nil, env: [String : String]? = nil)) maybe adding a --skip-cache flag by default.

I should also look into exclusively caching remote dependencies and excluding local dependencies.

try self.provider.fetch(repository: handle.repository, to: repositoryPath)
let cache = try await { self.cacheManager?.lookup(repository: handle.repository, completion: $0) }

try self.provider.fetch(repository: cache.repository, to: repositoryPath)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should look into whether a straight up copy might be faster than performing a fetch here.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's a good idea, and should also eliminate the need to reset the remote to point back to the original URL later.

If that were done in a way that used no-copy cloning on file systems that support it (e.g. APFS), that could be really fast since the bare repository typically has few but large files (typically mainly a single pack).

@neonichu
Copy link
Contributor

This is great, I have three high-level asks:

  • I think we should consider having an opt-out for the cache as a flag, it might be beneficial in some situations to not use the cache at all.
  • Could we look into making sure the "Fetching X" log statements reflect whether the fetch is coming from the cache?
  • Can we add some tests, especially around concurrent access to the cache? Anecdotally, this seems to be working fine for me locally, but we should have tests to keep that working over time.

@neonichu
Copy link
Contributor

BTW, the self-hosted tests are still using a vendored copy of TSC. This is something I am working on getting rid off, but we're not quite there, yet. So for now, you would need to update the vendored copy with the lock changes, simply by copying those files over.

@neonichu
Copy link
Contributor

One more thing, it's great that we are using the cache directory as determined by Foundation for storage, but could we also add a symlink from ~/.swiftpm/repository-cache to that directory? This makes it easier for users which may have experience with other CLI tools, especially on non-Apple platform, to find the cache directory if they need to.

@tomerd
Copy link
Contributor

tomerd commented Oct 24, 2020

I think we should consider having an opt-out for the cache as a flag, it might be beneficial in some situations to not use the cache at all.

do we already have a command for clearing the cache? I can see that being useful in case the cache gets corrupt for whatever reason

ah I see PurgeCache now

@tgymnich
Copy link
Contributor Author

BTW, the self-hosted tests are still using a vendored copy of TSC. This is something I am working on getting rid off, but we're not quite there, yet. So for now, you would need to update the vendored copy with the lock changes, simply by copying those files over.

Should I open a separate PR for that?

@tgymnich
Copy link
Contributor Author

One more thing, it's great that we are using the cache directory as determined by Foundation for storage, but could we also add a symlink from ~/.swiftpm/repository-cache to that directory? This makes it easier for users which may have experience with other CLI tools, especially on non-Apple platform, to find the cache directory if they need to.

There is no symlink function in TSC or am I missing something. If not I can open a quick PR for that.

@tgymnich
Copy link
Contributor Author

tgymnich commented Oct 26, 2020

If an error occurs due to a corrupt cache or if we cannot create the cache directory for some reason, should we try recovering from that by just performing a "normal" fetch without the cache and print a warning about the error?

@tgymnich
Copy link
Contributor Author

tgymnich commented Oct 26, 2020

  • Can we add some tests, especially around concurrent access to the cache? Anecdotally, this seems to be working fine for me locally, but we should have tests to keep that working over time.

Since we are using an approach which keeps the per-project cache, the only place I can imagine where something could go wrong would be right here:

try fileSystem.withLock(on: cachedRepositoryPath, type: .exclusive) {
   // Fetch the repository into the cache.
   try self.provider.fetch(repository: handle.repository, to: cachedRepositoryPath, update: update)
   // Copy the repository from the cache into the repository path.
   try self.provider.copy(from: cachedRepositoryPath, to: repositoryPath)
}

Since our FileLock is already unit tested I don't think it makes sense to add additional tests.

@abertelrud
Copy link
Contributor

@swift-ci please smoke test

@abertelrud
Copy link
Contributor

Thanks for the additional changes! I'll review them later today. I don't know the answers to the questions yet, but will address them when I take a look.

@tgymnich
Copy link
Contributor Author

forgot to push my last commit. Test should be fine now.

@MaxDesiatov
Copy link
Contributor

Isn't there still a conflict in Sources/SourceControl/InMemoryGitRepository.swift?

@abertelrud
Copy link
Contributor

Yes, I think InMemoryGitRepository has moved into test support now, so it looks as if there's a conflict.

Thanks @tgymnich for your continued iterations on this! To answer a few of the questions:

  • recovery from corrupt cache entries sounds great, but could as far as I am concerned by added as a follow-on PR
  • there probably should be a FileManager function to create a symlink — I'm surprised it isn't already there; I think it would make sense to add in a separate PR that this one builds on
  • to the question of TSC; I think it would make sense to have a separate PR that vendors the FileLock changes (and symlink addition if add that)

@tomerd
Copy link
Contributor

tomerd commented Oct 30, 2020

@tgymnich this looks great, looking forward to getting it merged soon

@neonichu
Copy link
Contributor

@swift-ci please smoke test

@neonichu
Copy link
Contributor

Looks like there's still an issue with the Basics module on CI, will check locally.

@neonichu
Copy link
Contributor

Ah, the issue is a missing dependency in the CMake configuration. Since SourceControl now depends on Basics, a dependency needs to be added:

% git diff Sources/SourceControl/CMakeLists.txt 
diff --git a/Sources/SourceControl/CMakeLists.txt b/Sources/SourceControl/CMakeLists.txt
index 5c45fe50..58c9c9b1 100644
--- a/Sources/SourceControl/CMakeLists.txt
+++ b/Sources/SourceControl/CMakeLists.txt
@@ -12,6 +12,7 @@ add_library(SourceControl
   RepositoryManager.swift)
 
 target_link_libraries(SourceControl PUBLIC
+  Basics
   TSCBasic
   TSCUtility)
 

@tgymnich
Copy link
Contributor Author

Done in #3070

@tomerd
Copy link
Contributor

tomerd commented Nov 20, 2020

@swift-ci please smoke tes

@tomerd
Copy link
Contributor

tomerd commented Nov 20, 2020

@swift-ci please smoke test


@available(*, deprecated)
func fetchingDidFinish(handle: RepositoryManager.RepositoryHandle, fetchDetails: RepositoryManager.FetchDetails?, error: Swift.Error?) {
fetchingDidFinish(handle: handle, error: error)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great, thank you, this looks good!

Copy link
Contributor

@abertelrud abertelrud left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me. Thanks!

@tomerd
Copy link
Contributor

tomerd commented Nov 20, 2020

amazing work @tgymnich 🙇‍♂️ ❤️

can't wait to get this merged and use it in our day-to-day workflows 🥳

@tgymnich
Copy link
Contributor Author

You are welcome 😊. Thank you all for your reviews and helpful comments.

@tgymnich
Copy link
Contributor Author

Determining if a git repository exists by checking if ".git" exists does't work when we don't do a checkout. Now we are using git rev-parse --is-inside-git-dir --is-inside-work-tree which works in both cases.

@MaxDesiatov

This comment has been minimized.

@tgymnich
Copy link
Contributor Author

tgymnich commented Nov 25, 2020

The tests are also using the local cache. We might need another change to swift-tools-support to modify SwiftPMProduct.SwiftPackage.execute(_ args: [String], packagePath: AbsolutePath? = nil, env: [String : String]? = nil)) maybe adding a --skip-cache flag by default.

I should also look into exclusively caching remote dependencies and excluding local dependencies.

I went ahead and added some integration tests for caching in PackageToolTests.swift and some Unit tests in RepositoryManagerTests.swift. I also added a boolean flag and an environment variable to disable skipping of local packages when running tests.

@MaxDesiatov
Copy link
Contributor

@swift-ci please smoke test

@abertelrud
Copy link
Contributor

I went ahead and added some integration tests for caching in PackageToolTests.swift and some Unit tests in RepositoryManagerTests.swift. I also added a boolean flag and an environment variable to disable skipping of local packages when running tests.

Thanks for the additional changes, @tgymnich. For the controlling whether local packages are cached, it might be better to use a more general parameter than something specifically intended for unit tests embedded in the general code. If the parameter has a sensible default value, then it could be left to the unit test to look at the environment variable to decide whether to enable the behavior. Just a thought, but that would make the general code a bit cleaner.

@tgymnich
Copy link
Contributor Author

tgymnich commented Dec 2, 2020

Thanks for the additional changes, @tgymnich. For the controlling whether local packages are cached, it might be better to use a more general parameter than something specifically intended for unit tests embedded in the general code. If the parameter has a sensible default value, then it could be left to the unit test to look at the environment variable to decide whether to enable the behavior. Just a thought, but that would make the general code a bit cleaner.

I think caching of local packages makes only sense in the context of unit tests. I see only two ways of enabling this behavior from within PackageToolTests: a command line argument or an environment variable.

@abertelrud
Copy link
Contributor

I think caching of local packages makes only sense in the context of unit tests. I see only two ways of enabling this behavior from within PackageToolTests: a command line argument or an environment variable.

Right, I agree that it doesn't make sense as an external-facing feature, I was just talking about how the main code paths check for this condition. We can always refine this in a separate PR, though.

I'll take another quick look but as far as I am concerned this is ready to merge. Any objections? Thanks for all your work on this! Can't wait to use this!

@tgymnich
Copy link
Contributor Author

tgymnich commented Dec 3, 2020

👍🏼 let's refine this later. For me this is ready to be merged as well.

@abertelrud
Copy link
Contributor

Fantastic. Thanks for your work on this!

@abertelrud abertelrud merged commit a8ded58 into swiftlang:main Dec 3, 2020
@tomerd
Copy link
Contributor

tomerd commented Dec 4, 2020

this is fantastic 🥳 ❤️

@neonichu
Copy link
Contributor

neonichu commented Dec 4, 2020

This is great 🎉 Thank you @tgymnich!

federicobucchi pushed a commit to federicobucchi/swift-package-manager that referenced this pull request Jan 6, 2021
* implemented caching

* fixed packages with a path ending in “/”

* fetching a repo creates an alias for the fetched repository under a new RepositorySpecifier

* added purge-cache and --cache-path options

* temporarily fix tests until FileSystem has locks

* added documentation and some refactoring

* remove unnecessary extension

* remove unnecessary import

* use file lock provided by FileSystem

* added -skip-cache flag

* using copy instead of fetch

* using repository provider instead of repository manager to fetch the cached repo

* added fetchDetail to delegate methods

* remove func setURL(remote: String, url: String) throws

* fixed unit tests by making RepositoryProvider aware of copy

* fix newlines

* fix lock

* extract function initalizeCacheIfNeeded

* update comments

* remove duplicate code

* remove newlines

* re-add old comments and move repositoryPath back down

* move isCache to the top

* converted FetchDetails into a struct

* use FileSystem.swiftPMCacheDirectory

* add symlink to cache directory in .swiftpm

* add some comments

* create .swiftpm directory

* don't cache local packages

* added default implementations

* automatically infer update parameter for fetch

* deprecated old delegate methods

* add Basics to SourceControl

* use a more reliable way to determine if a git repository exists

* open repository before fetching

* added tests for repository manager

* rename cacheLocalPackages
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants