perf(git): avoid unnecessary file hashing while config files detection #6461
Conversation
@thsig @stefreak @edvald this is a naive attempt at a cheap and simple optimization of the hash calculation for the case when many files untracked by Git are (implicitly) included in the Garden working tree. This PR just disables the hash calculation in the config files detection phase. I'm not sure if it's a good idea. It does not actually solve the problem of a large number of untracked files; it hides that problem instead. In some cases users might not want to include all these untracked files. Such files can be generated by build jobs or other scripts, and are not always supposed to be included in the working file set of the Garden project. It might be better to warn the user loudly when a large number of untracked files is detected. Another attempt was done in #5920; as Jon suggested in the comment, we could use
Any global ideas on dealing with the large number of untracked files? We can also log a separate warning if the number of untracked files is too large, but I'm not sure that improves the UX a lot. A trial was done in #6452.
@vvagaytsev in case a large number of untracked files is included in an action, the user will get a warning.
So how is this hiding the problem? If the user doesn't want those files to be included, they need to configure the action not to include them. Once we get to the point that we only calculate the hash if the file is included in the action, it doesn't matter how the user excludes the file (using the gardenignore feature or includes/excludes). In the end, we can't know whether the files need to be excluded or included, so we shouldn't send the user in a potentially wrong direction. If the files are not included, they shouldn't slow down Garden. If they are included, the user should indeed be informed that they might slow things down. The existing warning can definitely be improved to be more informative, e.g. by providing a way to understand exactly which files are causing the issue.
@stefreak yes, you're right. Regarding the problem "hiding", sorry for the confusion, let me elaborate a bit. I meant that the changes made in this PR can "hide" a bigger problem. This PR is not the proper solution, but just a quick fix to reduce the processing time and make it not-too-slow for "some" (not too large) number of untracked files. Maybe "deferring the problem" would be better wording here :) This PR covers a specific case, but does not fix the performance issue for a "large enough" number of untracked files.
@vvagaytsev I think it might make sense to change the warning to only trigger when there is a large number of untracked files, if that's the only case that can cause a performance issue. Definitely open to thinking about that separately 👍 It feels to me that this PR already gives a tangible performance improvement, and while it does not solve the problem entirely, it does not make it worse or hide it, so I am pro merging this 👍
There is a test failure though; maybe avoiding the hashing in the first pass is causing a regression?
@stefreak right. Actually, we should notify users about a large number of untracked and modified files, i.e. the files we need to calculate hashes for. I'll take a closer look at the test failures.
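The threshold-based warning discussed above could be sketched roughly like this. Everything here (the threshold value, the function name, the message text) is illustrative, not Garden's actual API:

```typescript
// Illustrative sketch only: warn when the number of files that need
// hashing (untracked + modified) exceeds an assumed threshold.
const HASHING_WARNING_THRESHOLD = 1000; // assumed value, not from Garden

function maybeWarnAboutHashing(
  untrackedCount: number,
  modifiedCount: number,
  log: (msg: string) => void
): boolean {
  // Only untracked and modified files need fresh hash calculation;
  // unmodified tracked files can reuse hashes from the Git index.
  const filesToHash = untrackedCount + modifiedCount;
  if (filesToHash > HASHING_WARNING_THRESHOLD) {
    log(
      `Found ${filesToHash} untracked/modified files that need hashing; ` +
        `this may be slow. Consider excluding generated files via ` +
        `.gardenignore or the action's include/exclude fields.`
    );
    return true;
  }
  return false;
}

const messages: string[] = [];
console.log(maybeWarnAboutHashing(1500, 0, (m) => messages.push(m))); // true
console.log(maybeWarnAboutHashing(10, 5, (m) => messages.push(m))); // false
```

Warning on the count of files that actually require hashing, rather than all untracked files, matches the point made above: files that won't be hashed don't slow Garden down.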
(force-pushed a2f9a8e to 68b2c67)
(force-pushed 68b2c67 to cb51de9)
(force-pushed cb51de9 to d55cc02)
@stefreak the tests should be green now. Please review. I'll change the warning about a large number of untracked files in a separate PR.
Are we now using the repo scan mode when scanning for configs? If we are, I think we may want to address that, since scanning for configs is otherwise quite efficient and shouldn't trigger any file hashing at all. |
@edvald yes, we use |
Got it. I think we should look into that, could be a performance drag when scanning for configs in many cases. |
Agree with @jon that if there is a performance benefit to using the other scan mode for configs we should look into it, but aren't we caching some of the scan results? If so, let's only change the scan mode if it gives a measurable improvement.
Let's address that in a separate PR, this is already an improvement and looks good to me!
Agreed, let's do that in a separate PR. But I'm quite sure that using the other scan mode to find config files will work better, at the very least in commands that don't need to resolve all or many actions, where the repo is fairly large. |
What this PR does / why we need it:
As mentioned in #5920, Garden Core calls the
AbstractGitHandler.getFiles()
method twice. For big projects with a large number of untracked files, this can degrade performance, because each call triggers hash calculation for the untracked files.
This PR makes a simple improvement by disabling hash calculation in the config files detection phase.
It improves performance only in a specific scenario: when many files untracked by Git are included in the Garden working tree.
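A minimal sketch of the idea, with hypothetical names (the real change lives in Garden's Git handler, whose actual signatures differ): the file-scanning routine accepts a flag that skips content hashing for untracked files, and the cheap config-detection pass sets it, while the action-resolution pass still hashes everything it needs for versioning.

```typescript
import { createHash } from "crypto";

interface ScannedFile {
  path: string;
  hash?: string; // undefined when hashing was skipped
}

// Hypothetical stand-in for AbstractGitHandler.getFiles():
// `hashUntrackedFiles: false` lets the config-detection pass
// avoid hashing untracked file contents entirely.
function getFiles(
  files: { path: string; content: string; tracked: boolean }[],
  opts: { hashUntrackedFiles: boolean }
): ScannedFile[] {
  return files.map((f) => {
    if (!f.tracked && !opts.hashUntrackedFiles) {
      return { path: f.path }; // skip the expensive hash
    }
    return {
      path: f.path,
      hash: createHash("sha1").update(f.content).digest("hex"),
    };
  });
}

const repo = [
  { path: "garden.yml", content: "kind: Project", tracked: true },
  { path: "build/out.js", content: "generated", tracked: false },
];

// Config-detection pass: only paths are needed, so skip hashing.
const configScan = getFiles(repo, { hashUntrackedFiles: false });
// Action-resolution pass: hashes are needed for version calculation.
const fullScan = getFiles(repo, { hashUntrackedFiles: true });

console.log(configScan[1].hash === undefined); // true
console.log(fullScan[1].hash !== undefined); // true
```

With many untracked files, the first pass becomes a pure directory walk instead of reading and hashing every untracked file twice.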
Which issue(s) this PR fixes:
Fixes #
Special notes for your reviewer:
The performance difference can be compared by running the
validate -l3
Garden command in the root of the https://github.com/garden-io/garden repo with a modified https://github.com/garden-io/garden/blob/main/.gardenignore file. Delete `node_modules` from `.gardenignore` before running.
Before
After