-
Notifications
You must be signed in to change notification settings - Fork 4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Make lockfile version parsing more robust #20140
base: master
Are you sure you want to change the base?
Conversation
e15df8c
to
6adbd18
Compare
Instead of using a regex, get the lockfile version by parsing the lockfile as arbitrary JSON and then later fit it to the expected schema.
49ffe07
to
4d45d39
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks! Overall looks good! But @SalmaSamy should take a look to confirm.
try (var reader = | ||
new BufferedReader( | ||
new InputStreamReader(lockfilePath.asPath().getInputStream(), UTF_8))) { | ||
BazelLockFileValue bazelLockFileValue = gson.fromJson(reader, BazelLockFileValue.class); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We did consider this but we wanted to avoid parsing the whole file if the version is already old. Which I think is better.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you explain what worries you about parsing the whole file if the version is old? If it is about performance, I would find optimizing for the common case of a matching version more natural.
But then again, I don't know whether lockfiles will get so large that reading them into memory instead of streaming them makes a measurable difference even for large monorepos, so it's possible that there isn't much impact either way.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Performance & why parse the whole file if we might not use all these info?
- If the version is old we will probably get a parsing exception then parse it again to get the version. So I think just getting the version first and parse it once is better.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, this change definitely does more work in the case where the version doesn't match. But avoiding that work, as is done at HEAD, also comes at a cost: The lockfile is fully read into memory (instead of streaming it into the JSON parser) and searched for the regex every time, which is overhead that would be avoided with this change.
I would say it's a tradeoff and lean towards optimizing the common case, but it's quite possible that it won't make much of a difference. I can try to benchmark this, but lack access to a realistic very large MODULE.bazel file.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
to fully lay it out:
- HEAD: always stores whole lockfile as a string in memory, in addition to the parsed lockfile structures; parses JSON once in happy path, 0 times otherwise
- this PR: only stores the parsed lockfile structures; parses JSON once in happy path, twice otherwise
I wonder if we can try to "cheat" a bit and take the best of both worlds: always read the first 5 lines (or 100 characters or whatever) of the lockfile to look for the version. If happy, parse JSON by streaming, otherwise directly barf. This avoids both the memory issue and the double-parsing, and the downside of "we might have a lockfile whose version is specified lower down in the file" is negligible IMO.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think this is needed, check my comment.
Instead of using a regex, still parse the lockfile as JSON to extract the version instead of using a regex. This also makes it unnecessary to read the full lockfile into memory.