Make lockfile version parsing more robust #20140

fmeum · 2023-11-10T15:30:41Z

Instead of using a regex, still parse the lockfile as JSON to extract the version instead of using a regex. This also makes it unnecessary to read the full lockfile into memory.

Instead of using a regex, get the lockfile version by parsing the lockfile as arbitrary JSON and then later fit it to the expected schema.

meteorcloudy

Thanks! Overall looks good! But @SalmaSamy should take a look to confirm.

SalmaSamy · 2023-11-28T14:36:27Z

src/main/java/com/google/devtools/build/lib/bazel/bzlmod/BazelLockFileFunction.java

+      try (var reader =
+          new BufferedReader(
+              new InputStreamReader(lockfilePath.asPath().getInputStream(), UTF_8))) {
+        BazelLockFileValue bazelLockFileValue = gson.fromJson(reader, BazelLockFileValue.class);


We did consider this but we wanted to avoid parsing the whole file if the version is already old. Which I think is better.

Could you explain what worries you about parsing the whole file if the version is old? If it is about performance, I would find optimizing for the common case of a matching version more natural.

But then again, I don't know whether lockfiles will get so large that reading them into memory instead of streaming them makes a measurable difference even for large monorepos, so it's possible that there isn't much impact either way.

Performance & why parse the whole file if we might not use all these info?

If the version is old we will probably get a parsing exception then parse it again to get the version. So I think just getting the version first and parse it once is better.

Yes, this change definitely does more work in the case where the version doesn't match. But avoiding that work, as is done at HEAD, also comes at a cost: The lockfile is fully read into memory (instead of streaming it into the JSON parser) and searched for the regex every time, which is overhead that would be avoided with this change.

I would say it's a tradeoff and lean towards optimizing the common case, but it's quite possible that it won't make much of a difference. I can try to benchmark this, but lack access to a realistic very large MODULE.bazel file.

to fully lay it out:

HEAD: always stores whole lockfile as a string in memory, in addition to the parsed lockfile structures; parses JSON once in happy path, 0 times otherwise

this PR: only stores the parsed lockfile structures; parses JSON once in happy path, twice otherwise

I wonder if we can try to "cheat" a bit and take the best of both worlds: always read the first 5 lines (or 100 characters or whatever) of the lockfile to look for the version. If happy, parse JSON by streaming, otherwise directly barf. This avoids both the memory issue and the double-parsing, and the downside of "we might have a lockfile whose version is specified lower down in the file" is negligible IMO.

SalmaSamy

I don't think this is needed, check my comment.

fmeum force-pushed the lockfile-version branch 5 times, most recently from e15df8c to 6adbd18 Compare November 11, 2023 13:39

fmeum marked this pull request as ready for review November 25, 2023 09:03

fmeum requested review from Wyverald and meteorcloudy as code owners November 25, 2023 09:03

github-actions bot added team-ExternalDeps External dependency handling, remote repositiories, WORKSPACE file. awaiting-review PR is awaiting review from an assigned reviewer labels Nov 25, 2023

Make lockfile version parsing more robust

4d45d39

Instead of using a regex, get the lockfile version by parsing the lockfile as arbitrary JSON and then later fit it to the expected schema.

fmeum force-pushed the lockfile-version branch from 49ffe07 to 4d45d39 Compare November 25, 2023 09:06

meteorcloudy requested a review from SalmaSamy November 27, 2023 09:30

meteorcloudy approved these changes Nov 27, 2023

View reviewed changes

SalmaSamy reviewed Nov 28, 2023

View reviewed changes

meteorcloudy added awaiting-user-response Awaiting a response from the author and removed awaiting-review PR is awaiting review from an assigned reviewer labels Jan 10, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Make lockfile version parsing more robust #20140

Make lockfile version parsing more robust #20140

fmeum commented Nov 10, 2023 •

edited

Loading

meteorcloudy left a comment

SalmaSamy Nov 28, 2023

fmeum Nov 28, 2023

SalmaSamy Nov 28, 2023

fmeum Nov 28, 2023

Wyverald Dec 5, 2023

SalmaSamy left a comment

Make lockfile version parsing more robust #20140

Are you sure you want to change the base?

Make lockfile version parsing more robust #20140

Conversation

fmeum commented Nov 10, 2023 • edited Loading

meteorcloudy left a comment

Choose a reason for hiding this comment

SalmaSamy Nov 28, 2023

Choose a reason for hiding this comment

fmeum Nov 28, 2023

Choose a reason for hiding this comment

SalmaSamy Nov 28, 2023

Choose a reason for hiding this comment

fmeum Nov 28, 2023

Choose a reason for hiding this comment

Wyverald Dec 5, 2023

Choose a reason for hiding this comment

SalmaSamy left a comment

Choose a reason for hiding this comment

fmeum commented Nov 10, 2023 •

edited

Loading