Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Downgrade hashing to SHA-1 and other optimizations #5791

Merged
merged 6 commits into from
Mar 9, 2023

Conversation

hubertp
Copy link
Contributor

@hubertp hubertp commented Mar 1, 2023

Pull Request Description

This change downgrades hashing algorithm used in caching IR and library bindings to SHA-1. It is sufficient and significantly faster for the purpose of simple checksum we use it for.

Additionally, don't calculate the digest for serialized bytes - if we get the expected object type then we are confident about the integrity.

Don't initialize Jackson's ObjectMapper for every metadata serialization/de-serialization. Initialization is very costly.

Avoid unnecessary conversions between Scala and Java. Those back-and-forth asScala and asJava are pretty expensive.

Finally fix an SBT warning when generating library cache.

Closes #5763

Important Notes

The change cuts roughly 0.8-1s from the overall startup.
This change will certainly lead to invalidation of existing caches. It is advised to simply start with a clean slate.

Checklist

Please include the following checklist in your PR:

  • The documentation has been updated if necessary.
  • All code conforms to the
    Scala,
    Java,
    and
    Rust
    style guides.
  • All code has been tested:
    • Unit tests have been written where possible.
    • If GUI codebase was changed: Enso GUI was tested when built using BOTH
      ./run ide build and ./run ide watch.

Copy link
Member

@JaroslavTulach JaroslavTulach left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe there should be a single "digest" per each library written to a file in library root. When it exists, all the caches associated with the library shall include that "digest" or otherwise be considered invalid. Everyone who modifies the library shall either remove or recompute the "digest" in the library file.

@@ -2,7 +2,7 @@

import com.oracle.truffle.api.TruffleFile;
import com.oracle.truffle.api.TruffleLogger;
import org.bouncycastle.jcajce.provider.digest.SHA3;
import org.bouncycastle.jcajce.provider.digest.SHA1;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we use bouncycastle and not the standard JVM implementation of the digest?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like the only other place where we use SHA3 is for Buffers. Those could also be downgraded, I believe.

This change downgrades hashing algorithm used in caching IR and library
bindings to SHA-1. It is sufficient and significantly faster for the
purpose of simple checksum we use it for.

Additionally, don't calculate the digest for serialized bytes - if we
get the expected object type then we are confident about the integrity.

Don't initiallize Jackson's ObjectMapper for every metadata
serialization/de-serialization. Initialization is very costly.

Avoid unnecessary conversions between Scala and Java. Those
back-and-forth `asScala` and `asJava` are pretty expensive.

Finally fix a warning when generating library cache.
@hubertp hubertp force-pushed the wip/hubert/5763-hashing-downgrade branch from 492bdf0 to 435b11c Compare March 8, 2023 14:43
@hubertp hubertp added the CI: Ready to merge This PR is eligible for automatic merge label Mar 8, 2023
@mergify mergify bot merged commit 6769ab0 into develop Mar 9, 2023
@mergify mergify bot deleted the wip/hubert/5763-hashing-downgrade branch March 9, 2023 07:37
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
CI: Ready to merge This PR is eligible for automatic merge
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Downgrade hashing from SHA-3 to SHA-1 and eliminate unnecessary unnecessary digest
3 participants