-
Notifications
You must be signed in to change notification settings - Fork 26
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Make archives deterministic #292
Conversation
We've started to use corepack in some places, and since this repo is still using yarn v1, I want to specify the packageManager field so we can continue without causing disruption. While I'm at it I am also adding some entries to the .gitignore and .npmignore files, similar to this: happo/happo-plugin-storybook#124
I am investigating a Jest test that verifies createStaticPackage creates a deterministic hash when the file contents do not change, and it seems to be flaky. I suspect that it might be using a timestamp in the hash creation which would be a real issue we want to fix, but in the meantime I figured we might as well update this package to the latest version. They don't keep a useful changelog, but from what I can tell the breaking changes only drop support for old node versions that we do not support. v6: archiverjs/node-archiver@5.3.2...6.0.0 v7: archiverjs/node-archiver@6.0.2...7.0.0
* @param {string} dir | ||
* @returns {Array<string>} | ||
*/ | ||
function getFilesRecursive(dir) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Though I called this "recursive", I decided to use iteration instead of recursion since I think that will be a little more efficient.
if (dirent.isDirectory()) { | ||
// This is a directory, so we are going to add it to the list of directories to recurse into. | ||
dirs.push(res); | ||
files.push(res); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I originally didn't include this line, but we have an integration test that asserts on the full contents of the archive, and the directories were previously present as distinct entries. Adding this line fixed that test. I'm not sure how important this is though, but I wanted to preserve existing functionality to be safe.
const archive = new Archiver('zip', { | ||
// Concurrency in the stat queue leads to non-deterministic output. | ||
// https://github.com/archiverjs/node-archiver/issues/383#issuecomment-2253139948 | ||
statConcurrency: 1, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As far as I can tell, this is the most important part of the change. The rest of the code changes here address possible ordering inconsistencies in the globbing, which at this point are only theoretical.
for (const file of tmpdirFiles) { | ||
archive.file(file, { | ||
name: file.slice(tmpdir.length + 1), | ||
prefix: '', |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The false
argument is converted to this under the hood, so I believe this is the equivalent. https://github.com/archiverjs/node-archiver/blob/0830dea0b3798d14d33b454005628958f4611586/lib/core.js#L618-L619
I noticed that the Jest test that verifies the stability of these hashes was flaking and determined that the output was, in fact, not determinstic. I beleve there are two possibile sources of nondeterminism in these zips: 1. The order of files being globbed may be nondeterministic 2. Concurrency in archiver's stat queue causes nondeterminism In the end, I believe most of the problem is likely caused by 2, but I figured that we might as well address the theoretical glob order at the same time. More info: archiverjs/node-archiver#383 (comment)
@@ -13,7 +13,11 @@ const FILE_CREATION_DATE = new Date('Fri Feb 08 2019 13:31:55 GMT+0100 (CET)'); | |||
|
|||
function makePackage({ paths, publicFolders }) { | |||
return new Promise((resolve, reject) => { | |||
const archive = new Archiver('zip'); | |||
const archive = new Archiver('zip', { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I found a couple more places we use Archiver, and added statConcurrency 1 to them, which should help more things be deterministic. We should probably add tests for this if we don't have any.
Also, I did not bother to do the glob stuff here. We could leave it as is, or remove the glob stuff I did elsewhere, or do the same glob stuff everywhere. @trotzig what is your preference?
This is similar to: happo/happo.io#292 I believe that we want to set statConcurrency to 1 here so that the files in the archives end up in the same order every time (assuming the glob itself is deterministically ordered). I am updating happo.io at the same time since it brings the same new version of archiver as well.
I noticed that the Jest test that verifies the stability of these hashes
was flaking and determined that the output was, in fact, not
determinstic.
I beleve there are two possibile sources of nondeterminism in these
zips:
In the end, I believe most of the problem is likely caused by 2, but I
figured that we might as well address the theoretical glob order at the
same time.
More info:
archiverjs/node-archiver#383 (comment)