Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make archives deterministic #292

Merged
merged 3 commits into from
Jul 30, 2024
Merged

Make archives deterministic #292

merged 3 commits into from
Jul 30, 2024

Conversation

lencioni
Copy link
Contributor

I noticed that the Jest test that verifies the stability of these hashes
was flaking and determined that the output was, in fact, not
determinstic.

I beleve there are two possibile sources of nondeterminism in these
zips:

  1. The order of files being globbed may be nondeterministic
  2. Concurrency in archiver's stat queue causes nondeterminism

In the end, I believe most of the problem is likely caused by 2, but I
figured that we might as well address the theoretical glob order at the
same time.

More info:

archiverjs/node-archiver#383 (comment)

We've started to use corepack in some places, and since this repo is
still using yarn v1, I want to specify the packageManager field so we
can continue without causing disruption.

While I'm at it I am also adding some entries to the .gitignore and
.npmignore files, similar to this:

happo/happo-plugin-storybook#124
I am investigating a Jest test that verifies createStaticPackage creates
a deterministic hash when the file contents do not change, and it seems
to be flaky. I suspect that it might be using a timestamp in the hash
creation which would be a real issue we want to fix, but in the meantime
I figured we might as well update this package to the latest version.

They don't keep a useful changelog, but from what I can tell the
breaking changes only drop support for old node versions that we do not
support.

v6: archiverjs/node-archiver@5.3.2...6.0.0
v7: archiverjs/node-archiver@6.0.2...7.0.0
* @param {string} dir
* @returns {Array<string>}
*/
function getFilesRecursive(dir) {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Though I called this "recursive", I decided to use iteration instead of recursion since I think that will be a little more efficient.

if (dirent.isDirectory()) {
// This is a directory, so we are going to add it to the list of directories to recurse into.
dirs.push(res);
files.push(res);
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I originally didn't include this line, but we have an integration test that asserts on the full contents of the archive, and the directories were previously present as distinct entries. Adding this line fixed that test. I'm not sure how important this is though, but I wanted to preserve existing functionality to be safe.

const archive = new Archiver('zip', {
// Concurrency in the stat queue leads to non-deterministic output.
// https://github.com/archiverjs/node-archiver/issues/383#issuecomment-2253139948
statConcurrency: 1,
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As far as I can tell, this is the most important part of the change. The rest of the code changes here address possible ordering inconsistencies in the globbing, which at this point are only theoretical.

for (const file of tmpdirFiles) {
archive.file(file, {
name: file.slice(tmpdir.length + 1),
prefix: '',
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The false argument is converted to this under the hood, so I believe this is the equivalent. https://github.com/archiverjs/node-archiver/blob/0830dea0b3798d14d33b454005628958f4611586/lib/core.js#L618-L619

I noticed that the Jest test that verifies the stability of these hashes
was flaking and determined that the output was, in fact, not
determinstic.

I beleve there are two possibile sources of nondeterminism in these
zips:

1. The order of files being globbed may be nondeterministic
2. Concurrency in archiver's stat queue causes nondeterminism

In the end, I believe most of the problem is likely caused by 2, but I
figured that we might as well address the theoretical glob order at the
same time.

More info:

archiverjs/node-archiver#383 (comment)
@lencioni lencioni changed the title Make static package archives deterministic Make archives deterministic Jul 26, 2024
@@ -13,7 +13,11 @@ const FILE_CREATION_DATE = new Date('Fri Feb 08 2019 13:31:55 GMT+0100 (CET)');

function makePackage({ paths, publicFolders }) {
return new Promise((resolve, reject) => {
const archive = new Archiver('zip');
const archive = new Archiver('zip', {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I found a couple more places we use Archiver, and added statConcurrency 1 to them, which should help more things be deterministic. We should probably add tests for this if we don't have any.

Also, I did not bother to do the glob stuff here. We could leave it as is, or remove the glob stuff I did elsewhere, or do the same glob stuff everywhere. @trotzig what is your preference?

@lencioni lencioni merged commit e41c6c0 into master Jul 30, 2024
4 checks passed
@lencioni lencioni deleted the archiver branch July 30, 2024 12:11
lencioni added a commit to happo/happo-e2e that referenced this pull request Aug 15, 2024
This is similar to:

happo/happo.io#292

I believe that we want to set statConcurrency to 1 here so that the
files in the archives end up in the same order every time (assuming the
glob itself is deterministically ordered).

I am updating happo.io at the same time since it brings the same new
version of archiver as well.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants