Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Isaacs/explicit caching #75

Closed
wants to merge 3 commits into from
Closed

Isaacs/explicit caching #75

wants to merge 3 commits into from

Conversation

isaacs
Copy link
Contributor

@isaacs isaacs commented Apr 13, 2021

Fixed up and tested version of #74.

Updated to use tap 15, because I got tired of having it try to run the tests in git checkouts when I ^C out of a previous test run. (Tap 15 is smart enough to avoid folders created by t.testdir().)

@mjsir911
Copy link
Contributor

mjsir911 commented Apr 13, 2021

I'm having trouble getting the desired behavior in my bootleg development environment

root@886a9c355999:~# npm cache add --offline yn-3.1.1.tgz --cache=./npm-cache
root@886a9c355999:~# find npm-cache/
npm-cache/
npm-cache/_cacache
npm-cache/_cacache/tmp
npm-cache/_cacache/tmp/5f8bd547

but the gist of the code LGTM, digging into a bit as to why it's not working

lib/fetcher.js Outdated
)
// best-effort. cache write errors should not crash the fetch.
cstream.on('error', (er) => {})
stream.pipe(cstream)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the last piped stream needs to be returned here to have the caching actually run.

This fixes the problems I was seeing

Suggested change
stream.pipe(cstream)
stream.pipe(cstream)
return cstream

Alternatively:

Suggested change
stream.pipe(cstream)
return stream.pipe(cstream)

Although you might want to just keep reassigning stream to the latest stream eg stream = stream.pipe(istream) & stream = stream.pipe(cstream), return stream

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Returning cstream makes all the pacote tests fail, because the cacache.put.stream writable doesn't pass through its data. Returning cstream is definitely not the right way to go.

If we pipe istream into cstream then all the data gets consumed before the caller can get access to it. So as a result, you always get an empty tarball stream.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here's what I'm getting with this patch, using pacote directly via its bin:

$ node lib/bin.js tarball file:pacote-11.3.1.tgz copy.tgz --cache=$PWD/pacote-cache
{
  integrity: 'sha512-B5R+b6BX2U6exQ46KdYlHZNBsZL0RDe9TYwC6xbOgoU9JQ957m8zUxDAWwUuuCKb1JFPW+8KB+YCZRj4oBzzJQ==',
  resolved: '/Users/isaacs/dev/npm/pacote/pacote-11.3.1.tgz',
  from: 'file:pacote-11.3.1.tgz'
}

$ shasum -a 512 pacote-11.3.1.tgz
07947e6fa057d94e9ec50e3a29d6251d9341b192f44437bd4d8c02eb16ce82853d250f79ee6f335310c05b052eb8229bd4914f5bef0a07e6026518f8a01cf325  pacote-11.3.1.tgz

$ shasum -a 512 copy.tgz
07947e6fa057d94e9ec50e3a29d6251d9341b192f44437bd4d8c02eb16ce82853d250f79ee6f335310c05b052eb8229bd4914f5bef0a07e6026518f8a01cf325  copy.tgz

$ shasum -a 512 pacote-cache/content-v2/sha512/07/94/7e6fa057d94e9ec50e3a29d6251d9341b192f44437bd4d8c02eb16ce82853d250f79ee6f335310c05b052eb8229bd4914f5bef0a07e6026518f8a01cf325
07947e6fa057d94e9ec50e3a29d6251d9341b192f44437bd4d8c02eb16ce82853d250f79ee6f335310c05b052eb8229bd4914f5bef0a07e6026518f8a01cf325  pacote-cache/content-v2/sha512/07/94/7e6fa057d94e9ec50e3a29d6251d9341b192f44437bd4d8c02eb16ce82853d250f79ee6f335310c05b052eb8229bd4914f5bef0a07e6026518f8a01cf325

$ tree pacote-cache
pacote-cache
├── content-v2
│  └── sha512
│     └── 07
│        └── 94
│           └── 7e6fa057d94e9ec50e3a29d6251d9341b192f44437bd4d8c02eb16ce82853d250f79ee6f335310c05b052eb8229bd4914f5bef0a07e6026518f8a01cf325
├── index-v5
│  └── ee
│     └── 0d
│        └── 4c4d45587af9c1af774f9c59d3fc9b91b91764c81d2f3adb080948187863
└── tmp```

Copy link
Contributor Author

@isaacs isaacs Apr 14, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Within npm/cli:

$ npm i npm/pacote#pull/75

changed 1 package, and audited 907 packages in 3s

128 packages are looking for funding
  run `npm fund` for details

found 0 vulnerabilities

$ node . cache add file:npm-7.9.0.tgz --cache=test-cache

$ tree test-cache/
test-cache
└── _cacache
   ├── content-v2
   │  └── sha512
   │     └── c7
   │        └── ed
   │           └── 81d48a1de77d860a67af21f50f9dd65f1038ec8291b3b95a6eb665ad3c175d52a989fa10da3eddf2ed694c75dd53f4b6decdf2d43a9508982d846435246c
   ├── index-v5
   │  └── af
   │     └── 03
   │        └── 5c781820370e585dc2323edbbc80669bf714da5b47d56510c7d0bd7521ee
   └── tmp
      └── b1d942f0

$ shasum -a 512 test-cache/_cacache/content-v2/sha512/c7/ed/81d48a1de77d860a67af21f50f9dd65f1038ec8291b3b95a6eb665ad3c175d52a989fa10da3eddf2ed694c75dd53f4b6decdf2d43a9508982d846435246c
c7ed81d48a1de77d860a67af21f50f9dd65f1038ec8291b3b95a6eb665ad3c175d52a989fa10da3eddf2ed694c75dd53f4b6decdf2d43a9508982d846435246c  test-cache/_cacache/content-v2/sha512/c7/ed/81d48a1de77d860a67af21f50f9dd65f1038ec8291b3b95a6eb665ad3c175d52a989fa10da3eddf2ed694c75dd53f4b6decdf2d43a9508982d846435246c

$ shasum -a 512 npm-7.9.0.tgz
801b08dda0a4334e6643b666285d16f0e317c4d89bc0a5da1a0dd28306aa0b702f307b10fb270e230e364307bdb645ff16e88ac7611ef9702bf5b1bc926ef7b8  npm-7.9.0.tgz

Seems like it works?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah! Spoke too soon. The usage in npm/cli is breaking somehow.

lib/fetcher.js Outdated
)
// best-effort. cache write errors should not crash the fetch.
cstream.on('error', (er) => {})
stream.pipe(cstream)
Copy link
Contributor

@mjsir911 mjsir911 Apr 13, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, not sure how streams work, but is istream inserted in between stream & cstream during the stream.pipe(cstream). Maybe

Suggested change
stream.pipe(cstream)
istream.pipe(cstream)

would be clearer if it does the same thing?

I'm working all of these off of my initial stream.pipe(istream).pipe(cstream)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we pipe the istream to the cstream, then the caller never gets the bytes, because they go into the cache writer and black-hole there.

We need the stream itself to pipe to both so that the bytes go to both locations, but the user can read from the istream.

@isaacs
Copy link
Contributor Author

isaacs commented Apr 14, 2021

Yeah, so, here's what's happening:

  • stream pipes to istream (the thing we want to return)
  • stream also pipes to cstream (a non-passthrough writable stream with a deferred flush() method)
  • we return istream
  • all the data passes through, and end() is called on both istream and cstream
  • cstream kicks off its asynchronous flush
  • istream ends immediately
  • The caller (npm/cli) goes "cool, it's done now" and exits the process before cstream has written its data to disk

So we need to wait for the cstream end event to be called before ending the istream.

Copy link

@darcyclarke darcyclarke left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Blocking so we don't accidentally land; Based on @isaacs's feedback that changes are going to be made before reviewing/landing

@isaacs isaacs force-pushed the isaacs/explicit_caching branch from ec60d33 to ca170cf Compare April 15, 2021 18:06
@isaacs
Copy link
Contributor Author

isaacs commented Apr 15, 2021

Ok, this is working in the CLI now that it's not resolving the fetch stream promise until it's fully written to cache. PTAL

@isaacs isaacs requested a review from darcyclarke April 15, 2021 18:13
Tell `fetch` not to cache it's downloaded tarballs, leave that to us

Also see npm/cli#2976 for my first attempt & some discussions as to how
we got here.

Fix: #73
Fix: npm/cli#2160
Close: npm/cli#2976

PR-URL: #74
Credit: @mjsir911
Close: #74
Reviewed-by: @isaacs

EDIT(@isaacs): Updated to add test, allow make-fetch-happen to use the
cache, pipe into cache properly, and avoid cache double-writes.
@isaacs isaacs force-pushed the isaacs/explicit_caching branch from ca170cf to 1e737c5 Compare April 15, 2021 18:15
@mjsir911
Copy link
Contributor

LGTM @isaacs and I can confirm the caching works with npm now.

Thanks for all your work & thanks for picking this up!

Copy link
Member

@wraithgar wraithgar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for all the help here @mjsir911

@isaacs isaacs closed this in 500a34f Apr 22, 2021
@lukekarrys lukekarrys deleted the isaacs/explicit_caching branch September 23, 2022 00:08
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants