This repository has been archived by the owner on Feb 12, 2024. It is now read-only.

ipfs.cat calls https://node{0..3}.preload.ipfs.io/api/v0/refs?r=true&arg=<hash> on every cat #3307

Closed
georgyo opened this issue Sep 29, 2020 · 2 comments · Fixed by #3363
Labels
kind/bug A bug in existing code (including security flaws) P0 Critical: Tackled by core team ASAP status/in-progress In progress

Comments

@georgyo
Contributor

georgyo commented Sep 29, 2020

  • Version:
{version: "0.50.1", repo: 9, commit: "", interface-ipfs-core: "^0.140.0", ipfs-http-client: "^47.0.0"}
  • Platform:
    Verified in Firefox 81 and Chrome 85 on Windows, Linux, and macOS

  • Subsystem:
    Unknown

Severity:

Medium

Description:

If there are many repeated calls of ipfs.cat on subpaths of an IPFS hash, then each cat will call refs on the root of the hash. The calls are identical, and the results don't seem to end up in the browser's IndexedDB, or at least they aren't looked up there again.

On large trees, you could end up downloading many gigabytes of exactly the same refs data.

Steps to reproduce the error:

I discovered this while sending over improvements to moshisushi/hlsjs-ipfs-loader

If you look at the network requests made on https://charade.fu.io, you will see hundreds of requests to the following four URLs:

https://node0.preload.ipfs.io/api/v0/refs?r=true&arg=QmbdmJ2JRvEFhWWzHKrAcjjBdkcs46F2N7ggZnrdKKAu4s
https://node1.preload.ipfs.io/api/v0/refs?r=true&arg=QmbdmJ2JRvEFhWWzHKrAcjjBdkcs46F2N7ggZnrdKKAu4s
https://node2.preload.ipfs.io/api/v0/refs?r=true&arg=QmbdmJ2JRvEFhWWzHKrAcjjBdkcs46F2N7ggZnrdKKAu4s
https://node3.preload.ipfs.io/api/v0/refs?r=true&arg=QmbdmJ2JRvEFhWWzHKrAcjjBdkcs46F2N7ggZnrdKKAu4s

There are 1360 parts at each of 5 different bit rates, and each part's cat also triggers js-ipfs to call the refs API for the entire tree, not just for that part. The refs response here is 2.5MB. As a result, the browser downloads more than 3.5GB of the same 2.5MB from the preload servers.
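As a rough check of these figures, assuming exactly one 2.5MB /refs response per segment and decimal units (1 GB = 1000 MB): streaming every segment at a single bit rate already transfers about 3.4GB, roughly the reported total, and fetching all five bit rates would raise that to about 17GB. The names below are illustrative only.

```typescript
// Figures from the report above; decimal units (1 GB = 1000 MB) are an assumption.
const segmentsPerBitRate = 1360;
const bitRates = 5;
const refsResponseMB = 2.5;

// Assumes each segment `cat` triggers exactly one /refs request for the whole tree.
function redundantTransferGB(segments: number, sizeMB: number): number {
  return (segments * sizeMB) / 1000;
}

// Watching the whole film at a single bit rate.
const singleBitRateGB = redundantTransferGB(segmentsPerBitRate, refsResponseMB);
// Upper bound if every segment at every bit rate were fetched.
const allBitRatesGB = redundantTransferGB(segmentsPerBitRate * bitRates, refsResponseMB);

console.log(singleBitRateGB, allBitRatesGB);
```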

The movie Charade (1963) is in the public domain; I'm using it as a much longer Big Buck Bunny to amplify the issue. There is no copyright infringement here.

@georgyo georgyo added the need/triage Needs initial labeling and prioritization label Sep 29, 2020

@jacobheun jacobheun added P0 Critical: Tackled by core team ASAP kind/bug A bug in existing code (including security flaws) labels Oct 28, 2020
achingbrain added a commit that referenced this issue Oct 30, 2020
We 'preload' most CIDs we interact with on the network. In some cases
this can mean preloading the same CID over and over again which is not
necessary.

This PR adds an LRU cache to the preloader with a default size of 1000.
The cache is used to avoid re-preloading the same CID over and over again
until it drops out of the cache. We use a cache that will evict CIDs
over time to have some sort of upper bound on memory usage.

Fixes #3307
@achingbrain
Member

What's happening here is that hlsjs-ipfs-loader is calling ipfs.cat repeatedly for the same CID, which is getting preloaded over and over again.

Preloading involves hitting the /refs endpoint of a remote go-IPFS node that your local IPFS node has a connection to, whose purpose is to cache content to make it more available.

The /refs endpoint recursively returns a list of the DAGLinks in the DAGNode that corresponds to the passed CID. Invoking it causes the remote node to use bitswap to slurp the content up from you, which it shouldn't need to do if it already has it, so the theory is that it's a cheap operation.

In practice, the DAG behind QmbdmJ2JRvEFhWWzHKrAcjjBdkcs46F2N7ggZnrdKKAu4s is large enough for that list of DAGLinks to be very long, hence the 2.5MB transfer size.
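To make the mechanism concrete, here is a minimal TypeScript sketch of a single preload request. It assumes the four node{0..3}.preload.ipfs.io hosts from the report and a plain `fetch` POST, which go-ipfs's HTTP RPC API expects. `PRELOAD_NODES`, `refsUrl` and `preloadVia` are hypothetical names; this is not js-ipfs's actual preload code.

```typescript
// Hypothetical list mirroring the node{0..3} preload hosts seen in the report.
const PRELOAD_NODES: string[] = [0, 1, 2, 3].map((i) => `https://node${i}.preload.ipfs.io`);

// Build the recursive /api/v0/refs URL that a preload request targets.
function refsUrl(node: string, cid: string): string {
  const url = new URL("/api/v0/refs", node);
  url.searchParams.set("r", "true");
  url.searchParams.set("arg", cid);
  return url.toString();
}

// Ask one preload node to pull the DAG for `cid` (hypothetical helper).
async function preloadVia(node: string, cid: string): Promise<void> {
  // Assumption: the remote go-IPFS HTTP RPC API is called with POST.
  const res = await fetch(refsUrl(node, cid), { method: "POST" });
  if (!res.ok) throw new Error(`preload failed: ${res.status}`);
  // The response is discarded, but the full (possibly multi-MB) refs list
  // is still downloaded by the browser, which is the cost seen in this issue.
  await res.text();
}
```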

The long-term fix for this is to make js-IPFS respond to DHT queries and be dialable from the outside, even in the browser; then preloading would not be necessary. Sadly, this is a non-trivial undertaking.

I've opened #3363 as a short-term fix that caches preloaded CIDs, so this should only make one request to the remote /refs endpoint.
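The short-term fix can be pictured as a small least-recently-used set in front of the preload call. The sketch below assumes string CID keys and hand-rolls the LRU on top of `Map` insertion order; `LruSet` and `cachedPreloader` are hypothetical names, and the real change in #3363 may differ in detail (its commit message only states an LRU cache with a default size of 1000).

```typescript
// Hypothetical minimal LRU set built on Map insertion order
// (not the cache library js-ipfs actually uses).
class LruSet {
  private readonly items = new Map<string, true>();
  private readonly max: number;

  constructor(max: number) {
    this.max = max;
  }

  has(key: string): boolean {
    if (!this.items.has(key)) return false;
    // Refresh recency by moving the key to the newest position.
    this.items.delete(key);
    this.items.set(key, true);
    return true;
  }

  add(key: string): void {
    this.items.delete(key);
    this.items.set(key, true);
    if (this.items.size > this.max) {
      // The first key in iteration order is the least recently used one.
      const oldest = this.items.keys().next().value as string;
      this.items.delete(oldest);
    }
  }
}

type PreloadFn = (cid: string) => Promise<void>;

// Hypothetical wrapper: each CID is preloaded at most once while it stays cached.
function cachedPreloader(preload: PreloadFn, max = 1000): PreloadFn {
  const seen = new LruSet(max);
  return async (cid: string) => {
    if (seen.has(cid)) return; // preloaded recently: skip the /refs request
    seen.add(cid); // mark before the request so concurrent calls are deduplicated
    await preload(cid);
  };
}

// Usage sketch: a fake preload that only counts requests, with a tiny cache of 2.
let requests = 0;
const preloadOnce = cachedPreloader(async (_cid: string) => {
  requests++;
}, 2);
```

Marking the CID as seen before the request also collapses concurrent cats of the same CID into a single request; the trade-off is that a failed preload is not retried until the CID falls out of the cache.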

@achingbrain achingbrain added status/in-progress In progress and removed need/triage Needs initial labeling and prioritization labels Oct 30, 2020
achingbrain added a commit that referenced this issue Oct 30, 2020
We 'preload' most CIDs we interact with on the network. In some cases
this can mean preloading the same CID over and over again which is not
necessary.

This PR adds an LRU cache to the preloader with a default size of 1000.
The cache is used to avoid re-preloading the same CID over and over again
until it drops out of the cache. We use a cache that will evict CIDs
over time to have some sort of upper bound on memory usage.

Fixes #3307

Co-authored-by: Vasco Santos <vasco.santos@moxy.studio>