Unreachable Providers (client): Decreased Re-Provide Delay/Record TTL #9984

Open
6 tasks done
dennis-tra opened this issue Jun 21, 2023 · 1 comment
Labels
effort/weeks Estimated to take multiple weeks exp/expert Having worked on the specific codebase is important kind/feature A new feature P2 Medium: Good to have, but can wait until someone steps up topic/config Topic config topic/design-ux UX strategy, research, not solely visual design topic/provider Topic provider

Comments

@dennis-tra
Contributor

dennis-tra commented Jun 21, 2023

Checklist

  • My issue is specific & actionable.
  • I am not suggesting a protocol enhancement.
  • I have searched on the issue tracker for my issue.

Description

This is the corresponding issue for a client-side change to mitigate the impact of large numbers of unreachable providers.

Context

The context is explained in #9982.

Proposal

We identified two ways forward to address the impact of unreachable providers.

  1. A prioritization logic of provider records on the server side. The peers that serve provider records sort them in such a way that, e.g., the first one in the list likely contains a peer that is actually reachable.
  2. A delayed provider record publication. E.g. only announce blocks if a peer was online for some time. The assumption is that this will filter out rather short-lived peers.

This GH issue is for proposal 2).

The discussion around 1) happens in #9982

1. Delay Provider Record Publication

The idea here is that we change the default configuration to only reprovide blocks after a Kubo node has had a consecutive uptime of X minutes/hours/days. The assumption is that nodes that have been online for a long stretch are stable and will likely stay online.

There are some nuances to consider here (copied from probe-lab/network-measurements#49 (comment)):

Today, block reproviding is a global flag in Kubo (IPFS Desktop, Brave): we do not distinguish between blocks fetched while browsing websites (temporarily stored in the cache), and blocks imported by user by adding their own data to local node (either pinned, in MFS, or just in cache). Both types of data are stored and reprovided by the same code paths, and we can't rely on pinning and MFS to identify user data, because ipfs block put and ipfs dag put do not pin by default.

That is to say, disabling reproviding only for third-party content is not trivial: to only stop reproviding third-party website data, we would have to introduce separate datastores with different reproviding settings for first-party and third-party blocks in Kubo.
Content explicitly imported by the user (ipfs add, ipfs dag put, ipfs block put, ipfs dag import), or pinned by user, would be added/moved to first-party datastore.

A different, a bit simpler approach would be to keep a single datastore, but instead introduce a new default "auto" Reprovider.Strategy that:

  1. always announces pinned content (+implicitly pinned MFS) → ensures user content is always reachable asap
  2. announces the remaining blocks in cache (incl. ones that come from browsed websites) ONLY if a node was online for some time (we would add optionalDuration Reprovider.UnpinnedDelay to allow users to adjust the implicit default)
  3. TBD how we solve ipfs dag put and ipfs block put or other user content that is not pinned, but expected to "work instantly":
    1. we could flip --pin in them to true → breaking change (may surprise users who expect these to not keep garbage around, may lead to services running out of disk space)
    2. we could say that the ability for users to set Reprovider.Strategy to all and/or adjust Reprovider.UnpinnedDelay is enough here. ipfs routing provide exists, and we could add --all to let apps/users manually trigger a provide before Reprovider.UnpinnedDelay hits. (Feels safer than option 1: no DoS, worst case a delay in announce on a cold boot.)
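
To make the proposed "auto" strategy concrete, here is a minimal Go sketch of the decision it describes. It is not Kubo code: the Block type and the shouldAnnounce and selectForReprovide functions are hypothetical names invented for illustration, and the delay value is a placeholder rather than a proposed default. Only Reprovider.UnpinnedDelay (modelled here as unpinnedDelay) comes from the discussion above.

```go
package main

import (
	"fmt"
	"time"
)

// Block is a hypothetical view of a locally stored block.
// Kubo does not expose blocks this way; this is for illustration only.
type Block struct {
	CID    string
	Pinned bool // pinned explicitly, or implicitly via MFS
}

// shouldAnnounce sketches the "auto" strategy: pinned content is always
// announced, everything else only once the node's consecutive uptime has
// reached unpinnedDelay (the proposed Reprovider.UnpinnedDelay).
func shouldAnnounce(b Block, uptime, unpinnedDelay time.Duration) bool {
	if b.Pinned {
		return true
	}
	return uptime >= unpinnedDelay
}

// selectForReprovide returns the CIDs that one reprovide run would announce.
func selectForReprovide(blocks []Block, uptime, unpinnedDelay time.Duration) []string {
	var out []string
	for _, b := range blocks {
		if shouldAnnounce(b, uptime, unpinnedDelay) {
			out = append(out, b.CID)
		}
	}
	return out
}

func main() {
	blocks := []Block{
		{CID: "cid-pinned", Pinned: true},
		{CID: "cid-cached", Pinned: false},
	}
	delay := 2 * time.Hour // placeholder, not a proposed default

	// Freshly started node: only pinned content is announced.
	fmt.Println(selectForReprovide(blocks, 10*time.Minute, delay))
	// Long-running node: cached blocks are announced as well.
	fmt.Println(selectForReprovide(blocks, 3*time.Hour, delay))
}
```

Unpinned user content from ipfs dag put or ipfs block put would fall into the delayed branch in this sketch, which is exactly the open question in item 3.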

A personal remark: It would be great if the user content that is expected to "work instantly" could make use of the fast provide operation. I think these commands are not blocking right now, correct? Using optimistic provide could justify making them blocking. But again, the provide strategy is a global switch. It would be great if the application layer could have more control over the publication process based on its specific needs.

2. Decreased Provider Record TTL

The idea here is to keep everything as is and just transmit the desired provider record TTL. The TTL would be calculated based on the node's uptime and would only become a high number if the node has been up for X minutes/hours/days.

At first glance this is a breaking protocol change, but protobuf allows adding new fields without breaking old implementations (see comment from @guillaumemichel probe-lab/network-measurements#49 (comment)). This means we could add the new field, and nodes that understand it could adhere to the TTL the provider wants to set. Everyone else would just continue as before.
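
The uptime-based TTL could be computed on the provider side along these lines. This is only a sketch under assumptions: the TTLPolicy type, its fields, and the simple two-step mapping are hypothetical, and the issue does not specify how the TTL should grow with uptime or which values to use.

```go
package main

import (
	"fmt"
	"time"
)

// TTLPolicy holds hypothetical knobs; none of these exist in Kubo today.
type TTLPolicy struct {
	StableAfter time.Duration // uptime after which a node counts as stable ("X" in the text)
	ShortTTL    time.Duration // TTL requested while the node is still young
	FullTTL     time.Duration // TTL requested once the node is considered stable
}

// DesiredTTL returns the TTL a provider would ask DHT servers to apply.
// A step function is only one possible mapping; a gradual ramp would also fit.
func (p TTLPolicy) DesiredTTL(uptime time.Duration) time.Duration {
	if uptime >= p.StableAfter {
		return p.FullTTL
	}
	return p.ShortTTL
}

func main() {
	// All values below are placeholders chosen for illustration.
	p := TTLPolicy{
		StableAfter: 6 * time.Hour,
		ShortTTL:    1 * time.Hour,
		FullTTL:     48 * time.Hour,
	}
	fmt.Println(p.DesiredTTL(20 * time.Minute)) // young node asks for the short TTL
	fmt.Println(p.DesiredTTL(12 * time.Hour))   // stable node asks for the full TTL
}
```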

Some things to consider:

  • @aschmahmann remarked that we should be careful not to open a DoS vector
  • The TTL that the client wants to set should have an upper bound that the server enforces. This value should be set to the current TTL.
  • This strategy increases load on DHT servers because reprovides will happen more frequently. On the other hand, the number of provider records a server holds could decrease because they are garbage collected more frequently.
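
On the server side, the upper bound from the second point could look roughly like the sketch below. The effectiveTTL function is a hypothetical name, a zero request stands in for an older client that does not send the new field, and the value used for the current TTL in main is a placeholder, not necessarily what DHT servers use today.

```go
package main

import (
	"fmt"
	"time"
)

// effectiveTTL returns the TTL a server would apply to a provider record.
// requested <= 0 models a client that did not send the new field, so the
// server falls back to its current TTL, which also acts as the upper bound.
func effectiveTTL(requested, currentTTL time.Duration) time.Duration {
	if requested <= 0 || requested > currentTTL {
		return currentTTL
	}
	return requested
}

func main() {
	currentTTL := 48 * time.Hour // placeholder for the server's existing TTL

	fmt.Println(effectiveTTL(0, currentTTL))            // old client: unchanged behavior
	fmt.Println(effectiveTTL(2*time.Hour, currentTTL))  // shorter TTL is honored
	fmt.Println(effectiveTTL(96*time.Hour, currentTTL)) // longer TTL is capped
}
```

Capping at the current TTL means a client can only shorten a record's lifetime, never extend it. A lower bound is not part of the proposal, but it may be worth considering given the note about increased reprovide load.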

Measurements

TBD: How can we substantiate the proposal with numbers? some ideas

References

@dennis-tra dennis-tra added the kind/feature A new feature label Jun 21, 2023
@dennis-tra dennis-tra changed the title Unreachable Providers (client/provider): Re-Provide Delay/Decreased Record TTL Unreachable Providers (client): Re-Provide Delay/Decreased Record TTL Jun 21, 2023
@dennis-tra dennis-tra changed the title Unreachable Providers (client): Re-Provide Delay/Decreased Record TTL Unreachable Providers (client): Decreased Re-Provide Delay/Record TTL Jun 21, 2023
@lidel lidel added exp/expert Having worked on the specific codebase is important P2 Medium: Good to have, but can wait until someone steps up topic/provider Topic provider topic/config Topic config effort/days Estimated to take multiple days, but less than a week topic/design-ux UX strategy, research, not solely visual design labels Jul 24, 2023
@lidel
Member

lidel commented Jul 24, 2023

Triage notes:

  • (1) Delay Provider Record Publication
    • Getting UX to the acceptable point will take a lot of work
  • (2) Decreased Provider Record TTL
  • Meta comment: the changes proposed here won't improve things at the release time, it will take 6+ months to see improvement
    • we did not do server-side changes so far, so a lot of unknown unknowns
    • if we improve things on the client side, it should preferably be at the time of looking for providers, not publishing them; that will improve things at release time: lower risk, faster feedback loop, easier to run simulations or back away
      • Kubo maintainers would prefer tackling this here (could be tackled in parallel, but realistically if we have to choose, this would be our bet)
      • So proposals that make provider lookup smarter should take precedence over the ones described here (feel free to file a new issue and cc this one for discoverability)

@lidel lidel added effort/weeks Estimated to take multiple weeks and removed effort/days Estimated to take multiple days, but less than a week labels Jul 24, 2023

2 participants