feat: partial DAG provides with Reprovider.Strategy=mfs|pinned+mfs #10754

Merged
20 commits merged into master from provide-mfs-best-effort, Apr 9, 2025

Conversation


@hsanjuan hsanjuan commented Mar 13, 2025

This augments the current provider KeyChanFuncs with the CIDs coming from the MFS DAG. MFS CIDs are listed using the offline fetcher, which has been updated with SkipNotFound so that no errors happen during traversal when nodes are missing.

Strategies have been updated:

  • For "roots" and "all" strategy we announce the MFS root only, after the pins.
  • For the "pinned+mfs" strategy, we announce the MFS Cids after the pin cids.
  • A new "mfs" strategy is added, only for MFS cids.

The code around strategy selection has been slightly changed for the better (I hope).

Also:

Fixes #10386. Supersedes #10704.
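
For reference, a node can opt into the new behavior with the commands lidel uses later in this thread, e.g.:

$ ipfs config Reprovider.Strategy pinned+mfs   # or "mfs"
$ ipfs bitswap reprovide                       # or restart the daemon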

case "pinned":
return provider.NewPrioritizedProvider(
provider.NewBufferedProvider(provider.NewPinnedProvider(false, in.Pinner, in.IPLDFetcher)),
mfsProvider(in.MFSRoot, in.OfflineUnixFSFetcher),
@hsanjuan (Contributor, Author):

The downside here is that there is no longer a strategy that provides only "recursive-pinned" content and no MFS.

If we don't provide MFS here, there will be no policy that provides only "recursive-pinned+mfs" content... but that could be added as pinned+mfs or similar...

@lidel (Member):

MFS is implicitly pinned (protected from GC), so I think it is fine to keep it under pinned.
But things get hairy with roots (we don't have the ability to announce only dir and file roots as we walk MFS, right?).

Maybe keeping pinned and roots as-is, and adding a separate pinned+mfs for now, is the right way of handling this? (We don't change behavior for existing users, and new users can choose pinned+mfs.)

@hsanjuan (Contributor, Author):

Is it a problem to announce only the MFS root? (Consider it like one big file.) Or do you mean this could cause misunderstandings?

@hsanjuan (Contributor, Author):

In the meantime I restored "pinned" and "roots" and added "pinned+mfs".
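
For illustration, a rough sketch of what the restored and added cases might look like, reusing the constructor names from the reviewed snippet above (the real switch in kubo, and the exact signatures, may differ):

	case "pinned":
		// unchanged: recursive pins only, no MFS
		return provider.NewBufferedProvider(provider.NewPinnedProvider(false, in.Pinner, in.IPLDFetcher))
	case "pinned+mfs":
		// recursive pins first, then MFS CIDs listed with the offline fetcher
		return provider.NewPrioritizedProvider(
			provider.NewBufferedProvider(provider.NewPinnedProvider(false, in.Pinner, in.IPLDFetcher)),
			mfsProvider(in.MFSRoot, in.OfflineUnixFSFetcher),
		)
	case "mfs":
		// MFS CIDs only
		return mfsProvider(in.MFSRoot, in.OfflineUnixFSFetcher)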

@lidel (Member) commented Mar 17, 2025:

I'm thinking:

  • If you pin something, that pin is usually recursive.
    • pinned announces recursive pins by walking the entire DAG behind each recursively pinned CID.
    • roots announces only the root CIDs of pins (no DAG walk at all).
  • pinned+mfs should do the same as pinned – walk the MFS DAG and announce MFS recursively (only local blocks).
  • roots
    • easy: could also announce only the MFS root (no need to create roots+mfs just for one CID; it is fine to announce the MFS root together with the other pin roots).
    • harder: make the MFS walk smart enough to yield only the root CIDs of files and directories (imo this comes with extra overhead, no longer qualifies as roots, and would require a separate roots+mfs profile).

I think it's fine to do "easy" roots for now – announce only the MFS root (without walking the DAG and discovering sub-entity roots).
We can add roots+mfs if users ask for it.

@hsanjuan (Contributor, Author):

> could also announce only MFS root (no need to create roots+mfs just for one CID, fine to announce MFS root with other pin roots)

This is how it was.

@hsanjuan hsanjuan marked this pull request as ready for review March 17, 2025 10:54
@hsanjuan hsanjuan requested a review from a team as a code owner March 17, 2025 10:54
@hsanjuan hsanjuan requested a review from lidel March 17, 2025 10:54

lidel commented Mar 29, 2025

Looking good. Updated docs + ipfs/boxo#847 and started a local test of pinned+mfs over the weekend. Likely merging next week.


lidel commented Mar 29, 2025

Cosmetic: providing an MFS tree which does not have all blocks locally prints this to the logs:

2025-03-29T02:03:06.113Z	ERROR	provider.batched	provider/dagprovider.go:41	dagprovider dag traversal error: skip

the specific "skip" event should not be logged as error (ok to log it as DEBUG)

@hsanjuan (Contributor, Author):

> Cosmetic: providing an MFS tree which does not have all blocks locally prints this to the logs:
>
> 2025-03-29T02:03:06.113Z	ERROR	provider.batched	provider/dagprovider.go:41	dagprovider dag traversal error: skip
>
> The specific "skip" event should not be logged as an error (it is OK to log it as DEBUG).

Just to check: did you remove the root MFS CID? That is the only way I can reproduce this.

@hsanjuan (Contributor, Author):

And if the root block is not there, maybe it should be logged as an error? It means we have initiated a traversal from a block that is not there, which is probably not intended.


lidel commented Mar 31, 2025

@hsanjuan no, I have the MFS root.

Perhaps this "skip" is due to one of the partial DAGs I have (e.g. ipfs files cp /ipfs/bafybeif54ksdqydq3rliwjqps3wup6nulfbbrzgcvrrd5cgqqvps46eyma /partial-1EiB or ipfs files cp /ipfs/bafybeiaysi4s6lnjev27ln5icwm6tueaw2vdykrtjkwiphwekaywqhcjze /wikipedia-350GiB)?

Let me know if this is not enough; I could block some time to debug and try to reproduce with a smaller repo if that helps.


hsanjuan commented Apr 1, 2025

@lidel the only way I can get the dagprovider to spit out the error is when the root is missing. Walking partial DAGs should not cause it. Otherwise it would indicate that we are aborting partial-DAG walks, which we shouldn't.

I have updated boxo to log the root CID when this error is produced. Can you test again? We can then decide at least whether this root CID belongs to MFS, to a specific DAG, or to something else.


lidel commented Apr 2, 2025

@hsanjuan as far as I can tell, the error is not due to the MFS root missing, but to one of the child branches missing from the local store.

I was able to reproduce the problem with ipfs config Reprovider.Strategy mfs and ipfs config --json Routing.AcceleratedDHTClient true.

To see it, you need to apply a change to boxo to show the actual child CID that was skipped, rather than the root of the entire MFS (ipfs/boxo@e08fc17, added to ipfs/boxo#905).

I made the MFS huge, with lazy-loaded items:

$ ipfs files cp /ipfs/bafybeie6tqsijf7j6gpow5q3qdjmtqgscrmd2ijl4olpxanbbptmvh7sui /_Tsize_test
$ ipfs files cp /ipfs/bafybeie6tqsijf7j6gpow5q3qdjmtqgscrmd2ijl4olpxanbbptmvh7sui /_Tsize_test2
$ ipfs files cp /ipfs/bafybeie6tqsijf7j6gpow5q3qdjmtqgscrmd2ijl4olpxanbbptmvh7sui /_Tsize_test3
$ ipfs files cp /ipfs/bafybeie6tqsijf7j6gpow5q3qdjmtqgscrmd2ijl4olpxanbbptmvh7sui /_Tsize_test4

Then I restarted the node, waited ~5 min for the accelerated client to finish mapping, and manually triggered ipfs bitswap reprovide.

ipfs daemon printed an ERROR:

Daemon is ready
2025-04-02T19:51:08.768+0200	ERROR	provider.batched	provider/dagprovider.go:41	dagprovider dag traversal error from root QmRfuVHxBJWZTwpDpsFYrG545vgv2qCt5Biopbd2hmeKiF: error traversing node at "_Tsize_test/bafybeif54ksdqydq3rliwjqps3wup6nulfbbrzgcvrrd5cgqqvps46eyma/bafybeiewrbcx23v7sz7fgcmavanfxq4ki2rsuoguvrpiyc4sdyxjdhttwm": could not load link "bafybeiewrbcx23v7sz7fgcmavanfxq4ki2rsuoguvrpiyc4sdyxjdhttwm": unable to find bafybeiewrbcx23v7sz7fgcmavanfxq4ki2rsuoguvrpiyc4sdyxjdhttwm in local blockstore: skip
2025-04-02T20:01:08.770+0200	ERROR	provider.batched	provider/dagprovider.go:41	dagprovider dag traversal error from root QmRfuVHxBJWZTwpDpsFYrG545vgv2qCt5Biopbd2hmeKiF: error traversing node at "_Tsize_test/bafybeif54ksdqydq3rliwjqps3wup6nulfbbrzgcvrrd5cgqqvps46eyma/bafybeiewrbcx23v7sz7fgcmavanfxq4ki2rsuoguvrpiyc4sdyxjdhttwm": could not load link "bafybeiewrbcx23v7sz7fgcmavanfxq4ki2rsuoguvrpiyc4sdyxjdhttwm": unable to find bafybeiewrbcx23v7sz7fgcmavanfxq4ki2rsuoguvrpiyc4sdyxjdhttwm in local blockstore: skip


lidel commented Apr 9, 2025

After staring at this for far too long, the error seems to be somewhat bogus – perhaps it is the last internal skip being returned, or a racy one due to an internal lock?

With some extra debug logging, I confirmed that the provide of all local blocks occurs every time, including the MFS root:

$ ipfs files cp /ipfs/bafybeie6tqsijf7j6gpow5q3qdjmtqgscrmd2ijl4olpxanbbptmvh7sui /_Tsize_test
$ ipfs files cp /ipfs/bafybeiaysi4s6lnjev27ln5icwm6tueaw2vdykrtjkwiphwekaywqhcjze/wiki /en.wikipedia-on-ipfs.org-wiki
$ ipfs files stat / | head -1
QmTeDEtragGSErNjsN7czYqX8KWVSoG3WHKEqDuDiYvQ9U
$ ipfs files ls / -l
_Tsize_test/	bafybeie6tqsijf7j6gpow5q3qdjmtqgscrmd2ijl4olpxanbbptmvh7sui	0
en.wikipedia-on-ipfs.org-wiki/	bafybeihn2f7lhumh4grizksi2fl233cyszqadkn424ptjajfenykpsaiw4	0
2025-04-09T03:26:25.269+0200	DEBUG	provider.batched	provider/dagprovider.go:16	DAG provider loaded	{"root": "QmTeDEtragGSErNjsN7czYqX8KWVSoG3WHKEqDuDiYvQ9U"}
2025-04-09T03:26:25.269+0200	DEBUG	provider.batched	provider/dagprovider.go:25	calling BlockAll	{"root": "QmTeDEtragGSErNjsN7czYqX8KWVSoG3WHKEqDuDiYvQ9U"}
skip error triggered bafybeif54ksdqydq3rliwjqps3wup6nulfbbrzgcvrrd5cgqqvps46eyma block was not found locally (offline): ipld: could not find bafybeif54ksdqydq3rliwjqps3wup6nulfbbrzgcvrrd5cgqqvps46eyma
skip error triggered bafybeif54ksdqydq3rliwjqps3wup6nulfbbrzgcvrrd5cgqqvps46eyma block was not found locally (offline): ipld: could not find bafybeif54ksdqydq3rliwjqps3wup6nulfbbrzgcvrrd5cgqqvps46eyma
skip error triggered bafybeif54ksdqydq3rliwjqps3wup6nulfbbrzgcvrrd5cgqqvps46eyma block was not found locally (offline): ipld: could not find bafybeif54ksdqydq3rliwjqps3wup6nulfbbrzgcvrrd5cgqqvps46eyma
skip error triggered bafybeif54ksdqydq3rliwjqps3wup6nulfbbrzgcvrrd5cgqqvps46eyma block was not found locally (offline): ipld: could not find bafybeif54ksdqydq3rliwjqps3wup6nulfbbrzgcvrrd5cgqqvps46eyma
skip error triggered bafybeif54ksdqydq3rliwjqps3wup6nulfbbrzgcvrrd5cgqqvps46eyma block was not found locally (offline): ipld: could not find bafybeif54ksdqydq3rliwjqps3wup6nulfbbrzgcvrrd5cgqqvps46eyma
skip error triggered bafybeichnleonsnrhdmykteijc2klanhwcrzoek4bzgdkx6amq3gwtj6uq block was not found locally (offline): ipld: could not find bafybeichnleonsnrhdmykteijc2klanhwcrzoek4bzgdkx6amq3gwtj6uq
2025-04-09T03:26:25.270+0200	DEBUG	provider.batched	provider/dagprovider.go:45	dagprovider dag traversal error from root QmTeDEtragGSErNjsN7czYqX8KWVSoG3WHKEqDuDiYvQ9U: skip
2025-04-09T03:26:25.771+0200	DEBUG	provider.batched	provider/reprovider.go:328	provide key QmTeDEtragGSErNjsN7czYqX8KWVSoG3WHKEqDuDiYvQ9U
2025-04-09T03:26:25.771+0200	DEBUG	provider.batched	provider/reprovider.go:328	provide key bafybeie6tqsijf7j6gpow5q3qdjmtqgscrmd2ijl4olpxanbbptmvh7sui
2025-04-09T03:26:25.771+0200	DEBUG	provider.batched	provider/reprovider.go:328	provide key bafybeihn2f7lhumh4grizksi2fl233cyszqadkn424ptjajfenykpsaiw4
2025-04-09T03:26:25.771+0200	DEBUG	provider.batched	provider/reprovider.go:350	starting provide of 3 keys
2025-04-09T03:27:02.532+0200	DEBUG	provider.batched	provider/reprovider.go:366	finished providing of 3 keys. It took 36.760657837s with an average of 12.253552612s per provide

This makes me believe we should not log an ERROR when the err returned by fetcherhelpers.BlockAll is traversal.SkipMe, and should instead log it at DEBUG. Will double-check tomorrow with fresh eyes.
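
A minimal sketch of that check, assuming traversal.SkipMe comes from go-ipld-prime and ends up wrapped (with %w) in the error returned by fetcherhelpers.BlockAll; names like isPartialDAGSkip and log are illustrative, not the actual boxo code:

	import (
		"errors"

		"github.com/ipld/go-ipld-prime/traversal"
	)

	// isPartialDAGSkip reports whether err is (or wraps) traversal.SkipMe,
	// which the offline fetcher returns for blocks that are missing locally.
	func isPartialDAGSkip(err error) bool {
		var skip traversal.SkipMe
		return errors.As(err, &skip)
	}

	// The dagprovider could then keep ERROR for real failures and demote
	// the benign partial-DAG case to DEBUG:
	//
	//	if err := fetcherhelpers.BlockAll(...); err != nil {
	//		if isPartialDAGSkip(err) {
	//			log.Debugw("dagprovider skipped missing blocks", "err", err)
	//		} else {
	//			log.Errorw("dagprovider dag traversal error", "err", err)
	//		}
	//	}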

@lidel lidel changed the title provider: provide MFS DAG using offline fetcher. feat: support partial DAG provide with Reprovider.Strategy=mfs|pinned+mfs Apr 9, 2025
@lidel lidel changed the title feat: support partial DAG provide with Reprovider.Strategy=mfs|pinned+mfs feat: partial DAG provides with Reprovider.Strategy=mfs|pinned+mfs Apr 9, 2025
@lidel (Member) left a comment:

Thank you @hsanjuan, tested as much as I could and it provides what it should provide.

We will be testing pinned+mfs in the collab cluster once 0.35.0-rc1 ships (we use MFS for extra backups of the latest versions of websites).

@lidel lidel merged commit 996bcf3 into master Apr 9, 2025
16 checks passed
@lidel lidel deleted the provide-mfs-best-effort branch April 9, 2025 16:28