Skip to content
This repository has been archived by the owner on Aug 11, 2021. It is now read-only.

feat: new IPLD Format API #127

Merged
merged 1 commit into from
May 8, 2019
Merged

feat: new IPLD Format API #127

merged 1 commit into from
May 8, 2019

Conversation

vmx
Copy link
Member

@vmx vmx commented Apr 21, 2019

BREAKING CHANGE: The API is now async/await based

There are numerous changes, the most significant one is that the API
is no longer callback based, but it using async/await.

The properties of DAGNode and DAGLink are now in sync with the paths
that are used for resolving. This means that e.g. name is now called
Name and size is Tsize.

All return values from resolve() now conform to the IPLD Data Model,
this means that e.g. links are no longer represented as
{'/': "baseecodedcid"}, but as CID instances instead.

For the full new API please see the IPLD Formats spec.

@vmx vmx requested review from rvagg and achingbrain April 21, 2019 23:44
@ghost ghost assigned vmx Apr 21, 2019
@ghost ghost added the status/in-progress In progress label Apr 21, 2019
@codecov
Copy link

codecov bot commented Apr 21, 2019

Codecov Report

Merging #127 into master will increase coverage by 2.97%.
The diff coverage is 94.69%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #127      +/-   ##
==========================================
+ Coverage   91.14%   94.11%   +2.97%     
==========================================
  Files          14       15       +1     
  Lines         271      238      -33     
==========================================
- Hits          247      224      -23     
+ Misses         24       14      -10
Impacted Files Coverage Δ
src/dag-link/util.js 100% <ø> (ø) ⬆️
src/dag-node/clone.js 62.5% <0%> (ø) ⬆️
src/resolver.js 100% <100%> (+5.79%) ⬆️
src/dag-link/index.js 85.18% <100%> (-2.82%) ⬇️
src/util.js 100% <100%> (+2.04%) ⬆️
src/dag-node/addLink.js 100% <100%> (+9.09%) ⬆️
src/dag-node/util.js 94.44% <100%> (+3.96%) ⬆️
src/dag-node/create.js 100% <100%> (+12.5%) ⬆️
src/index.js 100% <100%> (ø) ⬆️
src/dag-node/addNamedLink.js 100% <100%> (ø)
... and 5 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 63b7986...e732925. Read the comment docs.

src/dag-node/create.js Outdated Show resolved Hide resolved
}

return callback(null, new DAGNode(data, links, buffer.length))
const serialized = await serialize({
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

serialize() doesn't return a Promise at all so await can be removed from this

src/resolver.js Outdated
remainderPath: split.slice(1).join('/')
})
}
exports.resolve = async (binaryBlob, path) => {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

async can be removed from this, it'll just make an unnecessary Promise

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's for consistency with other formats. I'd like to keep it for now until it turns out to be an issue.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I get the consistency argument but really think the format API needs to be "MaybePromise" rather than "Promise", or just make resolve() completely synchronous. None of the formats do anything async within resolve(), they're all the same pattern as this one (other than raw which is even simpler):

So the async here is theoretical at best. I like that it might be accommodating a potential of encryption or something else novel, but so far it doesn't, and that could be accomplished by stating in the format specification that it "may be a Promise" so users may want to opt for await if they want to be maximally compatible. I suspect most uses of the current code are sticking to specific implementations though and that flexibility can be left to the next generation stack.

As I understand, Matteo's performance analysis identified as the faux-async with setImmediate() everywhere was the biggest cost for js-ipfs. Replacing them with async doesn't buy much at all, it's just different sugar for deferring something that isn't actually asynchronous.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll make resolve() sync for the same reason that I made serialize()/deserialize() sync. Thanks for this outside perspective. I'm currently too deep into this without seeing the bigger picture.

src/resolver.js Show resolved Hide resolved
test/dag-node-test.js Outdated Show resolved Hide resolved
test/dag-node-test.js Outdated Show resolved Hide resolved
test/dag-node-test.js Outdated Show resolved Hide resolved
test/dag-node-test.js Outdated Show resolved Hide resolved
test/dag-node-test.js Outdated Show resolved Hide resolved
test/dag-node-test.js Outdated Show resolved Hide resolved
test/dag-node-test.js Outdated Show resolved Hide resolved
test/dag-node-test.js Outdated Show resolved Hide resolved
test/dag-node-test.js Outdated Show resolved Hide resolved
test/dag-node-test.js Outdated Show resolved Hide resolved
test/dag-node-test.js Outdated Show resolved Hide resolved
test/dag-node-test.js Outdated Show resolved Hide resolved
test/dag-node-test.js Outdated Show resolved Hide resolved
test/dag-node-test.js Outdated Show resolved Hide resolved
test/dag-node-test.js Outdated Show resolved Hide resolved
test/dag-node-test.js Outdated Show resolved Hide resolved
test/dag-node-test.js Outdated Show resolved Hide resolved
test/dag-node-test.js Outdated Show resolved Hide resolved
})

it('dagNode.toJSON with data no links', (done) => {
it('dagNode.toJSON with data no links', async () => {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

async can be removed

test/dag-node-test.js Outdated Show resolved Hide resolved
test/dag-node-test.js Outdated Show resolved Hide resolved
test/dag-node-test.js Outdated Show resolved Hide resolved
test/dag-node-test.js Outdated Show resolved Hide resolved
test/dag-node-test.js Outdated Show resolved Hide resolved
test/dag-node-test.js Outdated Show resolved Hide resolved
test/resolver.spec.js Outdated Show resolved Hide resolved
test/resolver.spec.js Outdated Show resolved Hide resolved
test/resolver.spec.js Outdated Show resolved Hide resolved
@rvagg
Copy link
Member

rvagg commented Apr 24, 2019

Sorry, there's a bunch of noise in there.

First, the Title cased properties aren't great. Title case in idiomatic JavaScript should be reserved for class / prototype / global object names, elsewhere it's just out of place.

Next, the async thing is really interesting. This is the only truly set of async functions in the whole lot:

addLink()
-> asDagLink()
-> toDagLink()
-> cid()
-> multihashing()

And it's only in addLink() as sugar, arguably.

One nice thing about await is that it's agnostic about whether it's really waiting on a Promise or not. So you can export functions that aren't async and use them as if they are. So the ipld-dag formats interface may assume it, but the formats themselves don't necessarily need to implement it that way. As long as the user uses await they can be async or bare. When using a maybe-async interface, you can squeeze out extra performance by inspecting a return value to see if it's a Promise or not and either resolve/await it or use the bare value if it's not (this is the discussion we've been having about the new stack interfaces too). IMO that gets a bit messy but because await will insert an extra Promise (or two, or three, depending on the VM), it can be worthwhile in hot code.

So, in the case of ipld-dag-pb, the only thing we're doing that is async is some multihashing-async work. So maybe we could consider adjusting the interface slightly stop that getting in the way of addLink() by refusing to add a DAGNode as a link and forcing the upstream user to do the conversion themselves prior to calling addLink(). Then any user code that doesn't need to do such a conversion will get a fully-sync interface. This would be as simple as something like exposing async toDAGLink() on DAGNode and then callers just need to replace await DAGNode.addLink(node2, node1) with DAGNode.addLink(node2, await node1.toDAGLink()) and then all other strictly necessary await calls go away leaving them optional.

It'd be pretty neat to ditch all of these pointless setInterval() calls unnecessary callback arguments with an interface that has a single method that returns a Promise. That'd have a pretty massive perf impact if ipld-dag-pb is being used so heavily.

@rvagg
Copy link
Member

rvagg commented Apr 24, 2019

OK, I see where the Titlecase is coming from, it's in the PB data model ipld/specs#114 (and util.js). I still don't think that should leak out to the JS interface though, it should be the job of the interface to translate a data layer into a language-idiomatic presentation of it.

@vmx vmx mentioned this pull request Apr 24, 2019
@vmx
Copy link
Member Author

vmx commented Apr 24, 2019

I've opened an issue for further addLink() discussions: #128. I won't like to postpone the merge of this PR even more.

@vmx
Copy link
Member Author

vmx commented Apr 24, 2019

I also don't like those Camelcase property names. But it makes sense in regards how current IPLD Formats are envisioned. If they serialize/deserialize synchronously then resolve() and tree() can just be generic functions. Also I consider dag-pb legacy and it should be only consumed by IPFS. I think it's OK to have a less JSy interface there.

@vmx
Copy link
Member Author

vmx commented Apr 24, 2019

@rvagg For future reference, feel free to mention things like "await not needed here" with a comment that it is throughout the whole file, you don't need to repeat it.

@rvagg
Copy link
Member

rvagg commented Apr 26, 2019

My remaining hangup is with resolve() being async, expanded on in https://github.com/ipld/js-ipld-dag-pb/pull/127/files#r278797728. I first question what the rationale for making it async in the format spec is if we don't have a current format that needs it. Then, if we have a theoretical need for it (encryption, maybe), I advocate for an optional async and documenting that users should use await or do Promise inspection if they are not depending on a specific format but need to be generic. Promises have a big perf cost in cases where they're not needed and this lowest level of ipfs is going to have a big impact. If the current unixfs implementation is tied to dag-pb then it could be refactored to entirely do away with the faux-async in every case other than CID generation and it'd have huge performance gains. By forcing an async there's no option for downstream consumers to avoid the cost.

@rvagg
Copy link
Member

rvagg commented Apr 26, 2019

clone() in src/dag-node/clone.js still takes a callback and calls create() with a callback.

And README needs updating, is that a separate aegir step?

BREAKING CHANGE: The API is now async/await based

There are numerous changes, the most significant one is that the API
is no longer callback based, but it using async/await.

The properties of DAGNode and DAGLink are now in sync with the paths
that are used for resolving. This means that e.g. `name` is now called
`Name` and `size` is `Tsize`.

All return values from `resolve()` now conform to the [IPLD Data Model],
this means that e.g. links are no longer represented as
`{'/': "baseecodedcid"}`, but as [CID] instances instead.

For the full new API please see the [IPLD Formats spec].

[IPLD Data Model]: https://github.com/ipld/specs/blob/master/IPLD-Data-Model-v1.md
[CID]: https://github.com/multiformats/js-cid/
[IPLD Formats spec]: https://github.com/ipld/interface-ipld-format
@vmx vmx merged commit 1de1bcc into master May 8, 2019
@vmx vmx deleted the ipld-format-cleanup branch May 8, 2019 12:34
@ghost ghost removed the status/in-progress In progress label May 8, 2019
This was referenced May 8, 2019
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants