Skip to content
This repository has been archived by the owner on Aug 12, 2020. It is now read-only.

feat: Add reader to read files or part of files as streams #202

Merged
merged 1 commit into from
Mar 20, 2018

Conversation

achingbrain
Copy link
Collaborator

Adds a readable function to return a ReadableStream that supports reading arbitrary chunks of files with the signature readable(cid/path, ipldResolver, begin, end).

I made the begin, end arguments support the semantics of Array.slice(begin, end) as it's probably familiar to most JavaScript developers.

If this gets merged I'll do some follow up PRs for the specs repo & the js-ipfs examples.

One question I had - I've implemented this as a ReadableStream internally but I see a lot of pull streams in this module. Would it be better implemented internally as a pull stream and turned into a ReadableStream at a higher level similar to ipfs.files.catReadableStream()?

Related issues:

ipfs/js-ipfs#1231
ipfs/js-ipfs#348
#172

@achingbrain achingbrain force-pushed the add-readable branch 2 times, most recently from 1a1f526 to 942b21e Compare February 23, 2018 11:35
@achingbrain
Copy link
Collaborator Author

Actually forget that last paragraph, I refactored the PR to use pull streams.

I am getting a test error though, although it fails for me on master too, so unrelated to this PR:

  1) IPFS UnixFS Engine
       builder
         allows multihash hash algorithm to be specified:
     Error: multihash function 86 not yet supported
  2) IPFS UnixFS Engine
       builder
         allows multihash hash algorithm to be specified for big file:
     Error: multihash function 86 not yet supported

test/reader.js Outdated
})

it('reads bytes with a negative end', (done) => {
ipfs.files.add({
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Everything except the assertion in the end seems duplicated across the test cases. Could you extract them into a function and call that instead please?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done!

test/reader.js Outdated
pull(
reader('?!?', ipfs._ipldResolver),
collect((error) => {
expect(error.message).to.contain('Non-base58 character')
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This error should also mention the value that was invalid, so the user can easily see what the wrong input is.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done!

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This error message comes from base-x so it could be improved there but it appears it's depended on in a few places so might be a more disruptive job than it appears.

@daviddias
Copy link
Contributor

@pgte can you review this PR?

Copy link
Contributor

@pgte pgte left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Everything looks super tight, great work on the tests.
Just left a question about the need of an extra dep.

let streamPosition = 0

return pull(
asyncValues((cb) => {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Interesting, didn't know about pull-async-values.
Couldn't this be done instead with something like a pull.value([path]) and then using a chain of pull.map and pull.asyncMap?
I'm asking because this way we wouldn't need to add an extra dependency..

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I couldn't find an easy way to asynchronously supply an array of values to pull-stream when you don't know the length of the values ahead of time using only the built-in functions.

The faq recommends using external modules for 1:many transforms and async steps so I'm not sure it's possible without adding an extra dep.

That said, the source for pull-async-values is pretty small - I can pull it in to this repo if required.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@achingbrain Thanks for explaining, it's fine as it is by me :)

pgte
pgte previously approved these changes Mar 2, 2018
@daviddias
Copy link
Contributor

daviddias commented Mar 6, 2018

@achingbrain is this PR ready for any kind of Unixfs layout? I mean by that: balanced, flat and trickledag? Does it correctly know how to traverse the graph and avoid fetching unnecessary blocks when starting at a offset?

Also, weird that tests are failing with a multihash error. Mind checking into that?

@achingbrain
Copy link
Collaborator Author

is this PR ready for any kind of Unixfs layout?

I've refactored the tests to use the different UnixFS layouts, they also use an ipld instance directly now instead of spinning up an IPFS node so are faster and more consistent with other tests in this repo.

Does it correctly know how to traverse the graph and avoid fetching unnecessary blocks when starting at a offset?

Yes it uses the list of links to only fetch nodes that contain data from of the slice requested.

Also, weird that tests are failing with a multihash error. Mind checking into that?

This problem seems to have fixed itself 🤷 - I think the problem was in one of the dependencies, the test in question used the list of supported hashing algorithms exported by the multihashes module but it contained an algorithm that another module wasn't expecting or didn't know how to interpret.

@mitra42
Copy link

mitra42 commented Mar 11, 2018

I'm wondering how this will plug into a html

Part of the challenge is the parameters you get from the successive calls from the browser's media player. For example - I'm not sure what ipldresolver would be in this case ?

The approach I'm taking to the semantics is to have two calls:
async function(cid/path) which returns a function
function(opts) => stream - where opts is in the format { start, end } and returns a stream.

This works well with the RenderMedia module to build or edit

Note I posted a issue ipfs/js-ipfs#1258 where some of the videos that play in the gateway arent fetchable with files.cat, try testing with /ipfs/zdj7Wc9BBA2kar84oo8S6VotYc9PySAnmc8ji6kzKAFjqMxHS if you want to make sure you aren't hitting that issue.

@achingbrain
Copy link
Collaborator Author

@diasdavid @pgte @victorbjelkholm Can this be merged?

@ghost ghost assigned achingbrain Mar 20, 2018
@ghost ghost added status/in-progress In progress and removed status/ready Ready to be worked labels Mar 20, 2018
@daviddias daviddias merged commit 833accf into ipfs-inactive:master Mar 20, 2018
@ghost ghost removed the status/in-progress In progress label Mar 20, 2018
@daviddias
Copy link
Contributor

Releasing it now. Thank you @achingbrain :D

@achingbrain
Copy link
Collaborator Author

N.b. superseded by #207

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants