
[HelpWanted] Download progress #71

Open
jawainc opened this issue Jun 22, 2023 · 17 comments
Labels: enhancement (New feature or request), question (Further information is requested)


jawainc commented Jun 22, 2023

Hello,
Thanks for the great library.

I am facing a problem displaying the download progress.

Here's my code:

   const files = response.data.files

    totalBytes = getTotalBytes(files)

    const requests = files.map((file) => fetch(file.url)
      .then(async (response) => {
        const reader = response.body.getReader()
        const chunks = []
        while (true) {
          const { done, value } = await reader.read();
          if (done) {
            break;
          }
          chunks.push(value)
          receivedBytes += value.length;
          progress(receivedBytes)
        }
        return new Blob(chunks) //--> don't know if this is the correct way
      })
      .catch(error => {
        throw new Error(error.message + "\nURL: " + file.url)
      })
    )
    const responses = await Promise.all(requests)
    
    console.log("response: ", responses) //--> this works and output like:  (11) [Blob, Blob, Blob, Blob, Blob, Blob, Blob, Blob, Blob, Blob, Blob]

    const blob = await downloadZip(responses); //--> not working

    const link = document.createElement("a")
    link.href = URL.createObjectURL(blob)
    link.download = fileName + ".zip"
    link.click()
    URL.revokeObjectURL(link.href)
    link.remove()

I've tried the solution mentioned here: #19
but it doesn't track progress while the files are downloading; rather, I think, while the zip file is being created.

Any help on this matter, please.


Touffy commented Jun 22, 2023

When using an intermediate Blob, the process is divided into two steps:

  1. Fetching the input files and creating the Zip file into a Blob. This depends on CPU and network.
  2. "downloading" the result from that Blob. This is just moving memory to disk.

To be clear, which one do you want to measure?

If you were not using a Blob, but streaming through a Service Worker, the two steps would be concurrent and move roughly at the same pace, so the question wouldn't really matter.


jawainc commented Jun 22, 2023

Thank you for your quick response.

The first option: fetching the input files and creating the zip.

I got it working with the following:

    const files = response.data.files

    totalBytes = getTotalBytes(files)

    const requests = files.map((file) => fetch(file.url)
      .then(async (response) => {
        const reader = response.body.getReader()
        const chunks = []
        while (true) {
          const { done, value } = await reader.read();
          if (done) {
            break;
          }
          chunks.push(value)
          receivedBytes += value.length;
          progress()
        }
        return {name: file.name, input: new Blob(chunks)} //--> changed from just returning blob
      })
      .catch(error => {
        throw new Error(error.message + "\nURL: " + file.url)
      })
    )
    const responses = await Promise.all(requests)

    const blob = await downloadZip(responses).blob();

    const link = document.createElement("a")
    link.href = URL.createObjectURL(blob)
    link.download = fileName + ".zip"
    link.click()
    URL.revokeObjectURL(link.href)
    link.remove()


Touffy commented Jun 25, 2023

Yeah, I guess that'll do it, although it is rather inefficient. You're starting all the downloads at once and waiting until they're all fully buffered before downloadZip can even begin to work.

I suggest using my DownloadStream instead of Promise.all, and then an async generator to make those blobs as needed from the stream. Something like this:

const blob = await downloadZip(blobAndCountBytes(new DownloadStream(files))).blob()

async function *blobAndCountBytes(downloadStream) {
  for await (const response of downloadStream) {
    // do your thing with the Reader, increasing receivedBytes and making a blob
    // and get the filename (for example by looking up the response.url in a Map)
    yield {name: filename, input: new Blob(chunks)}
  }
}

Even better (and shorter to write), you could use a TransformStream on each Response.body instead of reading the whole Response into a Blob. The transform function for the TransformStream would just increment receivedBytes and enqueue the input chunk. You can use that stream as input for downloadZip instead of a blob, which allows downloadZip to start working as soon as some data is available instead of the whole file.

const blob = await downloadZip(countBytes(new DownloadStream(files))).blob()

async function * countBytes(downloadStream) {
  for await (const response of downloadStream) {
    const stream = response.body.pipeThrough(new TransformStream({
      transform(chunk, ctrl) {
        receivedBytes += chunk.length
        ctrl.enqueue(chunk)
      }
    }))
    // get the filename (for example by looking up the response.url in a Map)
    yield {name: filename, input: stream}
  }
}
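For what it's worth, the counting transform itself can be exercised outside the Service Worker with any ReadableStream. Below is a minimal, standalone sketch of the same idea, fed from an in-memory stream instead of a fetch Response; countingStream and receivedBytes are illustrative names, not client-zip API:

```javascript
// Standalone sketch of the byte-counting TransformStream idea above,
// fed from an in-memory ReadableStream instead of a fetch Response.
let receivedBytes = 0

function countingStream(source) {
  return source.pipeThrough(new TransformStream({
    transform(chunk, ctrl) {
      receivedBytes += chunk.length // chunks are Uint8Arrays
      ctrl.enqueue(chunk)           // pass the data through unchanged
    },
  }))
}

// Demo source: two chunks totalling 5 bytes.
const source = new ReadableStream({
  start(ctrl) {
    ctrl.enqueue(new Uint8Array([1, 2, 3]))
    ctrl.enqueue(new Uint8Array([4, 5]))
    ctrl.close()
  },
})
```

Draining `countingStream(source)` leaves receivedBytes at the total byte count while passing every chunk through untouched, which is exactly why downloadZip can consume the transformed stream directly.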

@Touffy Touffy added the question Further information is requested label Jul 12, 2023
@Touffy Touffy closed this as completed Jul 12, 2023
@ahamelers

> If you were not using a Blob, but streaming through a Service Worker, the two steps would be concurrent and move roughly at the same pace, so the question wouldn't really matter.

Hi @Touffy, how would you track progress (or even just trigger stuff on completion) from the client page using a service worker with client-zip and dl-stream?


Touffy commented Oct 25, 2023

The client page isn't using JavaScript to download, so you have to do the tracking in the Service Worker (using any of the methods already discussed) and then feed the tracking information to the client. You could send progress events with postMessage.

Or maybe create a dedicated MessageChannel, transfer the receiving port over an initial postMessage to the client (fired when you begin creating the archive), and then post all further updates on that channel. That way, the client's global onmessage doesn't need to handle the complexity of following multiple streams (should that happen…) because you have a separate channel for each.
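A minimal sketch of that dedicated-channel handshake, assuming a hypothetical helper (openProgressChannel is not part of client-zip):

```javascript
// Sketch of a dedicated progress channel per download. The Service Worker
// keeps port1 and transfers port2 to the client window, so the client's
// global onmessage only handles the handshake, not every progress event.
// openProgressChannel is a hypothetical helper name, not client-zip API.
function openProgressChannel(client, archiveName) {
  const { port1, port2 } = new MessageChannel()
  // Fire the initial message when archive creation begins,
  // transferring the receiving end of the channel to the client.
  client.postMessage({ type: 'DOWNLOAD_STARTED', name: archiveName }, [port2])
  return port1 // post all further progress updates on this port
}

// Client side (in the page):
// navigator.serviceWorker.addEventListener('message', ({ data, ports }) => {
//   if (data.type === 'DOWNLOAD_STARTED') {
//     ports[0].onmessage = ({ data }) => console.log('progress', data)
//   }
// })
```

Because each download gets its own port, concurrent downloads never interleave their progress events in one handler.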


Touffy commented Oct 25, 2023

So many people seem interested in tracking the downloads that I am considering adding a hook in client-zip just for that — it would be easier than what you can do from the outside — but I'm afraid it would hurt performance for people who aren't tracking client-zip's progress. Some tests are needed…


ahamelers commented Oct 26, 2023

I really just need a way to note that long downloads are complete (to clear Firefox keep-alive requests, among other things). I love the Service Worker implementation that begins the download of the zip while the download of individual files is ongoing, and I can't seem to find a way to mark the progress/completion of the download that doesn't interrupt this.


Touffy commented Oct 26, 2023

I'm thinking of a new option for downloadZip where you give it a writable MessagePort, and it will post a message to it every time a file is fully processed and when chunks from a large file are consumed, plus a final message at the end. In your case, you can just listen for that final message. It should be easy to implement, but then, like I said, I need to measure how much it will cost when the option is not used (and how much when it is used, too).

@Touffy Touffy reopened this Oct 26, 2023
@Touffy Touffy added the enhancement New feature or request label Oct 26, 2023
@Touffy Touffy self-assigned this Oct 26, 2023

Touffy commented Oct 29, 2023

I decided to post tracking messages after each file is completely processed, but not each chunk of larger files. That could be added later, probably with some throttling. Indeed, when using the new feature with my set of 12k very small files, creating the archive took nearly 10% longer. Keep in mind that's an extreme case.

Even better news: performance is unaffected by the new code as long as you don't actually use the optional MessagePort.

@ahamelers

That's great! In the meantime, in my implementation, I made worker scripts for predictLength and makeZip; using .tee() with makeZip and recreating the downloadZip response, I managed to note when the download was complete while still streaming the zip to disk as it was created with dl-stream. However, when it came time to act on the message in the client, I realized the form submission had refreshed the page and console, removing or uninitializing the message port I'd created there 🤦. So when you do release the tracking messages, I'm not sure what I'll be able to do with them!


Touffy commented Oct 31, 2023

Hi Audrey. Yeah, I had a similar problem while updating the Service Worker demo to use the new MessageChannel feature. I tried posting a message back to the client that started the fetch event (to transfer the other port). That client declares a resultingClientId instead of giving its current clientId, because it's expecting to navigate to a new page after the fetch. But when the Response headers arrive with Content-Disposition: attachment, it doesn't navigate after all, and it keeps its old clientId — except the Service Worker doesn't know it.

By the way, did you forget to send that Content-Disposition: attachment in your Response? Your client window should not refresh when downloading. Unless your Service Worker doesn't call respondWith fast enough. Do you have to await anything before you call respondWith?
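For reference, a sketch of the pattern being described here: respondWith must be called synchronously in the fetch handler, with any async work done inside the promise passed to it, and the attachment disposition is what keeps the window from navigating. attachmentResponse is an illustrative helper, not part of client-zip:

```javascript
// Illustrative helper (not client-zip API): build a Response whose
// Content-Disposition: attachment keeps the client window from navigating,
// so the page (and its message ports) survives the download.
function attachmentResponse(body, filename, length) {
  const headers = {
    'Content-Type': 'application/zip',
    'Content-Disposition': `attachment; filename="${filename}"`,
  }
  if (length) headers['Content-Length'] = String(length)
  return new Response(body, { headers })
}

// In the Service Worker, call respondWith synchronously; await inside:
// self.addEventListener('fetch', (event) => {
//   event.respondWith(event.request.formData().then((data) =>
//     attachmentResponse(makeZip(new DownloadStream(data.getAll('url'))), 'archive.zip')))
// })
```

Awaiting anything before respondWith risks letting the browser handle the request as a normal navigation, which is one way to end up with the refreshed-page symptoms described above.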

@ahamelers

I did set the header, and the page does not actually refresh (my Firefox keep-alive fetches keep happening, and content on the page is still actionable), but the messagePort's event listener seems to do nothing; if a dev-tools inspect window is open, the HTML and console clear out, and the JavaScript console doesn't log anything after the form submission.


Touffy commented Nov 1, 2023

Right. The symptoms look a little different in Safari's console but in the end, it's the same thing. My MessageChannel breaks as soon as the Response begins. The client keeps working in most other ways. For one thing, if I set a timeout before the download starts, it still triggers its callback as expected. But even if we used a delayed postMessage to get the other MessagePort from the ServiceWorker, there would still be a risk that the user starts another download that breaks the MessageChannel later.

I think the MessageChannel path is a dead end for now. At least, for communicating with the client window that started the download. It should work nicely if instead, you used an iframe to display the download progress and/or keep the Service Worker alive. Or an iframe to download. Whatever. Just not the same client for downloading and for tracking. I'll give that a try.


Touffy commented Nov 1, 2023

Listening to the tracking channel in an iframe doesn't work. When the parent breaks, it breaks the iframe too. Trying the other way around now…

@ahamelers

ahamelers commented Nov 1, 2023

I've actually finally gotten mine to receive the message, and to work on Chrome, Safari, and Firefox with multiple simultaneous downloads of different filesets. Hopefully more testing won't break it 🤞

Page javascript:

const dlbutton = document.getElementById('download_zip_button');
if ("serviceWorker" in navigator) {
  const messageChannel = new MessageChannel();
  navigator.serviceWorker.register('/service-worker.js');
  let keepAlive = null;
  const form = document.getElementById('zip_download');
  navigator.serviceWorker.ready.then(worker => {
    worker.active.postMessage({type: 'PORT_INITIALIZATION', url: form.action}, [messageChannel.port2]); 
  });
  form.addEventListener('submit', e => {
    dlbutton.disabled = true;
    // etc.
    keepAlive = setInterval(() => {
      navigator.serviceWorker.ready.then(worker => {
        worker.active.postMessage({type: 'keep-alive'})
      })
    }, 10000);
  })
  messageChannel.port1.start();
  messageChannel.port1.addEventListener("message", ({data}) => {
    if (data.msg === 'Stream complete') {
      if (keepAlive) clearInterval(keepAlive);
      dlbutton.removeAttribute('disabled');
      // etc.
    }
  });
} else {
  dlbutton.hidden = true
}

Service worker:

importScripts('./client-zip/lengthWorker.js', './client-zip/makeZipWorker.js', './dl-stream/worker.js');
// './client-zip/worker.js',

const messagePorts = {};

self.addEventListener('activate', (event) => {
  event.waitUntil(self.clients.claim());
});

self.addEventListener('message', (event) => {
  if (event.data && event.data.type === 'PORT_INITIALIZATION') {
    messagePorts[event.data.url] = event.ports[0];
  }
});

self.addEventListener('fetch', (event) => {
  // This will intercept all requests with a URL containing /downloadZip/
  const url = new URL(event.request.url);
  const [, name] = url.pathname.match(/\/downloadZip\/(.+)/i) || [,];
  if (url.origin === self.origin && name) {
    event.respondWith(event.request.formData()
      .then((data) => {
        const urls = data.getAll('url')
        if (urls.length === 0) throw new Error('No URLs to download');
        if (messagePorts[event.request.url]) {
          messagePorts[event.request.url].postMessage({type: 'DOWNLOAD_STATUS', msg: 'Download started'});
        }
        const metadata = data.getAll('size').map((s, i) => ({name: data.getAll('filename')[i], size: s}));
        const headers = {
          'Content-Type': 'application/zip',
          'Content-Disposition': `attachment;filename="${name}"`,
          'Content-Length': predictLength([{name, size: 0}].concat(metadata)),
        };
        const [checkStream, printStream] = makeZip(new DownloadStream(urls), {metadata}).tee();
        const reader = checkStream.getReader();
        reader.read().then(function processText({done}) {
          if (done && messagePorts[event.request.url]) {
            messagePorts[event.request.url].postMessage({type: 'DOWNLOAD_STATUS', msg: 'Stream complete'});
            return;
          }
          return reader.read().then(processText);
        });
        return new Response(printStream, {headers});
        // return downloadZip(new DownloadStream(data.getAll('url')), {metadata});
      })
      .catch((err) => new Response(err.message, {status: 500})));
  }
});

I added the filename to the Content-Disposition header, which in Firefox also required adding it to the content length if I wanted a Content-Length header without breaking the download (hence my passing it to the predictLength function as a zero-length file, which seems to work). I can't remember exactly what issue that was solving, but it works now, so I'm not messing with it!

@douglasg14b

Just here to support this, when downloading multiple large files, a built in way to capture progress is a must!


Touffy commented May 27, 2024

Yeah…

But I don't like the details of Audrey's solution. You have to post a message to the Service Worker before the download (which adds complexity, though that is maybe unavoidable), but more importantly, it matches the window clientId with the form's action URL, which:

  • may not be unique, especially if the user opened the same page in multiple tabs,
  • may not be the actual URL called for the download, if the form items have formaction attributes.

Sorry I haven't focused on this problem since last year. I am of course open to suggestions :)
