-
Notifications
You must be signed in to change notification settings - Fork 68
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: improve passthrough method to use an asynchrone pipeline #132
base: master
Are you sure you want to change the base?
Conversation
@Dafyh Thanks for the PR! It looks like you've put a lot of thought into the code and have done all the work necessary (updated docs, wrote tests, etc...). I did a bit of research since the link you provided doesn't necessarily state that "the data event from the socket is emitted only once per connection to deliver the scan results" anywhere that I could see. It does allude to the fact that With all that being said... it does seem that your PR does away with the "forking" aspect of this code which simultaneously piped data to the desired output and ClamAV. In other words, what we had before was a parallel pipeline but now we have a serial pipeline. This won't matter in most cases (small files) but for large files, the data not being sent along to the output until after the file has been fully scanned could significantly increase the time to completion which could decrease the end users' experience. For example, in the case of, say, a 3 GB file being uploaded to your server and then off to S3, it will have to be fully scanned before being uploaded to S3 whereas in the current implementation, the file is uploaded to S3 at the same time and one would merely need to remove the file from S3 if a virus was detected. A 3 GB file take quite a while to scan and there's only a very small chance of a virus being detected. It would be unfortunate to have to wait until that completed before then having to wait another decent chunk of time to upload the 3 GB to S3 (when it could have already been done). Does the current implementation prevent the usage of the |
Hello, thank you for your response 😊. In reality, the functionality is roughly the same as before because the file is sent both to ClamAV and to the output (S3 or disk I/O). By the time it reaches the _flush, the file has already been written by Node (for example, you can use the timers and set a setTimeout in the _flush method on the first line, and you'll see that the file, despite the delay, has already been written to the output). const timers = require("timers/promises")
// ...
// ...
async _flush(cb) {
await timers.setTimeout(10000);
// ...
} I implemented this because exceptions in event callbacks cannot be caught in a try/catch block, which can cause issues, especially for server responses. However, there's nothing stopping you from using events to avoid waiting for the pipeline because const av = await clamscan.passthrough();
output.on("finish", () => {
console.log(av.result);
})
input.pipe(av).pipe(output) What do you think ? |
For now, I am using clamscan like this in my code to handle exceptions in the main script : const controller = new AbortController();
const timeoutId = setTimeout(() => controller.abort(), 60_000);
const av = this.#clamav.passthrough();
const result: any = await new Promise((res, rej) => {
av.on("scan-complete", (result) => {
clearTimeout(timeoutId);
res(result);
});
controller.signal.addEventListener("abort", () => {
data.destroy();
rej(new Error("Scan aborted"));
});
data.pipe(av).pipe(createWriteStream(tempPath));
});
const { isInfected, viruses, timeout } = result; But I wanted to make an improvement to make it more readable : const av = await this.#clamav.passthrough();
const signal = AbortSignal.timeout(60_000);
await pipeline(data, av, createWriteStream(tempPath), { signal });
const { isInfected, viruses, timeout } = av.result;
// OR
const av = await this.#clamav.passthrough();
const output = createWriteStream(tempPath);
output.on("finish", () => {
const { isInfected, viruses, timeout } = av.result;
// ...
})
data.pipe(av).pipe(output)
// Do something while you wait for the results |
Improvement of the passthrough method.
New implementation: