Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: improve passthrough method to use an asynchrone pipeline #132

Open
wants to merge 1 commit into
base: master
Choose a base branch
from
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
70 changes: 34 additions & 36 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -600,54 +600,52 @@ const { isInfected, viruses } = await clamscan.scanStream(stream);

## .passthrough()

The `passthrough` method returns a PassthroughStream object which allows you pipe a ReadbleStream through it and on to another output. In the case of this module's passthrough implementation, it's actually forking the data to also go to ClamAV via TCP or Domain Sockets. Each data chunk is only passed on to the output if that chunk was successfully sent to and received by ClamAV. The PassthroughStream object returned from this method has a special event that is emitted when ClamAV finishes scanning the streamed data so that you can decide if there's anything you need to do with the final output destination (ex. delete a file or S3 object).
The `passthrough` method returns a PassthroughStream object which allows you pipe a ReadbleStream through it and on to another output. In the case of this module's passthrough implementation, it's actually forking the data to also go to ClamAV via TCP or Domain Sockets. Each data chunk is only passed on to the output if that chunk was successfully sent to and received by ClamAV. The PassthroughStream object returned from this method contains a 'result' property which will be complete at the end of the pipeline and will contain the elements linked to the scanned file. In the case of an infected file, you can decide if there's anything to be done after the pipeline has been completed (ex. delete a file into I/O disk or S3 object).

In typical, non-passthrough setups, a file is uploaded to the local filesytem and then subsequently scanned. With that setup, you have to wait for the upload to complete _and then wait again_ for the scan to complete. Using this module's `passthrough` method, you could theoretically speed up user uploads intended to be scanned by up to 2x because the files are simultaneously scanned and written to any WriteableStream output (examples: filesystem, S3, gzip, etc...).
With this method, the file is both transmitted via TCP socket to clamav and also piped in an output. However, clamAV waits to collect the file chunks before scanning the entire file.

As for these theoretical gains, your mileage my vary and I'd love to hear feedback on this to see where things can still be improved.

Please note that this method is different than all the others in that it returns a PassthroughStream object and does not support a Promise or Callback API. This makes sense once you see the example below (a practical working example can be found in the examples directory of this module):

### Example

```javascript
const NodeClam = require('clamscan');

// You'll need to specify your socket or TCP connection info
const clamscan = new NodeClam().init({
clamdscan: {
socket: '/var/run/clamd.scan/clamd.sock',
host: '127.0.0.1',
port: 3310,
}
});

// For example's sake, we're using the Axios module
const axios = require('Axios');
(async() => {
const clamscan = new NodeClam().init({
clamdscan: {
socket: '/var/run/clamd.scan/clamd.sock',
host: '127.0.0.1',
port: 3310,
}
});

// For example's sake, we're using the Axios module
const axios = require('Axios');

// Get a readable stream for a URL request
const input = axios.get(some_url);

// Create a writable stream to a local file
const output = fs.createWriteStream(some_local_file);

// Get instance of this module's PassthroughStream object
const av = await clamscan.passthrough();

// Send output of Axios stream to ClamAV.
// Send output of Axios to `some_local_file` if ClamAV receives data successfully
await pipeline(input, av, output)
const { isInfected, viruses, timeout } = av.result;

// Get a readable stream for a URL request
const input = axios.get(some_url);

// Create a writable stream to a local file
const output = fs.createWriteStream(some_local_file);

// Get instance of this module's PassthroughStream object
const av = clamscan.passthrough();

// Send output of Axios stream to ClamAV.
// Send output of Axios to `some_local_file` if ClamAV receives data successfully
input.pipe(av).pipe(output);

// What happens when scan is completed
av.on('scan-complete', result => {
const { isInfected, viruses } = result;
// Do stuff if you want
});
if (isInfected) {
throw new Error(`...${viruses.join("|")}`);
}

// What happens when data has been fully written to `output`
output.on('finish', () => {
// Do stuff if you want
});
if (timeout) {
throw new Error("...");
}
})()

// NOTE: no errors (or other events) are being handled in this example but standard errors will be emitted according to NodeJS's Stream specifications
```
Expand Down
108 changes: 38 additions & 70 deletions examples/passthrough.js
Original file line number Diff line number Diff line change
@@ -1,100 +1,68 @@
// eslint-disable-next-line import/no-extraneous-dependencies
const axios = require('axios');
const fs = require('fs');
const { promisify } = require('util');
const { pipeline } = require("stream/promises")
const { Writable, Readable } = require('stream');

const fsUnlink = promisify(fs.unlink);

// const fakeVirusUrl = 'https://secure.eicar.org/eicar_com.txt';
const normalFileUrl = 'https://raw.githubusercontent.com/kylefarris/clamscan/sockets/README.md';
// const largeFileUrl = 'http://speedtest-ny.turnkeyinternet.net/100mb.bin';
const passthruFile = `${__dirname}/output`;

const testUrl = normalFileUrl;
// const testUrl = fakeVirusUrl;
// const testUrl = largeFileUrl;
const testUrl = {
fakeVirusUrl: 'https://raw.githubusercontent.com/fire1ce/eicar-standard-antivirus-test-files/master/eicar-test.txt',
normalFileUrl: 'https://raw.githubusercontent.com/kylefarris/clamscan/master/examples/passthrough.js'
};

// Initialize the clamscan module
const NodeClam = require('../index.js'); // Offically: require('clamscan');

/**
* Removes whatever file was passed-through during the scan.
*/
async function removeFinalFile() {
try {
await fsUnlink(passthruFile);
console.log(`Output file: "${passthruFile}" was deleted.`);
process.exit(1);
} catch (err) {
console.error(err);
process.exit(1);
}
}

/**
* Actually run the example code.
*/
async function test() {
const clamscan = await new NodeClam().init({
debugMode: true,
clamdscan: {
host: 'localhost',
port: 3310,
bypassTest: true,
timeout: 30000
// socket: '/var/run/clamd.scan/clamd.sock',
},
});

const input = axios.get(testUrl);
const output = fs.createWriteStream(passthruFile);
const av = clamscan.passthrough();

input.pipe(av).pipe(output);

av.on('error', (error) => {
if ('data' in error && error.data.isInfected) {
console.error('Dang, your stream contained a virus(es):', error.data.viruses);
} else {
console.error(error);
const input = await axios.get(testUrl.fakeVirusUrl);
// output can be a fs.createWriteStream
const output = new Writable({
write(chunk, _, cb) {
cb(null, chunk);
}
removeFinalFile();
})
.on('timeout', () => {
console.error('It looks like the scanning has timedout.');
process.exit(1);
})
.on('finish', () => {
console.log('All data has been sent to virus scanner');
})
.on('end', () => {
console.log('All data has been scanned sent on to the destination!');
})
.on('scan-complete', (result) => {
console.log('Scan Complete: Result: ', result);
if (result.isInfected === true) {
console.log(
`You've downloaded a virus (${result.viruses.join(
', '
)})! Don't worry, it's only a test one and is not malicious...`
);
} else if (result.isInfected === null) {
console.log(`There was an issue scanning the file you downloaded...`);
} else {
console.log(`The file (${testUrl}) you downloaded was just fine... Carry on...`);
}
removeFinalFile();
process.exit(0);
});

output.on('finish', () => {
console.log('Data has been fully written to the output...');
output.destroy();
});

output.on('error', (error) => {
console.log('Final Output Fail: ', error);
process.exit(1);
});

try {
const av = await clamscan.passthrough();

await pipeline(Readable.from(input.data), av, output)
const { isInfected, viruses, timeout } = av.result;

if (isInfected === null) {
console.log(`There was an issue scanning the file you downloaded...`);
}

if (isInfected === true) {
console.log(
`You've downloaded a virus (${viruses.join(
', '
)})! Don't worry, it's only a test one and is not malicious...`
);
}

if (timeout === true) {
console.error('It looks like the scanning has timedout.');
}
} catch (error) {
// handle errors
// Can be piped error, or connexion error
}
}

test();
Loading