Description
I'm using this library to read large zip files in browsers (2-80MB zipped, 15-300MB+ unzipped!). Unsurprisingly, I came across memory problems in various browsers (IE being the worst of course, Chrome the best), so I'm trying to find ways to alleviate this.
Looking at the original code, there seem to be a lot of string operations, which appear to make some browsers bloat their memory use. For instance, I found I could replace this method in jszip-load.js:
findDataUntilDataDescriptorOld : function(reader) {
   var data = "",
       buffer = reader.readString(4),
       aByte;
   while(buffer !== JSZip.signature.DATA_DESCRIPTOR) {
      aByte = reader.readString(1);
      data += buffer.slice(0, 1);
      buffer = (buffer + aByte).slice(-4);
   }
   return data;
},
with the following method:
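findDataUntilDataDescriptor : function(reader) {
   var startIndex = reader.index,
       endIndex = reader.stream.indexOf(JSZip.signature.DATA_DESCRIPTOR, startIndex),
       data;
   if (endIndex === -1) {
      // no signature found: fail loudly rather than read a negative length
      throw new Error("Data descriptor signature not found");
   }
   data = reader.readString(endIndex - startIndex); // one big read instead of many 1-byte reads
   reader.readString(4); // skip the JSZip.signature.DATA_DESCRIPTOR bytes
   return data;
},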
I read that slice is a copying operation in some browsers, and one big slice operation (readString) seems preferable to thousands of mini-slicing operations and concatenations.
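To check that the two versions agree, here is a minimal standalone sketch. `MockReader` is a hypothetical stand-in I wrote for this comparison, not the library's reader; it only models `stream`, `index` and `readString`. The signature value is hard-coded on the assumption that it is the four bytes `PK\x07\x08`.

```javascript
// Hypothetical stand-in for the library's reader (an assumption, not library code).
function MockReader(stream) {
  this.stream = stream;
  this.index = 0;
}
MockReader.prototype.readString = function (size) {
  if (size < 0 || this.index + size > this.stream.length) {
    throw new Error("End of stream reached");
  }
  var result = this.stream.slice(this.index, this.index + size);
  this.index += size;
  return result;
};

// Assumed value of the data descriptor signature.
var DATA_DESCRIPTOR = "PK\x07\x08";

// Original approach: slide a 4-character window forward one character at a time.
function findByteByByte(reader) {
  var data = "", buffer = reader.readString(4), aByte;
  while (buffer !== DATA_DESCRIPTOR) {
    aByte = reader.readString(1);
    data += buffer.slice(0, 1);
    buffer = (buffer + aByte).slice(-4);
  }
  return data;
}

// Replacement approach: one indexOf call and one large read.
function findWithIndexOf(reader) {
  var startIndex = reader.index;
  var endIndex = reader.stream.indexOf(DATA_DESCRIPTOR, startIndex);
  if (endIndex === -1) {
    throw new Error("Data descriptor signature not found");
  }
  var data = reader.readString(endIndex - startIndex);
  reader.readString(4); // skip the signature itself
  return data;
}

var sample = "compressed bytes here" + DATA_DESCRIPTOR + "trailer";
var a = new MockReader(sample);
var b = new MockReader(sample);
var oldResult = findByteByByte(a);
var newResult = findWithIndexOf(b);
console.log(oldResult === newResult, a.index === b.index); // prints: true true
```

Both versions return the same data and leave the reader just past the signature. The difference is only in how many intermediate strings get created along the way.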
PS. I now think I've found a way to simply return the start index of the file data from this method and pass it into the zip inflate methods, rather than passing a substring of the zip file, to achieve the same result. It involves some fairly unsubtle changes to a few methods, though.
Before I drop that in here, let me know if you think the above is at least an improvement, or whether I've gone wrong in my thinking somewhere (maybe someone's already done this?). I'm also looking at ways to decompress only some of the files in a zip, and at changing/extending the inflate methods to conditionally ignore decompressed data. I'm working with large zipped CSV datasets, so I'm trying to find ways to read only, say, the first and third columns. If I try to do this by reading the whole file into memory, the browser falls over, but I think I may have a chance if I discard data as I'm decompressing it, by counting field/line delimiters. Again, this is all for my own problem, so its use and generality for others is decreasing a bit.
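As a rough illustration of the delimiter-counting idea, here is a sketch that keeps only selected columns from CSV text arriving in chunks. `createColumnPicker` is a hypothetical helper, not part of the library, and it is not wired into the inflate code. It assumes `,` as the field delimiter, `\n` as the line delimiter, and no quoted fields.

```javascript
// Hypothetical helper: keeps only the wanted columns of chunked CSV text.
// Assumes "," and "\n" delimiters and no quoted fields.
function createColumnPicker(wantedColumns) {
  var rows = [];       // kept fields, one array per line
  var row = [];        // kept fields of the current line
  var field = "";      // kept part of the current field so far
  var column = 0;      // zero-based index of the current field
  var pending = false; // true if the current line has unflushed content

  function wanted() {
    return wantedColumns.indexOf(column) !== -1;
  }
  function endField() {
    if (wanted()) row.push(field);
    field = "";
  }
  function endLine() {
    endField();
    rows.push(row);
    row = [];
    column = 0;
    pending = false;
  }

  return {
    push: function (chunk) {
      var start = 0; // start of the not-yet-handled part of the current field
      for (var i = 0; i < chunk.length; i++) {
        var ch = chunk.charAt(i);
        if (ch !== "," && ch !== "\n") continue;
        // Copy field text only when the column is wanted; skip it otherwise.
        if (wanted()) field += chunk.slice(start, i);
        start = i + 1;
        if (ch === ",") {
          endField();
          column++;
          pending = true;
        } else {
          endLine();
        }
      }
      // A field may continue into the next chunk.
      if (start < chunk.length) {
        if (wanted()) field += chunk.slice(start);
        pending = true;
      }
    },
    finish: function () {
      if (pending) endLine(); // flush a last line that has no trailing newline
      return rows;
    }
  };
}

// Keep the first and third columns; chunk boundaries fall mid-field on purpose.
var picker = createColumnPicker([0, 2]);
picker.push("id,name,sc");
picker.push("ore\n1,alice,90\n2,b");
picker.push("ob,85");
var result = picker.finish();
console.log(JSON.stringify(result)); // prints: [["id","score"],["1","90"],["2","85"]]
```

Unwanted fields are never copied out of the chunk, so each chunk can be released as soon as it has been scanned. In real use the rows would probably be handed to a callback instead of accumulated in an array.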
PPS. This is a good library. It's the only JavaScript zip library I tested that actually read in the files I needed!
PPPS. Sorry this post is very long.