Commit

Implement compression/decompression of filenames ending .gz
ChrisJefferson authored and fingolfin committed May 12, 2020
1 parent d805c7b commit c69cda3
Showing 5 changed files with 180 additions and 6 deletions.
10 changes: 8 additions & 2 deletions src/sysfiles.c
@@ -782,6 +782,9 @@ Int SyFopen (
     Char cmd [1024];
     int flags = 0;

+    Char * terminator = strrchr(name, '.');
+    BOOL endsgz = terminator && (strcmp(terminator, ".gz") == 0);
+
     /* handle standard files */
     if ( strcmp( name, "*stdin*" ) == 0 ) {
         if ( strcmp( mode, "r" ) != 0 )
@@ -848,8 +851,11 @@ Int SyFopen (
 #endif

     /* try to open the file */
-    syBuf[fid].fp = open(name,flags, 0644);
-    if ( 0 <= syBuf[fid].fp ) {
+    if (endsgz && (syBuf[fid].gzfp = gzopen(name, mode))) {
+        syBuf[fid].type = gzip_socket;
+        syBuf[fid].fp = -1;
+        syBuf[fid].bufno = -1;
+    } else if (0 <= (syBuf[fid].fp = open(name, flags, 0644))) {
         syBuf[fid].type = raw_socket;
         syBuf[fid].echo = syBuf[fid].fp;
         syBuf[fid].bufno = -1;
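
The suffix test that this change adds to SyFopen can be looked at in isolation. The following is a minimal sketch rather than GAP's code: `ends_with_gz` is a hypothetical helper name, and only the `strrchr`/`strcmp` logic is taken from the diff.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical helper (not part of GAP). Returns true when the text from the
 * last '.' in `name` to the end of the string is exactly ".gz", which is the
 * same comparison the diff performs with strrchr and strcmp. */
bool ends_with_gz(const char *name)
{
    const char *terminator = strrchr(name, '.');
    return terminator != NULL && strcmp(terminator, ".gz") == 0;
}
```

Under this check, names such as "test.g.gz" and "archive.tar.gz" match, while "data.GZ" (the comparison is case-sensitive) and "file.gz.bak" do not. The decision depends only on the file name, never on the file contents, so any file that is opened under a name ending in .gz is routed through `gzopen`.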
1 change: 1 addition & 0 deletions tst/example-dir/compress/not-compressed.txt.gz
@@ -0,0 +1 @@
not compressed
2 changes: 2 additions & 0 deletions tst/example-dir/readme.txt
@@ -7,3 +7,5 @@ Explanation

 dir-test : A directory containing some example files and sub-directories
 for testing directory enumeration.
+
+compress : not-compressed.txt - A text file which is not compressed but ends in gz
169 changes: 169 additions & 0 deletions tst/testinstall/compressed.tst
@@ -0,0 +1,169 @@
#@local dir,fname,isGzippedFile,stream,str
gap> START_TEST("compressed.tst");
gap> dir := DirectoryTemporary();;
gap> fname := Filename(dir, "test.g.gz");;

# Let us check when we have written a compressed file by checking the gzip header
gap> isGzippedFile := function(dir, name)
> local out, str,prog;
> str := "";
> out := OutputTextString(str, true);
> Process(dir, Filename(DirectoriesSystemPrograms(),"cat"), InputTextNone(), out, [name]);
> return str{[1..2]} = "\037\213";
> end;;
gap> str := "hello\ngoodbye\n";;

# Write a compressed file
gap> FileString( fname, str ) = Length(str);
true

# Check file really is compressed
gap> isGzippedFile(dir, "test.g.gz");
true

# Check reading compressed file
gap> StringFile( fname ) = str;
true

# Check gz is added transparently
gap> StringFile( Filename(dir, "test.g") ) = str;
true

# Test reading/seeking in a gzip compressed file
gap> stream := InputTextFile(fname);;
gap> ReadLine(stream);
"hello\n"
gap> ReadLine(stream);
"goodbye\n"
gap> ReadLine(stream);
fail
gap> SeekPositionStream(stream, -1);
fail
gap> SeekPositionStream(stream, 0);
true
gap> ReadLine(stream);
"hello\n"
gap> ReadLine(stream);
"goodbye\n"
gap> ReadLine(stream);
fail
gap> SeekPositionStream(stream, 2);
true
gap> PositionStream(stream);
2
gap> ReadLine(stream);
"llo\n"
gap> ReadLine(stream);
"goodbye\n"
gap> SeekPositionStream(stream, 0);
true
gap> ReadAll(stream) = str;
true
gap> SeekPositionStream(stream, 0);
true
gap> PositionStream(stream);
0
gap> ReadAll(stream) = str;
true
gap> CloseStream(stream);

# Test multiple writes
gap> stream := OutputTextFile( fname, false );;
gap> PrintTo( stream, "1");
gap> AppendTo( stream, "2");
gap> PrintTo( stream, "3");
gap> CloseStream(stream);
gap> stream;
closed-stream
gap> isGzippedFile(dir, "test.g.gz");
true

# verify it
gap> stream := InputTextFile( fname );;
gap> ReadAll(stream);
"123"
gap> CloseStream(stream);
gap> stream;
closed-stream

# partial reads
gap> stream := InputTextFile( fname );;
gap> ReadAll(stream, 2);
"12"
gap> CloseStream(stream);
gap> stream;
closed-stream

# too long partial read
gap> stream := InputTextFile( fname );;
gap> ReadAll(stream, 5);
"123"
gap> CloseStream(stream);
gap> stream;
closed-stream

# error partial read
gap> stream := InputTextFile( fname );;
gap> ReadAll(stream, -1);
Error, ReadAll: negative limit is not allowed
gap> CloseStream(stream);
gap> stream;
closed-stream

# append to initial data
gap> stream := OutputTextFile( fname, true );;
gap> PrintTo( stream, "4");
gap> CloseStream(stream);

# verify it
gap> stream := InputTextFile( fname );;
gap> ReadAll(stream);
"1234"
gap> CloseStream(stream);
gap> stream;
closed-stream

# overwrite initial data
gap> stream := OutputTextFile( fname, false );;
gap> PrintTo( stream, "new content");
gap> CloseStream(stream);

# verify it
gap> stream := InputTextFile( fname );;
gap> ReadAll(stream);
"new content"
gap> CloseStream(stream);
gap> stream;
closed-stream

# ReadAll with length limit
gap> stream := InputTextFile( fname );;
gap> ReadAll(stream, 3);
"new"
gap> CloseStream(stream);

# test PrintFormattingStatus
gap> stream := OutputTextFile( fname, false );;
gap> PrintFormattingStatus(stream);
true
gap> PrintTo( stream, "a very long line that GAP is going to wrap at 80 chars by default if we don't do anything about it\n");
gap> CloseStream(stream);
gap> StringFile(fname);
"a very long line that GAP is going to wrap at 80 chars by default if we don't\
\\\ndo anything about it\n"
gap> stream := OutputTextFile( fname, false );;
gap> SetPrintFormattingStatus(stream, false);
gap> PrintFormattingStatus(stream);
false
gap> PrintTo( stream, "a very long line that GAP is going to wrap at 80 chars by default if we don't do anything about it\n");
gap> CloseStream(stream);
gap> StringFile(fname);
"a very long line that GAP is going to wrap at 80 chars by default if we don't\
do anything about it\n"

# Test even if a file ends in .gz, if it is not compressed it can still be read
gap> stream := InputTextFile(Filename(DirectoriesLibrary("tst"), "example-dir/compress/not-compressed.txt.gz"));;
gap> ReadAll(stream) = "not compressed\n";
true
gap> CloseStream(stream);
gap> STOP_TEST("compressed.tst");
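
The `isGzippedFile` helper in the test above decides whether a file is compressed by comparing its first two bytes with "\037\213". The same check can be sketched in C as follows; `looks_gzipped` is a hypothetical name used only for illustration, and the check looks at the two gzip magic bytes and nothing else.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper (not part of GAP). Returns true when `buf` starts with
 * the two gzip magic bytes 0x1f 0x8b, written in octal as \037\213 in the
 * test file. It does not validate the rest of the gzip header. */
bool looks_gzipped(const unsigned char *buf, size_t len)
{
    return len >= 2 && buf[0] == 0x1f && buf[1] == 0x8b;
}
```

A buffer holding the text "not compressed\n" fails this check, as does any input shorter than two bytes.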
4 changes: 0 additions & 4 deletions tst/testinstall/read.tst
@@ -75,10 +75,6 @@ gap> StringFile( Filename(dir, "tmp2"));
 fail
 gap> StringFile( Filename(dir, "tmp1"));
 "Hello, world!"
-gap> FileString( Filename(dir, "test.g.gz"), "\037\213\b\b0,\362W\000\ctest.g\0003\3246\264\346\<\000\225\307\236\324\005\000\000\000" );
-32
-gap> StringFile( Filename(dir, "test.g") ) = "1+1;\n" or ARCH_IS_WINDOWS(); # works only when Cygwin installed with gzip
-true
 gap> StringFile( "/" );
 Error, in StringFile: Is a directory (21)


4 comments on commit c69cda3

@mtorpey
Contributor

@mtorpey mtorpey commented on c69cda3 Sep 4, 2020

@fingolfin and @ChrisJefferson: After a long git bisect, I think this is the commit that's causing PackageManager to fail on my computer (although apparently not on everyone else's).

When PackageManager downloads a .gz file, curlInterface reads it into a GAP string (PackageManager.gi:1009), then we write that string out using FileString (PackageManager.gi:172). The archive that gets written out used to be fine, but from this commit on (excluding this and the following 3 or 4 commits, which I can't compile) it seems it's mangled and can't be unzipped using tar on the command line.

I don't totally understand what this commit does, but is there some way I can read a gz file in its raw form to be written out again, without decompressing it? Any light on this would be nice.

Note: I suspect the real problem is happening in curlInterface somewhere, and that PackageManager is just showing up as a symptom.

@ChrisJefferson
Contributor Author

GAP now automatically uncompresses (and compresses) files whose names end in .gz. The problem is that when you save the already-compressed data with FileString under a name ending in .gz, it gets compressed a second time and the file ends up double-compressed.

Annoyingly, this might be tricky to fix -- you could save it with the IO package, which doesn't do autocompression, or save the file under a different name and then rename it to end in .gz. However, neither of those is great.
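
The second workaround (write under a name that does not end in .gz, then rename) can be sketched as below. This is only an illustration of the sequence in C: `write_then_rename` is a hypothetical helper, plain C stdio never auto-compresses, and in GAP the two steps would have to be carried out with whatever file-writing and renaming facilities are available there.

```c
#include <stdio.h>

/* Hypothetical helper (illustration only). Writes `len` raw bytes to
 * `tmp_path`, whose name should not end in ".gz", then renames the result to
 * `final_path`. Returns 0 on success and -1 on failure; on failure the
 * temporary file is removed. On POSIX systems rename replaces an existing
 * `final_path`; other platforms may refuse to overwrite it. */
int write_then_rename(const char *tmp_path, const char *final_path,
                      const void *data, size_t len)
{
    FILE *f = fopen(tmp_path, "wb");
    if (f == NULL)
        return -1;

    size_t written = fwrite(data, 1, len, f);
    if (fclose(f) != 0 || written != len) {
        remove(tmp_path);
        return -1;
    }

    if (rename(tmp_path, final_path) != 0) {
        remove(tmp_path);
        return -1;
    }
    return 0;
}
```

Because the bytes are written under the temporary name, nothing keyed on a .gz suffix sees the data while it is being written; the file only acquires that suffix after its contents are complete.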

@ChrisJefferson
Contributor Author

I suspect people who find it works fine don't have curlInterface compiled.

@mtorpey
Contributor

@mtorpey mtorpey commented on c69cda3 Sep 5, 2020

Ahh, I see now! This sounds like I should be able to move a few bits around and fix things. Thanks for the info!
