s2: Add stream index #462

Merged
merged 14 commits into from
Jan 11, 2022


@klauspost klauspost commented Jan 5, 2022

Stream Seek Index

S2 and Snappy streams can have indexes. These indexes will allow random seeking within the compressed data.

The index can either be appended to the stream as a skippable block or returned for separate storage.

When the index is appended to a stream it will be skipped by regular decoders,
so the output remains compatible with other decoders.

Creating an Index

To automatically add an index to a stream, add the WriterAddIndex() option to your writer.
The index will then be added to the stream when Close() is called.

	// Add an index to the stream...
	enc := s2.NewWriter(w, s2.WriterAddIndex())
	io.Copy(enc, r)
	enc.Close()

If you want to store the index separately, you can use CloseIndex() instead of the regular Close().
This will return the index. Note that CloseIndex() should only be called once, and Close() should not be called in addition to it.

	// Get the index for separate storage...
	enc := s2.NewWriter(w)
	io.Copy(enc, r)
	index, err := enc.CloseIndex()

The index can then be used without needing to read it from the stream.
This means the index can be used without seeking to the end of the stream,
or for manually forwarding streams. See below.

Using Indexes

To use indexes, the Reader has a ReadSeeker(random bool, index []byte) (*ReadSeeker, error) method.

Calling ReadSeeker will return an io.ReadSeeker compatible version of the reader.

If random is true, the returned io.Seeker can be used for random seeking; otherwise only forward seeking is supported.
Enabling random seeking requires the original input to support the io.Seeker interface.

	dec := s2.NewReader(r)
	rs, err := dec.ReadSeeker(false, nil)
	rs.Seek(wantOffset, io.SeekStart)	

This gets a seeker that can seek forward. Since no index is provided, the index is read from the stream.
This requires that an index was added to the stream and that r supports the io.Seeker interface.

A custom index can be specified which will be used if supplied.
When using a custom index, it will not be read from the input stream.

	dec := s2.NewReader(r)
	rs, err := dec.ReadSeeker(false, index)
	rs.Seek(wantOffset, io.SeekStart)	

This will read the index from index. Since we specify non-random (forward-only) seeking, r does not have to be an io.Seeker.

	dec := s2.NewReader(r)
	rs, err := dec.ReadSeeker(true, index)
	rs.Seek(wantOffset, io.SeekStart)	

Finally, since we specify that we want random seeking, r must be an io.Seeker.

The returned ReadSeeker contains a shallow reference to the existing Reader,
meaning changes made to one are reflected in the other.

Manually Forwarding Streams

Indexes can also be read outside the decoder using the Index type.
This can be used for parsing indexes, either separate or in streams.

In some cases it may not be possible to serve a seekable stream.
This can for instance be an HTTP stream, where the Range request
is sent at the start of the stream.

With a little bit of extra code it is still possible to use indexes to forward to a specific offset with a single forward skip.

It is possible to load the index manually like this:

	var index s2.Index
	_, err = index.Load(idxBytes)

This can be used to figure out how much to offset the compressed stream:

	compressedOffset, uncompressedOffset, err := index.Find(wantOffset)

The compressedOffset is the number of bytes that should be skipped
from the beginning of the compressed file.
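For the HTTP case mentioned above, skipping compressedOffset bytes can be left to the server with a Range header. The following is a minimal sketch using only net/http; the rangeRequest helper and the URL are placeholders for illustration and are not part of the s2 package.

```go
package main

import (
	"fmt"
	"net/http"
)

// rangeRequest is a hypothetical helper that builds a GET request asking
// the server to start the response body at compressedOffset.
func rangeRequest(url string, compressedOffset int64) (*http.Request, error) {
	req, err := http.NewRequest(http.MethodGet, url, nil)
	if err != nil {
		return nil, err
	}
	// Open-ended byte range: everything from compressedOffset to the end.
	req.Header.Set("Range", fmt.Sprintf("bytes=%d-", compressedOffset))
	return req, nil
}

func main() {
	// The URL and the offset are placeholders.
	req, err := rangeRequest("https://example.com/data.s2", 400)
	if err != nil {
		panic(err)
	}
	fmt.Println(req.Header.Get("Range"))
}
```

A server that honors the range replies with status 206 Partial Content. A plain 200 means the range was ignored and the body starts at the beginning of the file, so the status should be checked before decoding.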

The uncompressedOffset will then be the offset of the uncompressed bytes returned
when decoding from that position. This will always be <= wantOffset.
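The relationship between the two returned offsets can be sketched with a small lookup table. This is an illustration only: the entry type, the find function and the sample offsets are assumptions made for this example and do not reflect the actual s2 index format or the implementation of Index.Find.

```go
package main

import (
	"errors"
	"fmt"
	"sort"
)

// entry is a hypothetical index record: the block starting at compressed
// byte offset Comp decodes to data starting at uncompressed offset Uncomp.
type entry struct {
	Comp, Uncomp int64
}

// find returns the last entry whose uncompressed offset is <= want.
// entries must be sorted by Uncomp in ascending order.
func find(entries []entry, want int64) (comp, uncomp int64, err error) {
	if len(entries) == 0 || want < entries[0].Uncomp {
		return 0, 0, errors.New("offset not covered by index")
	}
	// Locate the first entry that starts strictly after want;
	// the entry just before it contains the wanted offset.
	i := sort.Search(len(entries), func(i int) bool {
		return entries[i].Uncomp > want
	})
	e := entries[i-1]
	return e.Comp, e.Uncomp, nil
}

func main() {
	// Made-up offsets for illustration.
	idx := []entry{{0, 0}, {400, 1000}, {900, 2000}}
	want := int64(1500)
	comp, uncomp, err := find(idx, want)
	if err != nil {
		panic(err)
	}
	// Start reading compressed data at comp, then skip want-uncomp
	// decoded bytes to land exactly on want.
	fmt.Println(comp, uncomp, want-uncomp)
}
```

With these sample values the lookup selects the block at compressed offset 400, which starts at uncompressed offset 1000, leaving 500 decoded bytes to skip.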

When creating a decoder it must be specified that it should not expect a frame header
at the beginning of the stream. Assuming the io.Reader r has been forwarded to compressedOffset
we create the decoder like this:

	dec := s2.NewReader(r, s2.ReaderIgnoreFrameHeader())

We are not completely done. We still need to skip the uncompressed bytes we didn't want.
This is done using the regular Skip function:

	err = dec.Skip(wantOffset - uncompressedOffset)

This will ensure that we are at exactly the offset we want, and reading from dec will start at the requested offset.

@klauspost klauspost marked this pull request as ready for review January 7, 2022 14:30
@klauspost klauspost merged commit 469ba13 into master Jan 11, 2022
@klauspost klauspost deleted the s2-add-stream-index branch January 11, 2022 15:40