Skip to content

s2: non Indexed Seek may seek to the wrong offset #632

@JustinAzoff

Description

@JustinAzoff

Hi!

I have noticed an issue where the non-indexed Seek code may skip to the wrong offset depending on what parameters are used. I'm not quite sure what is wrong yet, but I was able to work out a test case that shows the issue well.

This writes 1,000,000 25 byte records, then attempts to seek through them using different skip lengths. The part that made this hard to track down is that for small skip sizes it works properly, but large sizes return invalid data, or try seeking too far and trigger unexpected EOF.

$ go test -v      
=== RUN   TestSeeking
=== RUN   TestSeeking/skip=100
=== RUN   TestSeeking/skip=1000
=== RUN   TestSeeking/skip=10000
=== RUN   TestSeeking/skip=20000
=== RUN   TestSeeking/skip=100000
    main_test.go:64: Expected "Item 0000000000000200000\n", got "tem 0000000000000241943\nI"
=== RUN   TestSeeking/skip=200000
    main_test.go:64: Expected "Item 0000000000000400000\n", got "m 0000000000000525829\nIte"
=== RUN   TestSeeking/skip=400000
    main_test.go:56: Failed to seek: unexpected EOF
--- FAIL: TestSeeking (0.21s)
    --- PASS: TestSeeking/skip=100 (0.01s)
    --- PASS: TestSeeking/skip=1000 (0.01s)
    --- PASS: TestSeeking/skip=10000 (0.01s)
    --- PASS: TestSeeking/skip=20000 (0.01s)
    --- FAIL: TestSeeking/skip=100000 (0.00s)
    --- FAIL: TestSeeking/skip=200000 (0.00s)
    --- FAIL: TestSeeking/skip=400000 (0.00s)
import (
        "fmt"
        "io"
        "os"
        "path/filepath"
        "testing"

        "github.com/klauspost/compress/s2"
)

func TestSeeking(t *testing.T) {
        d := t.TempDir()
        fn := filepath.Join(d, "test.s2")

        f, err := os.Create(fn)
        if err != nil {
                t.Fatal(err)
        }

        enc := s2.NewWriter(f)

        //24 bytes per item plus \n = 25 bytes per record
        for i := 0; i < 1_000_000; i++ {
                fmt.Fprintf(enc, "Item %019d\n", i)
        }

        err = enc.Close()
        if err != nil {
                t.Fatal(err)
        }

        err = f.Close()
        if err != nil {
                t.Fatal(err)
        }

        for _, skip := range []int{100, 1_000, 10_000, 20_000, 100_000, 200_000, 400_000} {
                t.Run(fmt.Sprintf("skip=%d", skip), func(t *testing.T) {
                        f, err = os.Open(fn)
                        if err != nil {
                                t.Fatal(err)
                        }
                        defer f.Close()
                        dec := s2.NewReader(f)
                        seeker := s2.ReadSeeker{
                                Reader: dec,
                        }
                        buf := make([]byte, 25)
                        for rec := 0; rec < 1_000_000; rec += skip {
                                offset := int64(rec * 25)
                                //t.Logf("Reading record %d", rec)
                                _, err := seeker.Seek(offset, io.SeekStart)
                                if err != nil {
                                        t.Fatalf("Failed to seek: %v", err)
                                }
                                _, err = dec.Read(buf)
                                if err != nil {
                                        t.Fatalf("Failed to seek: %v", err)
                                }
                                expected := fmt.Sprintf("Item %019d\n", rec)
                                if string(buf) != expected {
                                        t.Fatalf("Expected %q, got %q", expected, buf)
                                }
                        }
                })
        }

}

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions