[Feat] Implement ReservoirLongsSketch for sampling package #90

@Fengzdadi

Description

I'd like to contribute an implementation of ReservoirLongsSketch to the Go library. This would address the ❌ status for "ReservoirLongsSketch" in the README.

Proposed Implementation

Based on the Java reference implementation, I've created:

| File | Description |
| --- | --- |
| sampling/reservoir_longs_sketch.go | Core reservoir sampling for int64 values |
| sampling/reservoir_longs_union.go | Union for merging multiple sketches |
| sampling/reservoir_longs_sketch_test.go | Unit tests (11 tests) |
| examples/reservoir_example_test.go | Usage examples |

Algorithm

The classic Reservoir Sampling algorithm (Vitter's Algorithm R):

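  1. Initial phase (n ≤ k): store every item.
  2. Steady state (n > k): with probability k/n, the new item replaces a uniformly chosen item in the reservoir; otherwise it is discarded.

Here n is the number of items seen so far, including the current one, and k is the reservoir capacity.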
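To make the two phases concrete, here is a minimal sketch of one way to write the update step. It is not the code from the proposed PR: the `reservoir` type, `newReservoir`, and `update` are hypothetical names used only for illustration, k is assumed to be positive, and n counts the items seen including the current one.

```go
package main

import (
	"fmt"
	"math/rand"
)

// reservoir is a hypothetical, minimal stand-in for the proposed sketch.
type reservoir struct {
	k     int        // capacity: maximum number of samples kept
	n     int64      // total number of items seen so far
	items []int64    // current sample, len(items) <= k
	rng   *rand.Rand // private source so runs can be seeded
}

// newReservoir assumes k > 0; a real constructor would validate k.
func newReservoir(k int, seed int64) *reservoir {
	return &reservoir{
		k:     k,
		items: make([]int64, 0, k),
		rng:   rand.New(rand.NewSource(seed)),
	}
}

// update applies one step of Algorithm R to the value v.
func (r *reservoir) update(v int64) {
	r.n++
	if r.n <= int64(r.k) {
		// Initial phase: the reservoir is not full yet, keep everything.
		r.items = append(r.items, v)
		return
	}
	// Steady state: j is uniform on [0, n), so j < k happens with
	// probability k/n, and in that case j is also a uniform slot index.
	if j := r.rng.Int63n(r.n); j < int64(r.k) {
		r.items[j] = v
	}
}

func main() {
	r := newReservoir(10, 1)
	for i := int64(0); i < 1000; i++ {
		r.update(i)
	}
	fmt.Println(len(r.items), r.n) // prints "10 1000"
}
```

Drawing a single index from [0, n) covers both the accept/reject decision and the choice of slot, which avoids a second random draw per update.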

API

// Create sketch with capacity k
sketch, _ := sampling.NewReservoirLongsSketch(10)

// Add items
sketch.Update(42)

// Get uniform random sample
samples := sketch.GetSamples()

// Serialization
bytes, _ := sketch.ToByteArray()
restored, _ := sampling.NewReservoirLongsSketchFromSlice(bytes)

Feedback Requested

I have a working implementation ready. Before submitting a PR, I'd appreciate feedback on:

  1. Serialization Format: I followed the general pattern from PreambleUtil.java. Should I verify cross-language compatibility with specific test cases?
  2. Scope: Should I include ReservoirItemsSketch<T> (generic version) in the same PR, or keep it as a separate contribution?
  3. Any design concerns with the current approach?

Testing

All tests pass locally:

go test -v ./sampling/ ./examples/
# 13 tests pass

I'm happy to adjust the implementation based on your feedback!
