
Add inference-time sequence packing support#292

Merged
Ingvarstep merged 9 commits into urchade:main from vivekkalyanarangan30:main
Oct 9, 2025

Conversation

vivekkalyanarangan30 (Contributor) commented Sep 16, 2025

Summary

  • Add a new inference-packing utility that builds packed batches with block-diagonal masks and helpers to unpack outputs.
  • Expose configuration knobs on the GLiNER API so packing can be toggled globally or per-call.
  • Wire the encoder to use packed execution when configured, including automatic pack/unpack around the transformer forward pass.
  • Results are identical with or without packing (verified by tests).
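The PR's actual helper names aren't visible in this thread, so the following is a minimal sketch of the technique under illustrative names (`pack_sequences` and `unpack_outputs` are assumptions, not the PR's API): short sequences are greedily concatenated into shared rows, a block-diagonal mask keeps attention from crossing sequence boundaries, and recorded spans let per-sequence outputs be sliced back apart.

```python
import numpy as np

def pack_sequences(seqs, max_len):
    """Greedily pack variable-length token sequences into rows of at most
    max_len tokens; spans records (row, start, length) per input sequence."""
    rows, spans = [], []
    for seq in seqs:
        if not rows or len(rows[-1]) + len(seq) > max_len:
            rows.append([])
        spans.append((len(rows) - 1, len(rows[-1]), len(seq)))
        rows[-1].extend(seq)
    width = max(len(r) for r in rows)
    ids = np.zeros((len(rows), width), dtype=np.int64)
    # Block-diagonal attention mask: a position may attend only to
    # positions belonging to the same original sequence.
    mask = np.zeros((len(rows), width, width), dtype=bool)
    for (row, start, length), seq in zip(spans, seqs):
        ids[row, start:start + length] = seq
        mask[row, start:start + length, start:start + length] = True
    return ids, mask, spans

def unpack_outputs(packed, spans):
    """Slice each sequence's outputs back out of the packed rows."""
    return [packed[row, start:start + length] for row, start, length in spans]
```

For example, packing `[[1, 2, 3], [4, 5], [6]]` with `max_len=4` produces two rows (`[1, 2, 3]` and `[4, 5, 6]`), and the mask for the second row blocks attention between the `[4, 5]` and `[6]` sub-blocks. Because the mask exactly reproduces per-sequence attention, outputs match the unpacked baseline, which is what makes the PR's "identical results" property possible.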

Benchmarks

All runs on CPU, roberta-base, batch_size=64, max_length=512.

| Scenario | Baseline tokens/s | Packed tokens/s | Speedup | Padding ↓ |
|---|---|---|---|---|
| short_zipf | 2.00e+03 | 3.88e+03 | 1.94× | 61.5% → 12.0% |
| short_uniform | 2.44e+03 | 3.47e+03 | 1.42× | 45.9% → 13.4% |
| mixed_tail | 6.18e+02 | 3.43e+03 | 5.55× | 87.4% → 19.6% |
| flat_long | 4.94e+03 | 4.10e+03 | 0.83× | 0.0% → 0.0% |

👉 Packing yields 1.4–5.5× throughput improvements when input lengths are short or skewed, while performance is neutral (or slightly worse) when all sequences are long and uniform.
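The "Padding ↓" column measures the share of positions in the batch tensors that carry no real tokens. A small sketch of that calculation (the in-order, pad-to-batch-max batching scheme is an assumption for illustration, not taken from the PR's bench code):

```python
def padding_fraction(lengths, batch_size):
    """Fraction of positions wasted on padding when sequences of the given
    lengths are batched in order and padded to each batch's maximum length."""
    real = wasted = 0
    for i in range(0, len(lengths), batch_size):
        chunk = lengths[i:i + batch_size]
        real += sum(chunk)
        wasted += len(chunk) * max(chunk) - sum(chunk)
    return wasted / (real + wasted)
```

For a batch holding one 512-token and one 64-token sequence, the fraction is (512 − 64) / 1024 ≈ 43.8%; packing shortens the padded width by merging short sequences into shared rows, which is why the skewed scenarios above see the largest drops.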

  • Document how to enable and benchmark inference-time sequence packing in README_Extended.
  • Extend bench/bench_gliner_e2e.py to support benchmarking full GLiNER models end to end.
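The tokens/s figures in the table amount to total tokens processed divided by wall-clock time. A sketch of that measurement loop (the real bench/bench_gliner_e2e.py drives full GLiNER models; the `forward` callable here is a stand-in):

```python
import time

def tokens_per_second(forward, batches):
    """Run forward over each batch and report total tokens per elapsed second.
    batches is a list of batches, each a list of token sequences."""
    total_tokens = sum(len(seq) for batch in batches for seq in batch)
    start = time.perf_counter()
    for batch in batches:
        forward(batch)
    elapsed = time.perf_counter() - start
    return total_tokens / elapsed
```

Comparing this metric for the packed and baseline paths over the same inputs yields the speedup column directly.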

vivekkalyanarangan30 (Contributor, Author) commented:

@urchade hopefully this helps.

urchade (Owner) commented Sep 26, 2025:

@Ingvarstep 👀

urchade requested a review from Ingvarstep on September 26, 2025 at 11:51.
Ingvarstep (Collaborator) commented:

@vivekkalyanarangan30 , awesome job, thanks for contributing!

Ingvarstep merged commit 4518525 into urchade:main on Oct 9, 2025.