louiswang524 (Contributor) commented Dec 30, 2025

Replace sequential tokenization with batch processing using HuggingFace's native batch API. In the benchmark below, this raises tokenization throughput at every batch size, from a 1.75x speedup at batch size 1 up to 4.82x at batch size 64.

Key changes (sketched in the example after this list):

  • Use tokenizer(prompts, padding=False) for batch processing
  • Pre-process all messages (chat templates and plain text) before tokenization
  • Convert batch results to individual int32 tensors
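A minimal sketch of the batched path, assuming a standard HuggingFace tokenizer; the helper names (to_prompt, tokenize_batch) and the message-format check are illustrative, not taken from the PR:

```python
import torch
from transformers import AutoTokenizer

# Any HF tokenizer; one with a chat template is needed if chat messages are used.
tokenizer = AutoTokenizer.from_pretrained("gpt2")

def to_prompt(message):
    # Pre-process one message: render structured chat messages through the
    # tokenizer's chat template, pass plain strings through unchanged.
    if isinstance(message, list):  # e.g. [{"role": "user", "content": "hi"}]
        return tokenizer.apply_chat_template(message, tokenize=False)
    return message

def tokenize_batch(messages):
    # Pre-process everything first, then tokenize in a single batch call
    # instead of looping. padding=False keeps ragged per-prompt id lists.
    prompts = [to_prompt(m) for m in messages]
    batch = tokenizer(prompts, padding=False)
    # Convert the batch result back into individual int32 tensors.
    return [torch.tensor(ids, dtype=torch.int32) for ids in batch["input_ids"]]
```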
======================================================================
Batch Tokenization Performance Benchmark
======================================================================

Loading tokenizer...
Tokenizer loaded.

Running benchmarks (100 iterations per batch size)...

Batch Size   Sequential (s)   Batch (s)   Speedup
----------------------------------------------------------------------
1            0.0130           0.0074      1.75x
2            0.0333           0.0264      1.26x
4            0.0640           0.0369      1.73x
8            0.1022           0.0545      1.88x
16           0.2353           0.0833      2.82x
32           0.5041           0.1506      3.35x
64           0.9826           0.2041      4.82x
----------------------------------------------------------------------

Summary:
  - Best speedup: 4.82x (batch size 64)
  - Average speedup: 2.51x

Throughput (messages/second):
Batch Size   Sequential (msg/s)   Batch (msg/s)   Improvement
----------------------------------------------------------------------
1            7696.7               13447.7         74.7%
2            6001.2               7562.5          26.0%
4            6253.3               10833.7         73.2%
8            7824.3               14690.7         87.8%
16           6801.2               19207.1         182.4%
32           6348.1               21242.1         234.6%
64           6513.5               31364.2         381.5%
----------------------------------------------------------------------
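(The Improvement column is just the speedup expressed as a percentage gain, (speedup − 1) × 100, up to rounding.) A rough sketch of how these numbers could be reproduced, reusing tokenizer, to_prompt, and tokenize_batch from the earlier sketch; the PR does not include the harness itself, so the loop structure and placeholder prompts here are assumptions:

```python
import time

def tokenize_sequential(messages):
    # Baseline: one tokenizer call per message.
    return [torch.tensor(tokenizer(to_prompt(m))["input_ids"], dtype=torch.int32)
            for m in messages]

def bench(fn, messages, iters=100):
    # Time `iters` repetitions of fn over the same batch of messages.
    start = time.perf_counter()
    for _ in range(iters):
        fn(messages)
    return time.perf_counter() - start

for batch_size in (1, 2, 4, 8, 16, 32, 64):
    msgs = ["The quick brown fox jumps over the lazy dog."] * batch_size
    seq = bench(tokenize_sequential, msgs)
    bat = bench(tokenize_batch, msgs)
    print(f"{batch_size:<4} sequential={seq:.4f}s batch={bat:.4f}s "
          f"speedup={seq / bat:.2f}x")
```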
