louiswang524 (Contributor) commented Dec 30, 2025

Replace sequential tokenization with batch processing using HuggingFace's native batch API. In the benchmark below, this raises tokenization throughput at every batch size, from a 1.75x speedup at batch size 1 up to 4.82x at batch size 64.

Key changes (sketched in the example after this list):

  • Use tokenizer(prompts, padding=False) for batch processing
  • Pre-process all messages (chat templates and plain text) before tokenization
  • Convert batch results to individual int32 tensors
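A minimal sketch of the batched path, assuming a standard HuggingFace tokenizer; the helper names (to_prompt, tokenize_batch) and the message-format check are illustrative, not taken from the PR:

```python
import torch
from transformers import AutoTokenizer

# Any HF tokenizer; one with a chat template is needed if chat messages are used.
tokenizer = AutoTokenizer.from_pretrained("gpt2")

def to_prompt(message):
    # Pre-process one message: render structured chat messages through the
    # tokenizer's chat template, pass plain strings through unchanged.
    if isinstance(message, list):  # e.g. [{"role": "user", "content": "hi"}]
        return tokenizer.apply_chat_template(message, tokenize=False)
    return message

def tokenize_batch(messages):
    # Pre-process everything first, then tokenize in a single batch call
    # instead of looping. padding=False keeps ragged per-prompt id lists.
    prompts = [to_prompt(m) for m in messages]
    batch = tokenizer(prompts, padding=False)
    # Convert the batch result back into individual int32 tensors.
    return [torch.tensor(ids, dtype=torch.int32) for ids in batch["input_ids"]]
```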
======================================================================
Batch Tokenization Performance Benchmark
======================================================================

Loading tokenizer...
Tokenizer loaded.

Running benchmarks (100 iterations per batch size)...

Batch Size   Sequential (s)   Batch (s)   Speedup
----------------------------------------------------------------------
1            0.0130           0.0074      1.75x
2            0.0333           0.0264      1.26x
4            0.0640           0.0369      1.73x
8            0.1022           0.0545      1.88x
16           0.2353           0.0833      2.82x
32           0.5041           0.1506      3.35x
64           0.9826           0.2041      4.82x
----------------------------------------------------------------------

Summary:
  - Best speedup: 4.82x (batch size 64)
  - Average speedup: 2.51x

Throughput (messages/second):
Batch Size   Sequential (msg/s)   Batch (msg/s)   Improvement
----------------------------------------------------------------------
1            7696.7               13447.7         74.7%
2            6001.2               7562.5          26.0%
4            6253.3               10833.7         73.2%
8            7824.3               14690.7         87.8%
16           6801.2               19207.1         182.4%
32           6348.1               21242.1         234.6%
64           6513.5               31364.2         381.5%
----------------------------------------------------------------------
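(The Improvement column is just the speedup expressed as a percentage gain, (speedup − 1) × 100, up to rounding.) A rough sketch of how these numbers could be reproduced, reusing tokenizer, to_prompt, and tokenize_batch from the earlier sketch; the PR does not include the harness itself, so the loop structure and placeholder prompts here are assumptions:

```python
import time

def tokenize_sequential(messages):
    # Baseline: one tokenizer call per message.
    return [torch.tensor(tokenizer(to_prompt(m))["input_ids"], dtype=torch.int32)
            for m in messages]

def bench(fn, messages, iters=100):
    # Time `iters` repetitions of fn over the same batch of messages.
    start = time.perf_counter()
    for _ in range(iters):
        fn(messages)
    return time.perf_counter() - start

for batch_size in (1, 2, 4, 8, 16, 32, 64):
    msgs = ["The quick brown fox jumps over the lazy dog."] * batch_size
    seq = bench(tokenize_sequential, msgs)
    bat = bench(tokenize_batch, msgs)
    print(f"{batch_size:<4} sequential={seq:.4f}s batch={bat:.4f}s "
          f"speedup={seq / bat:.2f}x")
```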
