
Multi-Threaded Tokenizer Benchmark

Demonstrating Python 3.14t's Free-Threading Performance for LLM Preprocessing

This benchmark compares tokenization throughput across different thread counts, showing the dramatic speedup when Python's Global Interpreter Lock (GIL) is removed.

🎯 What This Demonstrates

  • Python 3.11 (with GIL): ~1x speedup regardless of thread count (GIL bottleneck)
  • Python 3.14t (no-GIL): 6-8x speedup on 8-core systems (true parallelism)

🚀 Quick Start

Run with Current Python (3.11, with GIL)

python tokenizer_benchmark.py

Run with Python 3.14t (no-GIL) for Dramatic Speedup

# Install Python 3.14t and run the benchmark with uv
uv python install 3.14t
uv run --python 3.14t tokenizer_benchmark.py

# Or download from python.org and run:
python3.14t tokenizer_benchmark.py

📊 What It Does

  1. Generates Dataset: Creates 10,000 synthetic text samples simulating real LLM preprocessing
  2. Tokenizes with tiktoken: Uses OpenAI's fast BPE tokenizer (cl100k_base encoding)
  3. Tests Multiple Thread Counts: Runs benchmarks with 1, 2, 4, 8, and 16 threads
  4. Measures Performance: Tracks tokens/sec, speedup ratios, and total time (see the sketch after this list)
  5. Creates Visualizations: Generates publication-ready charts for analysis
  6. Exports Results: Saves data to CSV and JSON for further analysis
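
At its core, steps 1-4 split the samples into per-thread chunks and time a ThreadPoolExecutor run over them. A minimal sketch of that loop; helper names like make_samples and run are illustrative, not necessarily the script's actual API:

from concurrent.futures import ThreadPoolExecutor
import time
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")

def make_samples(n: int) -> list[str]:
    # Synthetic stand-in for the generated dataset.
    return [f"Sample {i}: the quick brown fox jumps over the lazy dog." for i in range(n)]

def tokenize_chunk(chunk: list[str]) -> int:
    # Each worker tokenizes its slice of the dataset independently.
    return sum(len(enc.encode(text)) for text in chunk)

def run(samples: list[str], num_threads: int) -> tuple[int, float]:
    # One chunk per thread; wall-clock time covers the whole batch.
    chunks = [samples[i::num_threads] for i in range(num_threads)]
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=num_threads) as pool:
        total_tokens = sum(pool.map(tokenize_chunk, chunks))
    return total_tokens, time.perf_counter() - start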

📈 Output Files

  • benchmark_results.png - Visualization showing throughput and speedup curves
  • benchmark_results.csv - Detailed results in spreadsheet format
  • benchmark_results.json - Complete benchmark data in JSON format

🔧 Technical Details

Dependencies

  • tiktoken (0.12.0): Fast BPE tokenizer for LLM preprocessing
  • matplotlib: Visualization and plotting
  • pandas: Data analysis and CSV export
  • psutil: System information and CPU detection
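
If you prefer pip over uv, the same dependencies can be installed directly:

pip install tiktoken matplotlib pandas psutil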

Benchmark Methodology

  • Tests thread counts: 1, 2, 4, 8, 16 (adapts to available CPU cores)
  • Uses concurrent.futures.ThreadPoolExecutor for thread management
  • Measures wall-clock time with time.perf_counter()
  • Calculates speedup relative to single-threaded baseline
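
Continuing the sketch above, speedup for each thread count is just the single-threaded wall time divided by that run's wall time (variable names are again illustrative):

samples = make_samples(10_000)
_, baseline_time = run(samples, num_threads=1)

results = {}
for n in [1, 2, 4, 8, 16]:
    tokens, elapsed = run(samples, num_threads=n)
    results[n] = {
        "tokens_per_sec": tokens / elapsed,
        "speedup": baseline_time / elapsed,  # ~1.0 with the GIL, up to ~n without
    }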

Why Tokenization?

Tokenization is a critical bottleneck in LLM preprocessing pipelines:

  • Required for every text sample before training/inference
  • CPU-intensive (no I/O waits)
  • Embarrassingly parallel (independent samples)
  • Perfect candidate for multi-threading

📝 LinkedIn Caption Template

The benchmark automatically generates a LinkedIn caption based on your results:

🚀 LLM preprocessing just got multi-core superpowers!

I benchmarked Python 3.14's free-threaded build (no-GIL) tokenizing 
10,000 text samples with tiktoken.

Results: 7.5x speedup on 8 threads! 
Peak throughput: 850,000 tokens/sec

The removal of the Global Interpreter Lock enables true parallel processing
for CPU-bound tasks like tokenization, preprocessing, and feature extraction.

This is a game-changer for ML/AI pipelines. The future of Python is parallel! 🐍⚡

#Python #MachineLearning #AI #LLM #Performance #GIL

🎓 Understanding the Results

With GIL (Python 3.11)

  • Adding more threads doesn't improve performance
  • GIL allows only one thread to execute Python bytecode at a time
  • Speedup stays close to 1.0x regardless of thread count

Without GIL (Python 3.14t)

  • Linear or near-linear speedup with thread count
  • True parallel execution across all CPU cores
  • 6-8x speedup on 8-core systems
  • Dramatic improvement for CPU-bound workloads
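
To confirm which mode your interpreter is actually in, CPython 3.13+ exposes a build-time flag and a runtime check:

import sys, sysconfig

# 1 on free-threaded ("t") builds, 0 or None otherwise.
print("free-threaded build:", bool(sysconfig.get_config_var("Py_GIL_DISABLED")))

# Only defined on 3.13+; reports whether the GIL is active in this process.
if hasattr(sys, "_is_gil_enabled"):
    print("GIL enabled at runtime:", sys._is_gil_enabled())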

🔬 Customization

Edit tokenizer_benchmark.py to customize:

# Change number of samples
num_samples = 50000  # Default: 10000

# Change thread counts to test
thread_counts = [1, 2, 4, 8, 16, 32]  # Default: [1, 2, 4, 8, 16]

# Change tokenizer encoding
benchmark = TokenizerBenchmark(encoding_name="o200k_base")  # Default: cl100k_base
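
To see which encodings your installed tiktoken version ships before switching, you can list them:

import tiktoken
print(tiktoken.list_encoding_names())  # includes cl100k_base, o200k_base, ...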

📚 About Python 3.14t

Python 3.14 was released on October 7, 2025, with official support for free-threaded builds (PEP 703, PEP 779).

Key Features:

  • Optional no-GIL build (indicated by 't' suffix: 3.14t)
  • True parallel execution on multi-core CPUs
  • 2-4x speedup for CPU-bound multi-threaded tasks
  • Uses biased reference counting to keep per-object reference counts thread-safe

Installation:

# Using uv (recommended)
uv python install 3.14t

# Or download from python.org
https://www.python.org/downloads/

🤝 Contributing

Feel free to extend this benchmark:

  • Add more tokenizers (sentencepiece, rs-bpe, kitoken)
  • Test with real datasets (Wikipedia, code, multilingual text)
  • Add memory profiling
  • Measure CPU utilization per core
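
As a starting point for the last item, per-core utilization could be sampled with psutil while the benchmark runs (a sketch, not part of the current script):

import psutil

# Average per-core CPU usage over a 1-second window: with the GIL only one
# core should be busy; on a free-threaded build all of them should be.
for core, pct in enumerate(psutil.cpu_percent(interval=1.0, percpu=True)):
    print(f"core {core}: {pct:.0f}%")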

📄 License

This benchmark is provided as-is for educational and demonstration purposes.
