Propose a new picker WeightedRandomPicker to mitigate hot-spotting issue of max-score-picker #1411

@Jooho

What would you like to be added:
Add WeightedRandomPicker as a new picker plugin alongside the existing max-score-picker to provide
intelligent load balancing through weighted random sampling.

Core Features:

  • Weighted Random Sampling: Distributes traffic probabilistically based on pod scores instead of
    always selecting the highest-scoring pod
  • Optional Score Normalization: Four normalization types (None, Square Root, Capping, Logarithmic)
    for enhanced load distribution when needed
  • Flexible API: Variadic parameters supporting simple usage (NewWeightedRandomPicker(5)) to
    advanced configuration (NewWeightedRandomPicker(5, NormalizationCapping, 2.5))
  • YAML Configuration Support: Standard picker configuration format with optional normalization
    parameters

Implementation:

  • New weighted-random-picker type in the scheduler framework
  • Progressive normalization options for extreme score variations

Why is this needed:

Critical Problem: Hot-spotting with max-score-picker

llm-d/llm-d-inference-scheduler#298

The existing max-score-picker always selects the highest-scoring pod, which creates severe production issues in high-traffic environments:

Current Behavior (Problematic):

High-traffic scenario with max-score-picker:
Pod Scores: Pod A=100, Pod B=20, Pod C=10

Traffic Distribution:
Pod A: 100% traffic → Queue length: 25-30
Pod B: 0% traffic   → Completely idle    → Wasted resources
Pod C: 0% traffic   → Completely idle    → Wasted resources

Result: System bottleneck at single "best" pod
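The behavior above follows directly from an argmax selection. The sketch below is a simplified model, not the actual max-score-picker code: `scoredPod` and `pickMax` are hypothetical names, ties are assumed to go to the earliest entry, and scores are assumed to stay fixed between requests as in the scenario above.

```go
package main

import "fmt"

// scoredPod is a hypothetical stand-in for the scheduler's scored-pod type.
type scoredPod struct {
	Name  string
	Score float64
}

// pickMax returns the highest-scoring pod. Assumption: ties go to the
// earliest entry in the slice.
func pickMax(pods []scoredPod) (scoredPod, bool) {
	if len(pods) == 0 {
		return scoredPod{}, false
	}
	best := pods[0]
	for _, p := range pods[1:] {
		if p.Score > best.Score {
			best = p
		}
	}
	return best, true
}

func main() {
	pods := []scoredPod{{"pod-a", 100}, {"pod-b", 20}, {"pod-c", 10}}
	counts := map[string]int{}
	for i := 0; i < 1000; i++ {
		p, _ := pickMax(pods)
		counts[p.Name]++
	}
	fmt.Println(counts) // prints map[pod-a:1000]
}
```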

Business Impact:

  1. Performance Degradation: Queue buildup causes 3-5x latency increases
  2. Resource Waste: Otherwise usable pods sit idle while one pod is overwhelmed
  3. Poor Scalability: Adding more pods doesn't help since traffic still goes to one pod
  4. Unreliable Service: Single point of failure at the highest-scoring pod

Solution Impact with WeightedRandomPicker:

Same scenario with weighted-random-picker:
Pod Scores: Pod A=100, Pod B=20, Pod C=10

Traffic Distribution:
Pod A: 77% traffic → Queue length: 15-20 
Pod B: 15% traffic → Queue length: 2-3   → Actually utilized
Pod C: 8% traffic  → Queue length: 1     → Actually utilized

Result: Better performance + resource utilization + system resilience
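The traffic percentages above are each pod's score divided by the sum of all scores. A small sketch of that arithmetic, with `expectedShares` as a hypothetical helper name:

```go
package main

import "fmt"

// expectedShares returns each score divided by the sum of all scores.
// If the sum is zero, every share is reported as zero.
func expectedShares(scores []float64) []float64 {
	total := 0.0
	for _, s := range scores {
		total += s
	}
	shares := make([]float64, len(scores))
	if total == 0 {
		return shares
	}
	for i, s := range scores {
		shares[i] = s / total
	}
	return shares
}

func main() {
	names := []string{"Pod A", "Pod B", "Pod C"}
	shares := expectedShares([]float64{100, 20, 10})
	for i, s := range shares {
		fmt.Printf("%s: %.1f%%\n", names[i], s*100)
	}
	// prints:
	// Pod A: 76.9%
	// Pod B: 15.4%
	// Pod C: 7.7%
}
```

Rounded to whole percentages this gives the 77% / 15% / 8% split shown above.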

Why Weighted Random Sampling:

  • Maintains Intelligence: Still prefers higher-scoring pods (77% vs 15% vs 8%)
  • Prevents Hot-spotting: No single pod gets 100% traffic
  • Improves Throughput: Utilizes cluster capacity instead of creating idle resources
  • Minimal Overhead: Basic weighted sampling is a linear pass over the pod scores, the same order of work as max-score-picker
