Propose a new picker `WeightedRandomPicker` to mitigate the hot-spotting issue of `max-score-picker` #1411

**What would you like to be added**:
  Add WeightedRandomPicker as a new picker plugin alongside the existing max-score-picker to provide
  score-aware load balancing through weighted random sampling.

  Core Features:
  - Weighted Random Sampling: Distributes traffic probabilistically based on pod scores instead of
  always selecting the highest-scoring pod
  - Optional Score Normalization: Four normalization types (None, Square Root, Capping, Logarithmic)
   for enhanced load distribution when needed
  - Flexible API: Variadic parameters supporting anything from simple usage (NewWeightedRandomPicker(5)) to
  advanced configuration (NewWeightedRandomPicker(5, NormalizationCapping, 2.5))
  - YAML Configuration Support: Standard picker configuration format with optional normalization
  parameters
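
A minimal sketch of how the variadic constructor described above could look. Only the two call shapes (`NewWeightedRandomPicker(5)` and `NewWeightedRandomPicker(5, NormalizationCapping, 2.5)`) come from this proposal. The struct fields, the other enum names, and the reading of the first argument as a maximum number of picks are assumptions made for illustration.

```go
package main

import "fmt"

// NormalizationType is a hypothetical enum for the four modes named in the proposal.
type NormalizationType int

const (
	NormalizationNone NormalizationType = iota
	NormalizationSquareRoot
	NormalizationCapping
	NormalizationLogarithmic
)

// WeightedRandomPicker holds illustrative configuration only.
// Field names and meanings are assumptions, not the real plugin's layout.
type WeightedRandomPicker struct {
	maxPicks      int               // assumed meaning of the first argument
	normalization NormalizationType // defaults to NormalizationNone
	param         float64           // optional parameter for the chosen normalization
}

// NewWeightedRandomPicker accepts optional trailing arguments: a
// NormalizationType and a float64 parameter. Arguments of any other
// type are silently ignored in this sketch.
func NewWeightedRandomPicker(maxPicks int, opts ...any) *WeightedRandomPicker {
	p := &WeightedRandomPicker{maxPicks: maxPicks, normalization: NormalizationNone}
	for _, o := range opts {
		switch v := o.(type) {
		case NormalizationType:
			p.normalization = v
		case float64:
			p.param = v
		}
	}
	return p
}

func main() {
	simple := NewWeightedRandomPicker(5)
	advanced := NewWeightedRandomPicker(5, NormalizationCapping, 2.5)
	fmt.Println(simple.normalization, advanced.normalization, advanced.param)
}
```

A typed options struct or functional options would give stronger compile-time checking than `...any`; the variadic form is shown only because it matches the call shapes in this proposal.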

  Implementation:
  - New weighted-random-picker type in the scheduler framework
  - Progressive normalization options for extreme score variations
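
The proposal names four normalization types but does not define their formulas. The sketch below shows one plausible interpretation so the effect on extreme scores is concrete. Every formula here is an assumption: square root as `sqrt(s)`, logarithmic as `ln(1+s)`, and capping as a limit of `capRatio` times the smallest positive score. The names `normalizeScores` and `capRatio` are hypothetical.

```go
package main

import (
	"fmt"
	"math"
)

// Normalization is a hypothetical enum for the four modes named in the proposal.
type Normalization int

const (
	None Normalization = iota
	SquareRoot
	Capping
	Logarithmic
)

// normalizeScores returns a transformed copy of scores.
// All formulas are illustrative assumptions; negative scores are treated as 0
// by the square-root and logarithmic modes.
func normalizeScores(scores []float64, mode Normalization, capRatio float64) []float64 {
	out := make([]float64, len(scores))
	switch mode {
	case SquareRoot:
		for i, s := range scores {
			out[i] = math.Sqrt(math.Max(s, 0))
		}
	case Logarithmic:
		for i, s := range scores {
			out[i] = math.Log1p(math.Max(s, 0))
		}
	case Capping:
		// Assumed rule: no score may exceed capRatio times the smallest positive score.
		minPos := math.Inf(1)
		for _, s := range scores {
			if s > 0 && s < minPos {
				minPos = s
			}
		}
		if capRatio <= 0 || math.IsInf(minPos, 1) {
			copy(out, scores) // nothing sensible to cap against
			break
		}
		limit := capRatio * minPos
		for i, s := range scores {
			out[i] = math.Min(s, limit)
		}
	default:
		copy(out, scores)
	}
	return out
}

func main() {
	scores := []float64{100, 20, 10}
	fmt.Println("sqrt:   ", normalizeScores(scores, SquareRoot, 0))
	fmt.Println("capping:", normalizeScores(scores, Capping, 2.5))
	fmt.Println("log:    ", normalizeScores(scores, Logarithmic, 0))
}
```

Under this assumed capping rule with a ratio of 2.5, the scores 100, 20, 10 become 25, 20, 10, which would shift the sampling split from roughly 77/15/8 to roughly 45/36/18.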

**Why is this needed**:

*Critical Problem: Hot-spotting with max-score-picker*

  https://github.com/llm-d/llm-d-inference-scheduler/issues/298

  The max-score-picker always routes each request to the highest-scoring pod, which creates severe production issues in high-traffic environments:

  Current Behavior (Problematic):
  ```
  High-traffic scenario with max-score-picker:
  Pod Scores: Pod A=100, Pod B=20, Pod C=10

  Traffic Distribution:
  Pod A: 100% traffic → Queue length: 25-30
  Pod B: 0% traffic   → Completely idle    → Wasted resources
  Pod C: 0% traffic   → Completely idle    → Wasted resources

  Result: System bottleneck at single "best" pod
  ```
**Business Impact:**

  1. Performance Degradation: Queue buildup causes 3-5x latency increases
  2. Resource Waste: High-quality pods sit idle while one pod is overwhelmed
  3. Poor Scalability: Adding more pods doesn't help since traffic still goes to one pod
  4. Unreliable Service: Single point of failure at the highest-scoring pod

  Solution Impact with WeightedRandomPicker:
  ```
  Same scenario with weighted-random-picker:
  Pod Scores: Pod A=100, Pod B=20, Pod C=10

  Traffic Distribution:
  Pod A: 77% traffic → Queue length: 15-20 
  Pod B: 15% traffic → Queue length: 2-3   → Actually utilized
  Pod C: 8% traffic  → Queue length: 1     → Actually utilized
  
  Result: Better performance + resource utilization + system resilience
  ```
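
The 77% / 15% / 8% split above follows from dividing each score by the total of 130, assuming selection probability is proportional to the raw score with no normalization. A small sketch that reproduces the arithmetic (the function name `trafficShares` is illustrative; the queue lengths in the scenario are not derived here):

```go
package main

import (
	"fmt"
	"math"
)

// trafficShares returns each pod's expected share of traffic in percent,
// assuming selection probability is score / sum of all scores.
func trafficShares(scores []float64) []float64 {
	total := 0.0
	for _, s := range scores {
		total += s
	}
	shares := make([]float64, len(scores))
	if total == 0 {
		return shares // no information to weight by
	}
	for i, s := range scores {
		shares[i] = 100 * s / total
	}
	return shares
}

func main() {
	names := []string{"Pod A", "Pod B", "Pod C"}
	for i, share := range trafficShares([]float64{100, 20, 10}) {
		// Prints 77%, 15% and 8% for the three pods.
		fmt.Printf("%s: %.0f%%\n", names[i], math.Round(share))
	}
}
```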

  **Why Weighted Random Sampling:**

  - Maintains Intelligence: Still prefers higher-scoring pods (77% vs 15% vs 8%)
  - Prevents Hot-spotting: No single pod gets 100% traffic
  - Improves Throughput: Utilizes cluster capacity instead of creating idle resources
  - Zero Overhead: Basic weighted sampling has the same performance as max-score-picker
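
The selection step itself can be sketched as a single linear pass over the scores. This is a generic weighted random pick, not the plugin's actual code. The names `pickWeighted` and `simulate`, the handling of non-positive scores as zero weight, and the uniform fallback when every score is zero are all assumptions.

```go
package main

import (
	"fmt"
	"math/rand"
)

// pickWeighted returns the index of one pod, chosen with probability
// proportional to its score. Non-positive scores get zero weight (assumption).
// If no score is positive it falls back to a uniform pick; it returns -1
// for an empty slice.
func pickWeighted(scores []float64, rng *rand.Rand) int {
	if len(scores) == 0 {
		return -1
	}
	total := 0.0
	for _, s := range scores {
		if s > 0 {
			total += s
		}
	}
	if total == 0 {
		return rng.Intn(len(scores))
	}
	r := rng.Float64() * total // uniform in [0, total)
	last := -1
	for i, s := range scores {
		if s <= 0 {
			continue
		}
		last = i
		if r < s {
			return i
		}
		r -= s
	}
	return last // reached only through floating-point rounding
}

// simulate draws repeatedly with a fixed seed and counts picks per pod.
func simulate(scores []float64, draws int, seed int64) []int {
	rng := rand.New(rand.NewSource(seed))
	counts := make([]int, len(scores))
	for i := 0; i < draws; i++ {
		counts[pickWeighted(scores, rng)]++
	}
	return counts
}

func main() {
	// Scores from the scenario above: Pod A=100, Pod B=20, Pod C=10.
	fmt.Println(simulate([]float64{100, 20, 10}, 100000, 1))
}
```

Over many draws the counts should settle near the 77/15/8 split, while max-score-picker would send every draw to Pod A.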
