@ltiao ltiao commented Nov 25, 2025

Summary:
This diff refactors the n_best_trial protection logic in PercentileEarlyStoppingStrategy to use rank-based selection instead of head-based selection, fixing a bug with tied trials and enabling easier extension with vectorized calculations.

Changes:

  • Fixed bug with tied trial protection (replaced sort_values().head() with rank(method='dense'))
    • Previous bug: When several trials tied on the objective value among the top K, only the first K trials (based on DataFrame ordering) were protected. Other trials with identical performance could be incorrectly stopped.
    • Fix: rank(method='dense') assigns the same rank to all tied values, so every trial with rank <= n_best_trials_to_complete is protected, regardless of original ordering.
    • Additional benefit: The rank-based approach is better suited to vectorized operations.
  • Moved n_best_trial check before percentile check
    • Previously: The protection check ran only after a trial failed the percentile threshold.
    • Now: The protection check runs first and short-circuits the percentile calculation if the trial is protected.
    • Benefit: Avoids unnecessary percentile computation for protected trials.
  • Simplified protection logic and messaging
    • Removed verbose details about the worst trial value and all top trial values from the log message.
    • The rank-based implementation is easier to extend: see the next diff in the stack, which applies the logic across a window of progressions rather than a single progression.
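
The reordered control flow described in the changes above can be sketched roughly as follows. This is an illustrative sketch, not the Ax implementation: the function name `should_stop`, its signature, and the exact percentile comparison are assumptions made for the example.

```python
import pandas as pd


def should_stop(
    trial: int,
    values: pd.Series,
    percentile_threshold: float,
    n_best_trials_to_complete: int,
    minimize: bool,
) -> bool:
    """Hypothetical helper: decide whether `trial` should be stopped early.

    `values` holds one objective value per trial at a single progression.
    """
    # 1) Protection check first: dense ranks give tied trials the same rank.
    ranks = values.rank(method="dense", ascending=minimize)
    if ranks.loc[trial] <= n_best_trials_to_complete:
        return False  # protected, so the percentile is never computed

    # 2) Percentile check only for unprotected trials (assumed semantics:
    #    stop trials that fall in the worst part of the distribution).
    if minimize:
        cutoff = values.quantile(1 - percentile_threshold / 100)
        return bool(values.loc[trial] > cutoff)
    cutoff = values.quantile(percentile_threshold / 100)
    return bool(values.loc[trial] < cutoff)


values = pd.Series([1.0, 1.0, 1.0, 2.0, 3.0], index=[0, 1, 2, 3, 4])
decisions = {t: should_stop(t, values, 50.0, 1, minimize=True) for t in values.index}
```

With these inputs the three tied trials return early from the protection branch, and only trials 3 and 4 reach the percentile comparison.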

Technical Notes:

rank(method='dense') assigns the same rank to tied values and produces consecutive ranks with no gaps. For example, with values [1.0, 1.0, 1.0, 2.0, 3.0] and minimize=True, all three 1.0s get rank=1, 2.0 gets rank=2, and 3.0 gets rank=3. Filtering for rank <= n_best_trials_to_complete therefore keeps every trial with a qualifying rank, so tied performers are treated equally and the protected set can contain more than n_best_trials_to_complete trials.
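
A small pandas sketch of the difference, using the example values above. The variable names and the trial index are made up for illustration; only `sort_values`, `head`, and `rank(method="dense")` come from the description.

```python
import pandas as pd

# Hypothetical objective values at one progression, indexed by trial.
values = pd.Series([1.0, 1.0, 1.0, 2.0, 3.0], index=[0, 1, 2, 3, 4])
n_best_trials_to_complete = 2
minimize = True

# Head-based selection: exactly K rows survive, so a three-way tie for
# first place leaves one equally good trial unprotected.
head_protected = set(
    values.sort_values(ascending=minimize).head(n_best_trials_to_complete).index
)

# Rank-based selection: tied values share a rank, and ranks have no gaps.
ranks = values.rank(method="dense", ascending=minimize)
rank_protected = set(ranks[ranks <= n_best_trials_to_complete].index)
```

Here `head_protected` always has exactly two members, while `rank_protected` contains the three tied trials plus the trial with rank 2. For a maximization problem, passing `ascending=False` gives rank 1 to the largest value instead.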

Differential Revision: D87843900

@meta-cla meta-cla bot added the CLA Signed label Nov 25, 2025
meta-codesync bot commented Nov 25, 2025

@ltiao has exported this pull request. If you are a Meta employee, you can view the originating Diff in D87843900.

ltiao added a commit to ltiao/Ax that referenced this pull request Nov 26, 2025
…ection (facebook#4587)

Differential Revision: D87843900
… Naming (facebook#4578)

Summary:

**Context:**

This diff refactors the early stopping strategies codebase to eliminate code duplication and improve code clarity through better naming conventions.

- Eliminated ~50 lines of duplicate data preparation logic
- Established single source of truth for data preparation in base class
- Improved code maintainability and consistency
- No functional changes - purely refactoring

**Changes:**

1. **Moved `_prepare_aligned_data()` to base class** (`BaseEarlyStoppingStrategy`)
   * Previously duplicated in `PercentileEarlyStoppingStrategy` and `MultiObjectiveEarlyStoppingStrategy` (and would also have been duplicated in future concrete implementations of `_is_harmful`). It is now a reusable helper method available to all strategies.
2. **Renamed method for clarity**
   * `_check_validity_and_get_data()` → `_lookup_and_validate_data()` (emphasizes "lookup" terminology consistent with Ax codebase conventions; more accurately describes the method's purpose)
3. **Improved parameter naming across all strategies**
   * `df`, `df_raw` → `wide_df`, `long_df`
   * Clearly distinguishes between wide and long format dataframes
   * Updated in all early stopping strategy implementations and tests
4. **Updated documentation**
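
To illustrate the `long_df` / `wide_df` distinction, here is a minimal sketch. The column names (`trial_index`, `progression`, `mean`) and the pivot call are assumptions for the example and may not match what `_prepare_aligned_data()` actually does.

```python
import pandas as pd

# Long format: one row per (trial, progression) observation.
long_df = pd.DataFrame(
    {
        "trial_index": [0, 0, 1, 1],
        "progression": [1, 2, 1, 2],
        "mean": [0.9, 0.7, 0.8, 0.5],
    }
)

# Wide format: one row per progression, one column per trial.
wide_df = long_df.pivot(index="progression", columns="trial_index", values="mean")
```

In the wide layout, comparing trials at the same progression becomes a row-wise operation, which is why the two shapes are worth naming separately.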

Reviewed By: saitcakmak

Differential Revision: D87573286
Summary:

This diff adds a `check_safe` boolean parameter to `BaseEarlyStoppingStrategy` and all its child classes to control whether the `_is_harmful` safety check is applied when making early stopping decisions.

When `check_safe=False` (default), the safety check is bypassed and early stopping decisions from `_should_stop_trials_early` are applied directly. When `check_safe=True`, the `_is_harmful` check gates early stopping to prevent potentially harmful stopping decisions.

The parameter is added to:
- `BaseEarlyStoppingStrategy.__init__`
- `ModelBasedEarlyStoppingStrategy.__init__`
- `PercentileEarlyStoppingStrategy.__init__`
- `ThresholdEarlyStoppingStrategy.__init__`

All child classes default to `check_safe=False` to maintain backward compatibility while allowing callers to opt in to safety checks as needed.
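
The gating behaviour can be sketched as below. This is a simplified stand-in, not the Ax class: `SketchStrategy`, the reduced method signatures, and the placeholder return values are assumptions; only the names `check_safe`, `_is_harmful`, and `_should_stop_trials_early` come from the description.

```python
class SketchStrategy:
    """Hypothetical, simplified stand-in for an early stopping strategy."""

    def __init__(self, check_safe: bool = False) -> None:
        self.check_safe = check_safe

    def _is_harmful(self, trial_indices: set[int]) -> bool:
        # Placeholder: pretend stopping is always judged harmful.
        return True

    def _should_stop_trials_early(self, trial_indices: set[int]) -> dict[int, str]:
        # Placeholder decision: stop every candidate trial.
        return {i: "below percentile threshold" for i in trial_indices}

    def should_stop_trials_early(self, trial_indices: set[int]) -> dict[int, str]:
        # The safety check only gates the decision when check_safe=True.
        if self.check_safe and self._is_harmful(trial_indices):
            return {}
        return self._should_stop_trials_early(trial_indices)


default_result = SketchStrategy().should_stop_trials_early({0, 1})
gated_result = SketchStrategy(check_safe=True).should_stop_trials_early({0, 1})
```

With the default `check_safe=False`, the stopping decisions pass through unchanged; with `check_safe=True`, the harmful verdict in this sketch suppresses them.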

Reviewed By: saitcakmak

Differential Revision: D87492602
…ection (facebook#4587)

Differential Revision: D87843900
ltiao added a commit to ltiao/Ax that referenced this pull request Nov 26, 2025
…ection (facebook#4587)

Differential Revision: D87843900
ltiao added a commit to ltiao/Ax that referenced this pull request Nov 26, 2025
…ection (facebook#4587)


Reviewed By: dme65

Differential Revision: D87843900
@meta-codesync meta-codesync bot closed this in 1cdab6d Nov 26, 2025
meta-codesync bot commented Nov 26, 2025

This pull request has been merged in 1cdab6d.
