Fix tied trial bug in PercentileESS: Use rank() for n_best_trial protection #4587
ltiao added a commit to ltiao/Ax that referenced this pull request on Nov 26, 2025
…ection (facebook#4587)

Summary: This diff refactors the n_best_trial protection logic in `PercentileEarlyStoppingStrategy` to use rank-based selection instead of head-based selection. This fixes a bug with tied trials and makes the logic easier to extend with vectorized calculations.

**Changes:**

* **Fixed bug with tied trial protection (replaced `sort_values().head()` with `rank(method='dense')`)**
  - **Previous bug**: When multiple trials had tied objective values among the top K, only the first K trials (based on DataFrame ordering) were protected. Other trials with identical performance could be incorrectly stopped.
  - **Fix**: `rank(method='dense')` assigns the same rank to all tied values, so ALL trials ranked at or better than the cutoff rank are protected, regardless of their original ordering.
  - **Additional benefit**: The rank-based approach is better suited to vectorized operations.
* **Moved n_best_trial check before percentile check**
  - Previously: The protection check occurred only after a trial failed the percentile threshold.
  - Now: The protection check happens first and short-circuits the percentile calculation if the trial is protected.
  - Benefit: Avoids unnecessary percentile computation for protected trials.
* **Simplified protection logic and messaging**
  - Removed verbose details about the worst trial value and all top trial values from the log message.
  - The cleaner rank-based implementation makes future extensions easier. See the next diff in the stack, which applies the logic across a window of progressions rather than a single progression.

**Technical Notes:** The `rank(method='dense')` approach assigns the same rank to tied values and produces consecutive ranks without gaps. For example, with values [1.0, 1.0, 1.0, 2.0, 3.0] and minimize=True, all three 1.0s get rank=1, 2.0 gets rank=2, and 3.0 gets rank=3. When filtering for `rank <= n_best_trials_to_complete`, ALL trials with qualifying ranks are included, which ensures fair treatment of tied performers.

Differential Revision: D87843900
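The tie bug can be reproduced with plain pandas. The sketch below is illustrative only and is not Ax's code. The helper names `protected_by_head` and `protected_by_rank` are hypothetical, and the objective values are invented. `kind="stable"` is used so that the "first K rows" behavior is deterministic.

```python
import pandas as pd


def protected_by_head(values: pd.Series, k: int, minimize: bool) -> set:
    """Old-style selection (hypothetical helper): keep the first k rows after sorting.

    Among tied values, which trials survive depends on row order.
    """
    top = values.sort_values(ascending=minimize, kind="stable").head(k)
    return set(top.index)


def protected_by_rank(values: pd.Series, k: int, minimize: bool) -> set:
    """Rank-based selection (hypothetical helper): tied values share a dense rank."""
    ranks = values.rank(method="dense", ascending=minimize)
    return set(ranks[ranks <= k].index)


# Invented objective values at one progression, indexed by trial index.
values = pd.Series([2.0, 1.0, 3.0, 1.0, 1.0], index=[10, 11, 12, 13, 14])

print(sorted(protected_by_head(values, k=1, minimize=True)))  # [11]
print(sorted(protected_by_rank(values, k=1, minimize=True)))  # [11, 13, 14]
```

With `head()`, trials 13 and 14 tie trial 11 for best but are left unprotected. With dense ranking, all three are protected. One consequence of this design is that ties can protect more than `k` trials.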
… Naming (facebook#4578)

Summary:

**Context:** This diff refactors the early stopping strategies codebase to eliminate code duplication and improve clarity through better naming conventions.

- Eliminated ~50 lines of duplicate data preparation logic
- Established a single source of truth for data preparation in the base class
- Improved code maintainability and consistency
- No functional changes (purely refactoring)

**Changes:**

1. **Moved `_prepare_aligned_data()` to the base class** (`BaseEarlyStoppingStrategy`)
   * Previously duplicated in `PercentileEarlyStoppingStrategy` and `MultiObjectiveEarlyStoppingStrategy`, and it would also have been needed in future concrete implementations of `_is_harmful`. It is now a reusable helper method available to all strategies.
2. **Renamed method for clarity**
   * `_check_validity_and_get_data()` → `_lookup_and_validate_data()` (emphasizes "lookup" terminology, consistent with Ax codebase conventions, and more accurately describes the method's purpose)
3. **Improved parameter naming across all strategies**
   * `df`, `df_raw` → `wide_df`, `long_df`
   * Clearly distinguishes between wide-format and long-format dataframes
   * Updated in all early stopping strategy implementations and tests
4. **Updated documentation**

Reviewed By: saitcakmak

Differential Revision: D87573286
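The following sketch is not Ax's implementation. It illustrates the long versus wide distinction and how one base-class helper can serve several strategies. The class names, the column names `trial_index`, `progression`, and `mean`, and the plain `pivot` used for alignment are all assumptions. The real helper may also interpolate or filter data.

```python
import pandas as pd


class BaseEarlyStoppingSketch:
    """Hypothetical base class; names are illustrative, not Ax's API."""

    def _prepare_aligned_data(self, long_df: pd.DataFrame) -> pd.DataFrame:
        # long_df: one row per (trial_index, progression) observation.
        # wide_df: one row per progression and one column per trial, so trials
        # can be compared at the same progression. Missing points stay NaN.
        wide_df = long_df.pivot(index="progression", columns="trial_index", values="mean")
        return wide_df.sort_index()


class PercentileSketch(BaseEarlyStoppingSketch):
    def values_at_last_progression(self, long_df: pd.DataFrame) -> pd.Series:
        # Reuses the shared helper instead of duplicating the reshaping logic.
        wide_df = self._prepare_aligned_data(long_df)
        return wide_df.iloc[-1]


# Invented long-format data: trial 2 has not yet reached progression 20.
long_df = pd.DataFrame(
    {
        "trial_index": [0, 0, 1, 1, 2],
        "progression": [10, 20, 10, 20, 10],
        "mean": [0.9, 0.7, 0.8, 0.6, 0.95],
    }
)
print(PercentileSketch().values_at_last_progression(long_df))
```

A second strategy, such as a multi-objective one, would call the same `_prepare_aligned_data` rather than carry its own copy. This is the duplication the refactor removes.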
Summary: This diff adds a `check_safe` boolean parameter to `BaseEarlyStoppingStrategy` and all its child classes. The parameter controls whether the `_is_harmful` safety check is applied when making early stopping decisions.

When `check_safe=False` (the default), the safety check is bypassed and early stopping decisions from `_should_stop_trials_early` are applied directly. When `check_safe=True`, the `_is_harmful` check gates early stopping to prevent potentially harmful stopping decisions.

The parameter is added to:

- `BaseEarlyStoppingStrategy.__init__`
- `ModelBasedEarlyStoppingStrategy.__init__`
- `PercentileEarlyStoppingStrategy.__init__`
- `ThresholdEarlyStoppingStrategy.__init__`

All child classes default to `check_safe=False` to maintain backward compatibility while letting users opt in to safety checks as needed.

Reviewed By: saitcakmak

Differential Revision: D87492602
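A flag like `check_safe` can gate stopping decisions as sketched below. This is a hypothetical illustration, not Ax's code. The method names mirror the summary, but the signatures are invented. The toy subclass is also invented. The sketch assumes that a "harmful" verdict vetoes every stopping decision in that round.

```python
class EarlyStoppingSketch:
    """Hypothetical base class; signatures are invented for illustration."""

    def __init__(self, check_safe: bool = False) -> None:
        # Default False keeps the previous behavior (no safety gate).
        self.check_safe = check_safe

    def _should_stop_trials_early(self, trial_indices: set[int]) -> dict[int, str]:
        raise NotImplementedError

    def _is_harmful(self, trial_indices: set[int]) -> bool:
        raise NotImplementedError

    def should_stop_trials_early(self, trial_indices: set[int]) -> dict[int, str]:
        decisions = self._should_stop_trials_early(trial_indices)
        # Assumption: if the safety check flags stopping as harmful,
        # no trials are stopped in this round.
        if self.check_safe and decisions and self._is_harmful(trial_indices):
            return {}
        return decisions


class StopEverythingSketch(EarlyStoppingSketch):
    """Toy strategy: wants to stop every trial, and stopping is always 'harmful'."""

    def _should_stop_trials_early(self, trial_indices):
        return {i: "below percentile threshold" for i in sorted(trial_indices)}

    def _is_harmful(self, trial_indices):
        return True


print(StopEverythingSketch().should_stop_trials_early({1, 2}))
print(StopEverythingSketch(check_safe=True).should_stop_trials_early({1, 2}))  # {}
```

With the default, the decisions pass through unchanged. With `check_safe=True`, the harmful verdict suppresses them. Because `_is_harmful` is only consulted when the flag is set, existing callers see no behavior change.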
ltiao added a commit to ltiao/Ax that referenced this pull request on Nov 26, 2025
…ection (facebook#4587)

Reviewed By: dme65

Differential Revision: D87843900
This pull request has been merged in 1cdab6d.
Labels: CLA Signed, fb-exported, Merged, meta-exported
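Summary:

This diff refactors the n_best_trial protection logic in `PercentileEarlyStoppingStrategy` to use rank-based selection instead of head-based selection. This fixes a bug with tied trials and makes the logic easier to extend with vectorized calculations.

**Changes:**

* **Fixed bug with tied trial protection (replaced `sort_values().head()` with `rank(method='dense')`)**
  - **Previous bug**: When multiple trials had tied objective values among the top K, only the first K trials (based on DataFrame ordering) were protected. Other trials with identical performance could be incorrectly stopped.
  - **Fix**: `rank(method='dense')` assigns the same rank to all tied values, so ALL trials ranked at or better than the cutoff rank are protected, regardless of their original ordering.
  - **Additional benefit**: The rank-based approach is better suited to vectorized operations.
* **Moved n_best_trial check before percentile check**
  - Previously: The protection check occurred only after a trial failed the percentile threshold.
  - Now: The protection check happens first and short-circuits the percentile calculation if the trial is protected.
  - Benefit: Avoids unnecessary percentile computation for protected trials.
* **Simplified protection logic and messaging**
  - Removed verbose details about the worst trial value and all top trial values from the log message.
  - The cleaner rank-based implementation makes future extensions easier. See the next diff in the stack, which applies the logic across a window of progressions rather than a single progression.

**Technical Notes:**

The `rank(method='dense')` approach assigns the same rank to tied values and produces consecutive ranks without gaps. For example, with values [1.0, 1.0, 1.0, 2.0, 3.0] and minimize=True, all three 1.0s get rank=1, 2.0 gets rank=2, and 3.0 gets rank=3. When filtering for `rank <= n_best_trials_to_complete`, ALL trials with qualifying ranks are included, which ensures fair treatment of tied performers.

Differential Revision: D87843900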
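The dense-rank example from the technical note can be combined with the protection-first ordering from the summary. The result is the single-progression decision function sketched below. It is a simplified, hypothetical helper, not the actual `PercentileEarlyStoppingStrategy` code. The function name and signature are assumptions. The percentile rule is also an assumption: it stops the worst `percentile_threshold` percent of trials, using `np.nanpercentile`.

```python
import numpy as np
import pandas as pd


def should_stop(
    values: pd.Series,
    trial_index: int,
    percentile_threshold: float,
    n_best_trials_to_complete: int,
    minimize: bool = True,
) -> bool:
    """Hypothetical, simplified per-trial decision at a single progression."""
    # 1) Protection check first. Dense ranks give tied values the same rank,
    #    so every trial ranked within the cutoff is protected, and the
    #    percentile is never computed for it.
    ranks = values.rank(method="dense", ascending=minimize)
    if ranks[trial_index] <= n_best_trials_to_complete:
        return False
    # 2) Percentile check (assumed rule): stop trials in the worst
    #    `percentile_threshold` percent of values at this progression.
    if minimize:
        cutoff = np.nanpercentile(values, 100.0 - percentile_threshold)
        return bool(values[trial_index] > cutoff)
    cutoff = np.nanpercentile(values, percentile_threshold)
    return bool(values[trial_index] < cutoff)


# Values from the technical note, indexed by trial index (lower is better).
values = pd.Series([1.0, 1.0, 1.0, 2.0, 3.0])
print(values.rank(method="dense").tolist())  # [1.0, 1.0, 1.0, 2.0, 3.0]
print([should_stop(values, i, 50.0, 1) for i in values.index])
# [False, False, False, True, True]
```

With `n_best_trials_to_complete=1`, all three tied 1.0 trials are protected. The other two trials lie above the median cutoff (1.0), so this sketch would stop them. Raising the cutoff to 2 also protects the 2.0 trial. Dense ranking can therefore protect more than `n_best_trials_to_complete` trials when ties occur.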