⚡️ Speed up method Categorical._categories_match_up_to_permutation by 331%
#104
📄 331% (3.31x) speedup for `Categorical._categories_match_up_to_permutation` in `pandas/core/arrays/categorical.py`
⏱️ Runtime: 252 microseconds → 58.5 microseconds (best of 6 runs)
📝 Explanation and details
The optimization replaces hash-based comparison (`hash(self.dtype) == hash(other.dtype)`) with direct equality comparison (`self.dtype == other.dtype`) in the `_categories_match_up_to_permutation` method.
Key optimization: Instead of computing two separate hash values and comparing them, the code now directly calls `CategoricalDtype.__eq__()`, which contains optimized fast-path checks including identity comparison (`other is self`), null category handling, and efficient category comparison logic.
Why this is faster: Hash computation for `CategoricalDtype` objects involves processing categories and ordered flags, which is more expensive than the equality method's optimized comparison logic. The `__eq__` method can short-circuit early for common cases (identity, mismatched types) and uses efficient comparison strategies for different category configurations.
Performance impact: The line profiler shows a 66% reduction in execution time per call (from 28,825.2 ns to 9,723.3 ns per hit), resulting in a 331% overall speedup. This optimization is particularly effective for test cases involving frequent categorical dtype comparisons, as the equality check avoids redundant hash calculations while maintaining identical semantic behavior.
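The difference between the two comparison styles can be sketched with a small stand-in class. This is a minimal illustration, not the pandas implementation: `ToyDtype`, `match_via_hash`, and `match_via_eq` are hypothetical names, and the hashing and equality rules are simplified assumptions chosen only to show why an `__eq__` with an identity fast path can do less work than hashing both operands.

```python
class ToyDtype:
    """Hypothetical stand-in for a categorical dtype (not the pandas class)."""

    def __init__(self, categories, ordered=False):
        self.categories = tuple(categories)
        self.ordered = ordered

    def __hash__(self):
        # Hashing always walks every category, even when an object
        # is being compared with itself.
        if self.ordered:
            return hash((self.categories, True))
        return hash((frozenset(self.categories), False))

    def __eq__(self, other):
        # Fast path: the same object is trivially equal.
        if other is self:
            return True
        # Fast path: a mismatched type can never be equal.
        if not isinstance(other, ToyDtype):
            return False
        # Ordered dtypes must agree on the flag and on category order.
        if self.ordered or other.ordered:
            return (self.ordered == other.ordered
                    and self.categories == other.categories)
        # Unordered dtypes match up to a permutation of the categories.
        return set(self.categories) == set(other.categories)


def match_via_hash(a, b):
    # Style of the original check: compute two hashes, then compare them.
    return hash(a) == hash(b)


def match_via_eq(a, b):
    # Style of the optimized check: delegate to __eq__ and its fast paths.
    return a == b


unordered = ToyDtype(["x", "y", "z"])
permuted = ToyDtype(["z", "y", "x"])
ordered = ToyDtype(["x", "y", "z"], ordered=True)

for left, right in [(unordered, permuted), (unordered, ordered)]:
    print(match_via_hash(left, right), match_via_eq(left, right))
```

In this sketch both helpers agree on the sample pairs, but `match_via_eq(unordered, unordered)` returns after a single identity check, whereas `match_via_hash` builds and hashes two sets of categories every time.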
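The change preserves all original functionality since `CategoricalDtype.__hash__` and `__eq__` are designed to be consistent: equal objects have equal hashes, so every pair that the equality check accepts was also accepted by the hash comparison. The two checks could disagree only if two unequal dtypes happened to share a hash value, so the substitution is semantically equivalent in practice but computationally more efficient.
✅ Correctness verification report: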
⚙️ Existing Unit Tests and Runtime
`arrays/categorical/test_dtypes.py::TestCategoricalDtypes.test_categories_match_up_to_permutation`
To edit these changes, `git checkout codeflash/optimize-Categorical._categories_match_up_to_permutation-mhbylp4w` and push.