⚡️ Speed up method DataIndexableCol.get_atom_data by 12%
#108
📄 12% (0.12x) speedup for `DataIndexableCol.get_atom_data` in `pandas/io/pytables.py`

⏱️ Runtime: 2.42 milliseconds → 2.16 milliseconds (best of 24 runs)

📝 Explanation and details
The optimization adds `@lru_cache(maxsize=64)` to the `get_atom_coltype` method, which cuts overall runtime by about 11% by caching expensive attribute lookups.

Key optimization:
- The `getattr(_tables(), col_name)` call accounts for 97.3% of the original runtime, according to profiling.
- `@lru_cache` caches the result of `get_atom_coltype` for each unique `kind` string, eliminating redundant `getattr` lookups on subsequent calls with the same `kind`.
- Adds the `from functools import lru_cache` import to enable this caching.
Why this works:
- The `_tables()` function returns a module object, and `getattr()` on a module involves attribute resolution and potential string-processing overhead.
- Column type strings passed as `kind` (e.g., `"int64"`, `"float32"`) are immutable, which makes them hashable cache keys, and they are frequently reused in data processing workflows.

Test results show consistent improvements:
This optimization is particularly effective for workloads that repeatedly create columns of the same data types, which is common in data processing pipelines.
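To see why repeated column types matter, the sketch below simulates a pipeline that builds 3,000 columns from only three distinct kinds and reads the hit/miss counters that `functools.lru_cache` exposes through `cache_info()`. The `tables_stub` module and `coltype_for` function are hypothetical illustrations, not pandas code.

```python
import types
from functools import lru_cache

# Hypothetical module standing in for PyTables.
tables_stub = types.ModuleType("tables_stub")
for _name in ("Int64Col", "Float64Col", "BoolCol"):
    setattr(tables_stub, _name, type(_name, (), {}))


@lru_cache(maxsize=64)
def coltype_for(kind: str) -> type:
    """Resolve a column class by name; only cache misses reach getattr."""
    return getattr(tables_stub, f"{kind.capitalize()}Col")


# Simulated workload: many columns, but only a few distinct kinds.
kinds = ["int64", "float64", "bool"] * 1000
classes = [coltype_for(k) for k in kinds]

info = coltype_for.cache_info()
print(info.hits, info.misses)  # prints: 2997 3
```

With only a handful of distinct kinds, `maxsize=64` is far more than such a workload needs; the bound simply keeps the cache from growing without limit if many unusual kinds appear.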
✅ Correctness verification report:
🌀 Generated Regression Tests and Runtime
To edit these changes, `git checkout codeflash/optimize-DataIndexableCol.get_atom_data-mhc4hbjy` and push.