PERF: DataFrame.__setitem__ #44796

Merged: 5 commits, Dec 11, 2021

jbrockmendel
Member

Looks good based on the frame_methods benchmarks; will run a full set overnight.

asv continuous -E virtualenv -f 1.01 master HEAD -b frame_methods
[...]
       before           after         ratio
     [0b0cac5c]       [b21496b1]
     <bug-fillna-inplace>       <perf-isetitem>
-      30.0±0.9ms       26.8±0.2ms     0.89  frame_methods.Equals.time_frame_nonunique_unequal
-        187±10ms          166±6ms     0.89  frame_methods.ToDict.time_to_dict_datetimelike('split')
-        29.9±1ms       26.6±0.2ms     0.89  frame_methods.Equals.time_frame_nonunique_equal
-        43.6±2ms         38.8±1ms     0.89  frame_methods.Isnull.time_isnull_strngs
-        39.9±2ms       35.3±0.3ms     0.89  frame_methods.Equals.time_frame_object_unequal
-        201±20ms          177±7ms     0.88  frame_methods.ToDict.time_to_dict_datetimelike('records')
-        39.6±1ms         34.5±1ms     0.87  frame_methods.Isnull.time_isnull_obj
-      5.56±0.4ms       4.81±0.3ms     0.87  frame_methods.Fillna.time_frame_fillna(False, 'bfill', 'float64')
-      5.60±0.3μs       4.84±0.2μs     0.86  frame_methods.ToDict.time_to_dict_datetimelike('series')
-        32.7±2μs       28.3±0.8μs     0.86  frame_methods.GetNumericData.time_frame_get_numeric_data
-      12.2±0.5ms       10.5±0.3ms     0.86  frame_methods.Fillna.time_frame_fillna(False, 'bfill', 'datetime64[ns, tz]')
-         238±8ms          203±6ms     0.86  frame_methods.GetDtypeCounts.time_info
-        75.4±3ms         64.6±2ms     0.86  frame_methods.Rename.time_rename_single
-     1.00±0.04ms         853±20μs     0.85  frame_methods.GetDtypeCounts.time_frame_get_dtype_counts
-      10.2±0.6ms      8.67±0.08ms     0.85  frame_methods.Fillna.time_frame_fillna(True, 'bfill', 'datetime64[ns, tz]')
-      10.1±0.6ms       8.54±0.5ms     0.85  frame_methods.Shift.time_shift(0)
-      11.7±0.9ms       9.86±0.3ms     0.85  frame_methods.Fillna.time_frame_fillna(False, 'pad', 'datetime64[ns, tz]')
-         173±7ms          146±8ms     0.84  frame_methods.ToDict.time_to_dict_datetimelike('list')
-        74.7±3ms         63.0±1ms     0.84  frame_methods.Rename.time_rename_axis1
-        69.1±3ms         58.2±4ms     0.84  frame_methods.ToHTML.time_to_html_mixed
-      3.88±0.4ms       3.27±0.1ms     0.84  frame_methods.Fillna.time_frame_fillna(False, 'bfill', 'float32')
-        126±20ms          106±4ms     0.84  frame_methods.Reindex.time_reindex_axis1_missing
-      2.83±0.2ms      2.38±0.09ms     0.84  frame_methods.Fillna.time_frame_fillna(False, 'bfill', 'datetime64[ns]')
-      12.4±0.4ms       10.4±0.3ms     0.84  frame_methods.Fillna.time_frame_fillna(False, 'bfill', 'Int64')
-        35.1±3ms         29.4±2ms     0.84  frame_methods.SortIndexByColumns.time_frame_sort_values_by_columns
-        77.6±2ms         64.9±2ms     0.84  frame_methods.Repr.time_frame_repr_wide
-      5.37±0.7ms       4.48±0.2ms     0.83  frame_methods.Fillna.time_frame_fillna(True, 'bfill', 'float64')
-      2.98±0.2ms       2.47±0.3ms     0.83  frame_methods.Fillna.time_frame_fillna(False, 'pad', 'timedelta64[ns]')
-        54.3±3ms         45.0±1ms     0.83  frame_methods.Equals.time_frame_object_equal
-      10.4±0.7ms       8.63±0.2ms     0.83  frame_methods.Fillna.time_frame_fillna(True, 'bfill', 'Float64')
-        451±50μs         373±20μs     0.83  frame_methods.Isnull.time_isnull
-      5.18±0.4ms       4.29±0.2ms     0.83  frame_methods.Fillna.time_frame_fillna(True, 'pad', 'float64')
-      11.3±0.8ms       9.37±0.3ms     0.83  frame_methods.Fillna.time_frame_fillna(True, 'pad', 'Float64')
-        83.1±3ms         68.5±2ms     0.82  frame_methods.Rename.time_dict_rename_both_axes
-      11.6±0.7ms       9.50±0.1ms     0.82  frame_methods.Fillna.time_frame_fillna(True, 'pad', 'Int64')
-      4.46±0.5ms       3.65±0.2ms     0.82  frame_methods.Fillna.time_frame_fillna(True, 'pad', 'float32')
-        12.3±1ms       10.0±0.5ms     0.82  frame_methods.MemoryUsage.time_memory_usage_object_dtype
-        12.2±1ms       9.95±0.3ms     0.82  frame_methods.Fillna.time_frame_fillna(True, 'pad', 'datetime64[ns, tz]')
-        30.6±2ms         24.7±1ms     0.81  frame_methods.Reindex.time_reindex_axis1
-        534±40μs         429±30μs     0.80  frame_methods.Shift.time_shift(1)
-      7.30±0.2ms       5.86±0.2ms     0.80  frame_methods.Repr.time_html_repr_trunc_mi
-     4.81±0.07ms       3.86±0.1ms     0.80  frame_methods.Repr.time_html_repr_trunc_si
-        186±20ms          148±4ms     0.79  frame_methods.Fillna.time_frame_fillna(False, 'bfill', 'object')
-        289±60ms          229±8ms     0.79  frame_methods.Fillna.time_frame_fillna(True, 'bfill', 'object')
-      11.9±0.7ms       9.38±0.2ms     0.79  frame_methods.Fillna.time_frame_fillna(False, 'pad', 'Int64')
-        12.2±1ms       9.63±0.8ms     0.79  frame_methods.Rank.time_rank('float')
-      11.6±0.8ms       9.16±0.2ms     0.79  frame_methods.Fillna.time_frame_fillna(False, 'pad', 'Float64')
-        292±50ms         227±10ms     0.78  frame_methods.Fillna.time_frame_fillna(True, 'pad', 'object')
-      3.95±0.2ms       2.99±0.1ms     0.76  frame_methods.Rank.time_rank('uint')
-      4.06±0.4ms      3.07±0.09ms     0.75  frame_methods.Fillna.time_frame_fillna(False, 'pad', 'float32')
-      4.63±0.7ms       3.47±0.4ms     0.75  frame_methods.Lookup.time_frame_fancy_lookup
-      1.82±0.2ms      1.35±0.09ms     0.74  frame_methods.Fillna.time_frame_fillna(True, 'bfill', 'datetime64[ns]')
-      4.88±0.4ms       3.62±0.3ms     0.74  frame_methods.Fillna.time_frame_fillna(True, 'bfill', 'float32')
-      5.87±0.5ms       4.33±0.2ms     0.74  frame_methods.Fillna.time_frame_fillna(False, 'pad', 'float64')
-        193±30ms          140±7ms     0.73  frame_methods.Fillna.time_frame_fillna(False, 'pad', 'object')
-       972±100ms         700±30ms     0.72  frame_methods.Nunique.time_frame_nunique
-      4.45±0.3ms       3.17±0.2ms     0.71  frame_methods.NSort.time_nsmallest_two_columns('last')
-      5.17±0.4ms       3.66±0.3ms     0.71  frame_methods.NSort.time_nlargest_two_columns('last')
-        5.14±1ms       3.58±0.2ms     0.70  frame_methods.NSort.time_nlargest_two_columns('all')
-      1.95±0.3ms      1.15±0.06ms     0.59  frame_methods.Fillna.time_frame_fillna(True, 'bfill', 'timedelta64[ns]')
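
For readers unfamiliar with the asv table above, here is a minimal sketch of how the `ratio` column relates to the `before` and `after` timings, and how the `-f 1.01` factor from the command decides which rows are reported. The helper names `asv_ratio` and `classify` are hypothetical, and the classification is a simplification: asv can also apply a statistical significance check that is not modeled here. Because the printed timings are rounded, a recomputed ratio may differ from the table in the last digit.

```python
def asv_ratio(before: float, after: float) -> float:
    """Ratio as printed in the table: after / before, both in the same unit."""
    return after / before


def classify(ratio: float, factor: float = 1.01) -> str:
    """Simplified reading of `-f factor` (assumption: no significance test)."""
    if ratio > factor:
        return "regression"   # rows of this kind are prefixed with '+'
    if ratio < 1 / factor:
        return "improvement"  # rows of this kind are prefixed with '-'
    return "unchanged"        # not listed in the output


# First row of the table: 30.0 ms before, 26.8 ms after.
r = asv_ratio(30.0, 26.8)
print(round(r, 2), classify(r))  # 0.89 improvement
```

Every row in the excerpt carries a `-` prefix and a ratio below 1, which is consistent with the "looks good" summary for this benchmark subset.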

@jreback added the Indexing and Performance labels Dec 7, 2021
@jreback added this to the 1.4 milestone Dec 7, 2021
@jreback
Contributor

jreback commented Dec 7, 2021

pandas/core/internals/managers.py:1090: error: Argument 1 to "_iset_single" of "BlockManager" has incompatible type "Union[int, slice, ndarray[Any, Any]]"; expected "int" [arg-type]

looks good otherwise
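
The mypy error quoted above is an instance of a common pattern: a value typed as a union is passed to a function that accepts only one member of that union. The sketch below reproduces the shape of the problem and one usual remedy, narrowing with `isinstance` before the call. It is not the fix applied in this PR; `iset`, `_iset_single`, and the `Loc` alias are hypothetical stand-ins, and a plain list of ints stands in for `ndarray` so the example needs only the standard library.

```python
from typing import List, Union

# Assumption: List[int] stands in for ndarray in the real signature.
Loc = Union[int, slice, List[int]]


def _iset_single(columns: List[str], loc: int, value: str) -> None:
    """Set exactly one position; accepts only an int."""
    columns[loc] = value


def iset(columns: List[str], loc: Loc, value: str) -> None:
    """Set one or many positions, dispatching on the type of `loc`."""
    if isinstance(loc, int):
        # Inside this branch a type checker narrows `loc` to int,
        # so the call below no longer raises an arg-type error.
        _iset_single(columns, loc, value)
        return
    # Remaining cases: expand a slice or iterate over explicit positions.
    positions = range(*loc.indices(len(columns))) if isinstance(loc, slice) else loc
    for pos in positions:
        _iset_single(columns, pos, value)


cols = ["a", "b", "c"]
iset(cols, 1, "x")            # single position
iset(cols, slice(0, 1), "y")  # slice of positions
iset(cols, [2], "z")          # explicit list of positions
print(cols)  # ['y', 'x', 'z']
```

Calling `_iset_single(columns, loc, value)` without the `isinstance` guard is what would trigger an error of the kind reported for `managers.py:1090`.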

@jbrockmendel
Member Author

Still looking at the asvs; the full run showed some slowdowns that don't make sense to me, so I need to do some digging.

@jbrockmendel
Member Author

OK, the slower-looking asvs appear to be false positives. Good to go here, I think.

@jreback
Contributor

jreback commented Dec 9, 2021

Hmm, this has a lot of test failures; let's wait to rebase until after the CI patch.

@jreback merged commit 351b688 into pandas-dev:master Dec 11, 2021
@jbrockmendel deleted the perf-isetitem branch December 11, 2021 17:58
Labels: Indexing, Performance