PERF: Remove _item_cache

Discussion copied over from #49450

From the OP of #49450 (which discusses turning on the `_item_cache` for CoW):

Context:

> Currently, we use an item cache for DataFrame columns -> Series. Whenever we access a certain column, we cache the resulting Series in `df._item_cache`, and the next time we access a column, we first check if that column already exists in the cache and if so return that directly. I suppose this was done for making repeated column access faster (although I think the Series construction overhead for this fast-path case has also improved). But it also has some behavioral consequences, i.e. Series objects from column access can be _identical_ objects, depending on the context:
> 
> ```python
> >>> df = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]})
> >>> s1 = df["a"]
> >>> s2 = df["a"]
> >>> df['b'] = 10 # set existing column -> clears the item cache
> >>> s3 = df["a"]
> >>> s1 is s2
> True
> >>> s1 is s3
> False
> ```
> 
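
To make the mechanism concrete, here is a pure-Python toy model of a column cache that reproduces the identity behavior in the quoted session. `ToyFrame` and `ToySeries` are hypothetical names, and this is only a sketch assuming the cache is a plain dict keyed by column name that any column assignment clears; it is not pandas' actual implementation.

```python
class ToySeries:
    """Stand-in for a Series: just a name plus the column's values."""

    def __init__(self, name, values):
        self.name = name
        self.values = values


class ToyFrame:
    """Hypothetical frame with a column -> Series item cache (not pandas code)."""

    def __init__(self, data):
        # column name -> list of values (stands in for the real block storage)
        self._data = {name: list(values) for name, values in data.items()}
        # column name -> the object returned by the last access to that column
        self._item_cache = {}

    def __getitem__(self, key):
        # Fast path: hand back the cached object for a column seen before.
        if key in self._item_cache:
            return self._item_cache[key]
        series = ToySeries(key, self._data[key])
        self._item_cache[key] = series
        return series

    def __setitem__(self, key, value):
        # Broadcast a scalar to the frame's length; lists are copied as-is.
        n = len(next(iter(self._data.values()), []))
        self._data[key] = list(value) if isinstance(value, list) else [value] * n
        # Any column assignment clears the whole item cache.
        self._item_cache.clear()


df = ToyFrame({"a": [1, 2, 3], "b": [4, 5, 6]})
s1 = df["a"]
s2 = df["a"]
df["b"] = 10  # set existing column -> clears the item cache
s3 = df["a"]
print(s1 is s2)  # True
print(s1 is s3)  # False
```

The identity of `s1` and `s2` comes purely from the cache lookup; dropping the cache would make every access return a fresh object, which is the behavior change discussed below.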

This caching can also have other side effects, though. While investigating #29411, I found that methods that iterate through all the columns by calling ``.items()``, such as ``memory_usage`` (and, from a quick glance at frame.py, possibly ``round`` and ``duplicated`` as well), actually cause every column to be cached in ``_item_cache``, which blows up memory usage.
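
The memory effect can be illustrated with a similar toy model. `IterFrame` and `ColumnWrapper` are hypothetical classes, not pandas internals, and the sketch assumes that column-wise iteration goes through the same cached lookup as `df[col]`. With the cache enabled, one `memory_usage()` call leaves a cached object behind for every column; without it, the per-column objects are discarded as soon as the loop moves on.

```python
class ColumnWrapper:
    """Stand-in for a Series built on top of one column's values."""

    def __init__(self, name, values):
        self.name = name
        self.values = values


class IterFrame:
    """Hypothetical frame; `use_cache` toggles the column -> wrapper item cache."""

    def __init__(self, data, use_cache=True):
        self._data = dict(data)
        self._use_cache = use_cache
        self._item_cache = {}

    def __getitem__(self, key):
        if not self._use_cache:
            # No cache: every access builds a fresh, short-lived wrapper.
            return ColumnWrapper(key, self._data[key])
        if key not in self._item_cache:
            self._item_cache[key] = ColumnWrapper(key, self._data[key])
        return self._item_cache[key]

    def items(self):
        # Column-wise iteration goes through __getitem__, so with the cache
        # enabled it leaves one cached wrapper behind per column.
        for key in self._data:
            yield key, self[key]

    def memory_usage(self):
        # A per-column reduction in the style of DataFrame.memory_usage.
        return {key: len(col.values) for key, col in self.items()}


data = {f"col{i}": list(range(10)) for i in range(1000)}

cached = IterFrame(data, use_cache=True)
cached.memory_usage()
print(len(cached._item_cache))    # 1000 wrappers retained after one call

uncached = IterFrame(data, use_cache=False)
uncached.memory_usage()
print(len(uncached._item_cache))  # 0: wrappers are discarded after use
```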

As Joris noted, though, this might be tricky, since removing the cache would be a behavior change.
We should discuss here how we want to go about doing this (does it need a deprecation?).

PERF: Remove _item_cache #50547