PERF: avoid extra copies in ._where #51547

Open
@jbrockmendel

In NDFrame._where we do

        # align the cond to same shape as myself
        cond = common.apply_if_callable(cond, self)
        if isinstance(cond, NDFrame):
            # CoW: Make sure reference is not kept alive
            cond = cond.align(self, join="right", broadcast_axis=1, copy=False)[0]

        [...]
        # make sure we are boolean
        fill_value = bool(inplace)
        cond = cond.fillna(fill_value)

        [...]
        cond = -cond if inplace else cond
        cond = cond.reindex(self._info_axis, axis=self._info_axis_number, copy=False)

I think we can avoid making some copies of cond by

a) if we do an align, pass fill_value=bool(inplace) there to avoid having to fill it later.
b) check if we need to do fillna before doing it (maybe that check itself is expensive? may be irrelevant with CoW?)
c) maybe avoid doing the cond = -cond if ... by handling that at a lower level?

These are all pretty speculative, salvaged from an old branch that was collecting dust. Might be a good 4th issue.
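
A rough sketch of what (a) and (b) could look like from the outside, using only public API rather than the internals of `_where`. The frames `df` and `cond` and the names `two_step`, `one_step`, `aligned`, and `needs_fill` are made up for illustration, and `fill_value = False` stands in for `bool(inplace)` with `inplace=False`. This only checks that the two routes agree on the resulting values; it does not measure copies or timing.

```python
import pandas as pd

# Hypothetical data: cond covers only part of df's index and columns.
df = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]}, index=[0, 1, 2])
cond = pd.DataFrame({"a": [True, False]}, index=[0, 1])

fill_value = False  # stands in for bool(inplace) when inplace=False

# Roughly the current flow: align to df, then fill the holes in a second pass.
two_step = cond.align(df, join="right")[0].fillna(fill_value)

# Idea (a): let align fill the positions it introduces.
one_step = cond.align(df, join="right", fill_value=fill_value)[0]

# Idea (b): only call fillna when there is actually something to fill.
aligned = cond.align(df, join="right")[0]
needs_fill = bool(aligned.isna().any().any())

# Compare on values only; the two routes may differ in intermediate dtypes.
same_values = one_step.astype(bool).equals(two_step.astype(bool))
print(same_values, needs_fill)
```

One caveat, as far as I understand `align`: `fill_value` there applies to positions created by the alignment, so a `cond` that already contains missing values before aligning would presumably still need the later `fillna`, which is where a check like (b) might come in.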

Labels

Bug
Indexing: Related to indexing on series/frames, not to indexes themselves
Performance: Memory or execution speed performance
