Skip to content

Conversation

@JonAnCla
Copy link
Contributor

Description of changes

Table.select() now includes a "private" kwarg _exprs_that_do_not_need_dereferencing which is an optional dict of values that do not need "dereferencing" via Table.bind - this should be used for values that are already known to be "dereferenced" to reduce cost of that step

_exprs_that_do_not_need_dereferencing is now passed by Table.mutate and contains columns on self, so that if a wide table is mutated, existing columns are not dereferenced, and only new ones are instead.

With this change, time needed to add a column to a wide table is reduced by about 50% for a 100 column table.

100 mutates on 100 column table before:

         14967009 function calls (14945093 primitive calls) in 7.366 seconds

   Ordered by: cumulative time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.028    0.028    7.366    7.366 2926429498.py:4(build_expr)
      200    0.014    0.000    7.129    0.036 relations.py:1866(mutate)
      200    0.025    0.000    5.164    0.026 relations.py:1961(select)
      400    0.050    0.000    4.276    0.011 relations.py:270(bind)
218900/216800    0.174    0.000    3.532    0.000 bases.py:52(__call__)
65600/65000    0.116    0.000    2.781    0.000 grounds.py:116(__create__)
      400    0.173    0.000    2.217    0.006 rewrites.py:63(from_targets)
    42700    0.194    0.000    2.079    0.000 graph.py:479(replace)
      600    0.042    0.000    1.908    0.003 relations.py:69(fields)
    21000    0.081    0.000    1.884    0.000 rewrites.py:248(rewrite_project_input)
65600/65300    0.201    0.000    1.815    0.000 annotations.py:459(validate)
    21700    0.035    0.000    1.653    0.000 rewrites.py:135(dereference)
    64800    0.062    0.000    1.219    0.000 graph.py:552(from_bfs)
    64800    0.316    0.000    1.098    0.000 graph.py:752(bfs_while)
67400/66900    0.330    0.000    0.842    0.000 grounds.py:193(__init__)
235100/234800    0.099    0.000    0.738    0.000 patterns.py:798(match)
    61600    0.066    0.000    0.689    0.000 relations.py:93(__init__)
     1500    0.067    0.000    0.648    0.000 patterns.py:1242(match)
2884549/2884545    0.584    0.000    0.638    0.000 {built-in method builtins.isinstance}
    42700    0.231    0.000    0.564    0.000 graph.py:642(toposort)

100 mutates on 100 column table after

         8570532 function calls (8548622 primitive calls) in 3.505 seconds

   Ordered by: cumulative time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
      2/1    0.000    0.000    3.498    3.498 {built-in method builtins.exec}
      2/1    0.000    0.000    3.498    3.498 <string>:1(<module>)
      2/1    0.015    0.008    3.497    3.497 2926429498.py:4(build_expr)
      200    0.009    0.000    3.339    0.017 relations.py:1903(mutate)
76800/74700    0.062    0.000    2.606    0.000 bases.py:52(__call__)
65600/65000    0.105    0.000    2.501    0.000 grounds.py:116(__create__)
      400    0.008    0.000    2.381    0.006 relations.py:282(bind)
      400    0.155    0.000    1.935    0.005 rewrites.py:63(from_targets)
      600    0.038    0.000    1.735    0.003 relations.py:69(fields)
65600/65300    0.184    0.000    1.637    0.000 annotations.py:459(validate)
      200    0.010    0.000    1.537    0.008 relations.py:1999(select)
67400/66900    0.294    0.000    0.748    0.000 grounds.py:195(__init__)
235100/234800    0.090    0.000    0.658    0.000 patterns.py:801(match)
    61600    0.061    0.000    0.623    0.000 relations.py:93(__init__)
     1500    0.060    0.000    0.578    0.000 patterns.py:1245(match)
65601/65600    0.027    0.000    0.486    0.000 inspect.py:3268(bind)
65601/65600    0.300    0.000    0.459    0.000 inspect.py:3129(_bind)
   149300    0.173    0.000    0.375    0.000 core.py:44(__coerce__)
135700/134100    0.088    0.000    0.371    0.000 annotations.py:164(get_default)
     1400    0.004    0.000    0.304    0.000 rewrites.py:135(dereference)

Issues closed

@jc-5s
Copy link
Contributor

jc-5s commented Jul 14, 2025

sorry for clarity both accounts listed as committing here (hottwa, jc-5s) are me

values = {**node.fields, **values}
return self.select(**values)
# kept fields from node.fields can skip dereferencing in .select() to improve performance
return self.select(_exprs_that_do_not_need_dereferencing=node.fields, **values)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of passing a special reserved keyword argument we should rather duplicate the implementation of .select() which is actually fairly short. That way we have clear control over about what to dereference.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Something like

    def mutate(self, *exprs, **mutations):
        from ibis.expr.rewrites import rewrite_project_input

        node = self.op()
        values = self.bind(*exprs, **mutations)
        values = unwrap_aliases(values)

        # we need to detect reductions which are either turned into window functions
        # or scalar subqueries depending on whether they are originating from self
        values = {
            k: rewrite_project_input(v, relation=self.op()) for k, v in values.items()
        }
        return ops.Project(self, {**node.fields, **values}).to_expr()

@jc-5s
Copy link
Contributor

jc-5s commented Jul 15, 2025

Thanks for reviewing so quickly & the comments which I've implemented. They have a bonus side effect which is that an additional call to .bind() within the removed call to .select(), which appears to be superfluous, is now omitted.

So runtime is now about 30% of original (see below for profile).

Around 50% of remaining time is spent in DerefMap.from_targets. @kszucs you suggested caching this on Tables here, which I would be in favour of if acceptable to ibis team? This would be beneficial to all table operations as well as .mutate

A typical use case we have is to run many relatively simple queries on a base table, so having the derefmap cached would be quite beneficial from our POV.

         5594429 function calls (5577914 primitive calls) in 2.248 seconds

   Ordered by: cumulative time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
      2/1    0.000    0.000    2.242    2.242 {built-in method builtins.exec}
        1    0.000    0.000    2.242    2.242 <string>:1(<module>)
        1    0.013    0.013    2.240    2.240 2638618455.py:4(build_expr)
      200    0.010    0.000    2.121    0.011 relations.py:1903(mutate)
52700/51100    0.038    0.000    1.748    0.000 bases.py:52(__call__)
44300/43700    0.069    0.000    1.673    0.000 grounds.py:116(__create__)
      200    0.004    0.000    1.233    0.006 relations.py:282(bind)
      400    0.024    0.000    1.133    0.003 relations.py:69(fields)
44300/44000    0.122    0.000    1.072    0.000 annotations.py:459(validate)
      200    0.074    0.000    0.980    0.005 rewrites.py:63(from_targets)
45200/44800    0.196    0.000    0.518    0.000 grounds.py:195(__init__)
    41300    0.041    0.000    0.416    0.000 relations.py:93(__init__)
150600/150300    0.057    0.000    0.410    0.000 patterns.py:801(match)
     1000    0.039    0.000    0.365    0.000 patterns.py:1245(match)
44301/44300    0.018    0.000    0.321    0.000 inspect.py:3268(bind)
44301/44300    0.197    0.000    0.302    0.000 inspect.py:3129(_bind)
90700/89300    0.060    0.000    0.268    0.000 annotations.py:164(get_default)
    85900    0.100    0.000    0.224    0.000 core.py:44(__coerce__)
     1400    0.014    0.000    0.178    0.000 graph.py:479(replace)
      700    0.002    0.000    0.154    0.000 rewrites.py:135(dereference)

@jc-5s
Copy link
Contributor

jc-5s commented Jul 15, 2025

if Derefmaps are cached, then their memory requirements for a length D chain of operations with N fields is O(N*D^2), and each new derefmap on the chain takes incrementally more time to build than the previous one (that's already the case though), so that's probably not a great way forward

Another approach would be to only cache 1 layer depth of derefs on each table, and then do something like:

while expr.table != self:  # assumes expressions have some concept of the table they are bound to
    expr = self._deref_maps_by_src_table[expr.table][expr]

this would have following characteristics:

  • derefmaps would be quicker to build, only 1 layer depth DFS is needed for each table
  • they would be cached per table and cost N fields per table, so memory requirements would be better, O(N*D)
  • expressions from "closer" tables would be faster to dereference. expressions from tables N steps away would require N lookups
    • but optimistically, hopefully users normally use "close" relations when writing queries rather than distant ones - though I have no evidence :)

would that be worth considering?

│ Adelie │ 2007 │ -7.22193 │
└─────────┴───────┴────────────────┘
"""
from ibis.expr.rewrites import rewrite_project_input
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you please add a comment here to keep the implementation of .mutate() and .select() in sync except for the **node.fields part? This might help preserving feature parity between the two methods.

@kszucs
Copy link
Member

kszucs commented Jul 15, 2025

@jc-5s yes, we should find a better way to make dereferencing less resource hungry. Your described solution sounds good on principle but there are a lot of details we need to pay attention to. Feel free to put up a follow-up PR then I will take a look.

value = value.name(original.get_name())
result.append(value)
return tuple(result)
return tuple(result)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Extra trailing whitespace.

Copy link
Member

@kszucs kszucs left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM from my side.

@cpcloud please merge when you have the chance.

@jc-5s
Copy link
Contributor

jc-5s commented Jul 15, 2025

thanks again, I've made the linting fixes and added comments to select and mutate about keeping them in sync

will come back to the DerefMap idea at another point as time allows :)

to fix the "conventional commits" issue, is it best to e.g. squash the PR and follow style e.g. here?
https://gist.github.com/qoomon/5dfcdf8eec66a051ecd85625518cfd13

Just LMK if anything else needed. Thanks!

@cpcloud cpcloud changed the title Perf: skip dereferencing cols preserved in .mutate perf(mutate): skip dereferencing cols preserved in .mutate Jul 16, 2025
@cpcloud cpcloud merged commit 70de1a1 into ibis-project:main Jul 16, 2025
109 of 112 checks passed
@cpcloud cpcloud added the performance Issues related to ibis's performance label Jul 16, 2025
@cpcloud
Copy link
Member

cpcloud commented Jul 16, 2025

Thanks @hottwaj and @kszucs!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

performance Issues related to ibis's performance

Projects

None yet

Development

Successfully merging this pull request may close these issues.

perf: .mutate() on wide tables is slow

4 participants