Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Performance of register_plugin_function #18857

Closed
jtanx opened this issue Sep 23, 2024 · 7 comments · Fixed by #18860
Closed

Performance of register_plugin_function #18857

jtanx opened this issue Sep 23, 2024 · 7 comments · Fixed by #18860
Labels
enhancement New feature or an improvement of an existing feature

Comments

@jtanx
Copy link

jtanx commented Sep 23, 2024

Description

I've been testing out using register_plugin_function to implement a custom expression plugin, which works quite well, but I've noticed in profiling that when this expression is called, it can get fairly expensive to re-resolve the path to the DLL on every invocation: https://github.com/pola-rs/polars/blame/262f7bc63c89f9692b7e96215cf599835adaac92/py-polars/polars/plugins.py#L124

I'm not super familiar with the mechanics of how this plugin registration works, but is it possible to implement some sort of caching around this, or even of the registration vs. instantiation of the expr itself?

Thanks!

@jtanx jtanx added the enhancement New feature or an improvement of an existing feature label Sep 23, 2024
@MarcoGorelli
Copy link
Collaborator

hi @jtanx - could you show how you're calling this function please? if you pre-compute the plugin path to be the exact plugin path (or at least, the parent directory) then is it still expensive?

@jtanx
Copy link
Author

jtanx commented Sep 23, 2024

@MarcoGorelli
Copy link
Collaborator

thanks! ok maybe a simple-enough fix is to just lru_cache _resolve_plugin_path

@jtanx
Copy link
Author

jtanx commented Sep 23, 2024

Sorry, I think I got the permalink wrong, but it's still in the resolve call https://github.com/pola-rs/polars/blame/262f7bc63c89f9692b7e96215cf599835adaac92/py-polars/polars/plugins.py#L120

@jtanx
Copy link
Author

jtanx commented Sep 23, 2024

Basic flamegraph:

image

@MarcoGorelli
Copy link
Collaborator

😱 Pathlib.resolve takes 1.47 seconds for you?

I was never a fan of Pathlib, but I wasn't expecting this

Could you provide some more details please? Platform, Python version?

@jtanx
Copy link
Author

jtanx commented Sep 23, 2024

Ah, to be clear, that's cumulative time if I call my expr function 10000 times (so not the time for one call), it's basically just something like

import my_plugin as plu
import polars as pl

for i in range(10000):
    plu.transform(pl.col("A"))

then

python -m cProfile -o a.dat test.py
snakeviz a.dat -s

This is with Python 3.12 on Linux, Polars 1.7.1. I'd read it more in terms of the proportion of time spent (i.e. more than 80% of time is spent in path resolution)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or an improvement of an existing feature
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants