The first call to pyfunction returning PyDataFrame is slow #72

Open
Androidown opened this issue Mar 14, 2024 · 3 comments

Comments

@Androidown

I found that the very first call to a pyfunction that returns a PyDataFrame has a lag of about 100 ms.

Here is a minimal reproducible example:

extension

use pyo3_polars::PyDataFrame;
use pyo3::prelude::*;

#[pyfunction]
fn dup(pydf: PyDataFrame) -> PyDataFrame {
    let df = pydf.0;
    let new_df = df.vstack(&df.clone()).unwrap();
    PyDataFrame(new_df)
    // Python::with_gil(|py| into_py(&PyDataFrame(new_df), py))
}

#[pymodule]
#[pyo3(name = "test")]
fn py(_py: Python, m: &PyModule) -> PyResult<()> {
    m.add_function(wrap_pyfunction!(dup, m)?)?;
    Ok(())
}

ipython

In [1]: import test
   ...: import polars as pl
   ...: df = pl.DataFrame({'a': [1], 'b': [2]})

In [2]: %time test.dup(df)
Wall time: 110 ms
Out[2]:
shape: (2, 2)
┌─────┬─────┐
│ a   ┆ b   │
│ --- ┆ --- │
│ i64 ┆ i64 │
╞═════╪═════╡
│ 1   ┆ 2   │
│ 1   ┆ 2   │
└─────┴─────┘

In [3]: %time test.dup(df)
Wall time: 0 ns
Out[3]:
shape: (2, 2)
┌─────┬─────┐
│ a   ┆ b   │
│ --- ┆ --- │
│ i64 ┆ i64 │
╞═════╪═════╡
│ 1   ┆ 2   │
│ 1   ┆ 2   │
└─────┴─────┘

Is there anything I can do to eliminate this delay?

@sdd

sdd commented Jul 22, 2024

I see the same issue. It happens even when trivially returning an empty dataframe. I've created a minimal reproducible example here: https://github.com/sdd/py03-bug

❯ python main.py
shape: (0, 0)
┌┐
╞╡
└┘
time taken for first: 42.98 ms
shape: (0, 0)
┌┐
╞╡
└┘
time taken for second: 0.02 ms

@sdd

sdd commented Jul 22, 2024

I have a more complex module in another use case, and the delay in that module is 600 ms the first time it returns a dataframe, even an empty one.

@sdd

sdd commented Jul 22, 2024

Update: I think I've solved my own problem.

The Python script did not contain import polars, so pyo3-polars had to import polars itself the first time a dataframe was returned.

Once my script had import polars at the top, the first call was just as fast as any other.
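For anyone who wants to check whether a first-call lag like this is really import cost, here is a rough sketch that times the same import twice. The helper name time_import is made up for illustration, and the stdlib module decimal is only a stand-in so the script still runs where polars is not installed. The first import pays the real cost, while the second should be served from the sys.modules cache.

```python
import importlib
import importlib.util
import sys
import time


def time_import(module_name: str) -> float:
    """Import module_name and return the elapsed wall time in milliseconds."""
    start = time.perf_counter()
    importlib.import_module(module_name)
    return (time.perf_counter() - start) * 1000.0


# Assumption: "decimal" is only a stdlib stand-in so the sketch runs
# where polars is not installed; polars is the module that matters here.
target = "polars" if importlib.util.find_spec("polars") is not None else "decimal"

already_loaded = target in sys.modules
first_ms = time_import(target)   # pays the real import cost unless already loaded
second_ms = time_import(target)  # served from the sys.modules cache

print(f"{target} already loaded: {already_loaded}")
print(f"first import:  {first_ms:.2f} ms")
print(f"second import: {second_ms:.2f} ms")
```

If the first number is in the same range as the lag seen on the first call into the extension, putting import polars at the top of the script just moves that one-time cost to startup, which matches what was observed above.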
