Calculations finish locally but OOM when run in Docker #18403

Open
2 tasks done
sr379-xyt opened this issue Aug 27, 2024 · 3 comments
Labels
invalid A bug report that is not actually a bug python Related to Python Polars

Comments

@sr379-xyt

Checks

  • I have checked that this issue has not already been reported.
  • I have confirmed this bug exists on the latest version of Polars.

Reproducible example

import polars as pl
import pandas as pd

# 8 hours at 100us steps, both endpoints included: 288,000,001 timestamps (~2.3 GB as int64)
data = pl.datetime_range(
    start=pl.lit("2024-08-19T08:00:00", dtype=pl.Datetime(time_unit="ns")),
    end=pl.lit("2024-08-19T16:00:00", dtype=pl.Datetime(time_unit="ns")),
    interval="100us",
    eager=True,
)

df = pl.DataFrame(data, schema={"datetime": pl.Datetime(time_unit="ns")})


def create_calculation_plan(
    df: pl.DataFrame, end_dt: pd.Timestamp, offset_s: int
) -> pl.LazyFrame:
    plan = (
        df.lazy()
        .filter(
            pl.col("datetime")
            <= pl.lit(
                end_dt - pd.Timedelta(seconds=offset_s),
                dtype=pl.Datetime(time_unit="ns"),
            )
        )
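        # every timestamp is already distinct, so unique() processes ~288M rows and drops none;
        # without maintain_order=True its output order is not guaranteed, so last() may pick any row
        .unique()
        .last()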
    )

    return plan


# end_dt - offset_s is 16:00:00, 15:59:59 and 15:59:58, so every filter keeps almost all rows
calculation_plans = [
    create_calculation_plan(df, pd.Timestamp("2024-08-19T16:00:00"), offset)
    for offset in range(3)
]

pl.collect_all(calculation_plans)  # runs all three plans in parallel on the Polars thread pool
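A back-of-the-envelope estimate shows how large the data in the reproducible example is. This is a rough sketch using only the standard library, not a measurement: it assumes each `Datetime(time_unit="ns")` value takes 8 bytes, that each filtered result is materialized as its own copy, and that `collect_all` keeps all three plans alive at the same time; it also ignores the extra memory `unique()` needs.

```python
from datetime import datetime, timedelta

# Same bounds and interval as the reproducible example.
start = datetime(2024, 8, 19, 8, 0, 0)
end = datetime(2024, 8, 19, 16, 0, 0)
interval = timedelta(microseconds=100)

# datetime_range includes both endpoints by default (closed="both"), hence the +1.
n_rows = (end - start) // interval + 1

# Assumption: one ns-resolution Datetime value is stored as an 8-byte integer.
BYTES_PER_VALUE = 8
column_bytes = n_rows * BYTES_PER_VALUE

# Assumption: the input column plus one near-full filtered copy per plan are
# alive at the same time (three plans, each dropping at most 2 s of rows).
N_PLANS = 3
concurrent_bytes = column_bytes * (1 + N_PLANS)

print(f"rows:                      {n_rows:,}")
print(f"input column:              {column_bytes / 1e9:.2f} GB")
print(f"input + {N_PLANS} filtered copies: {concurrent_bytes / 1e9:.2f} GB (excluding unique())")
```

Under these assumptions the three concurrent plans alone would need over 9 GB before `unique()` allocates anything, more than the roughly 8 GB available locally, which would be consistent with the local run relying on swap.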

Log output

No response

Issue description

The code finishes successfully when run locally on Windows with approx. 8 GB of memory available. It saturates memory (100% RAM utilisation) during the calculations but still completes, even when run in two separate processes simultaneously.
However, when run locally in Docker, the container is OOM-killed immediately (no memory limit was set, so all available memory could be used). Similar code also ran successfully on local Windows but was OOM-killed in a Docker container on Kubernetes, even with more RAM available (12 GB).
I suspect the problem is related to cgroup limits; however, the issue about those (#15797) has been closed as completed.
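To check whether a cgroup memory limit actually applies inside the container, it can be read directly from the cgroup filesystem. The sketch below is a hypothetical helper (`read_cgroup_memory_limit` is not part of Polars) and assumes the standard `/sys/fs/cgroup` mount point; the cgroup v1 "unlimited" check is a heuristic. Even when no limit is reported, a container can only use the memory of its Docker host, which with Docker Desktop on Windows is a virtual machine rather than the whole Windows machine.

```python
from __future__ import annotations

from pathlib import Path


def read_cgroup_memory_limit(root: Path = Path("/sys/fs/cgroup")) -> int | None:
    """Return the cgroup memory limit in bytes, or None if unlimited or unknown.

    Checks the cgroup v2 file first, then falls back to the cgroup v1 file.
    """
    # cgroup v2: a single file containing either a byte count or "max".
    v2 = root / "memory.max"
    if v2.is_file():
        value = v2.read_text().strip()
        return None if value == "max" else int(value)

    # cgroup v1: the memory controller has its own subdirectory.
    v1 = root / "memory" / "memory.limit_in_bytes"
    if v1.is_file():
        value = int(v1.read_text().strip())
        # Heuristic: v1 reports a huge sentinel (close to 2**63) when no limit is set.
        return None if value >= 2**62 else value

    # Neither file exists (e.g. not running under cgroups at this mount point).
    return None


if __name__ == "__main__":
    limit = read_cgroup_memory_limit()
    print("no cgroup memory limit found" if limit is None else f"limit: {limit / 2**30:.2f} GiB")
```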

Expected behavior

The code finishes successfully when executed in a Docker container.

Installed versions

--------Version info---------
Polars:               1.5.0
Index type:           UInt32
Platform:             Windows-11-10.0.22631-SP0
Python:               3.12.4 | packaged by Anaconda, Inc. | (main, Jun 18 2024, 15:03:56) [MSC v.1929 64 bit (AMD64)]

----Optional dependencies----
adbc_driver_manager:  <not installed>
cloudpickle:          <not installed>
connectorx:           <not installed>
deltalake:            <not installed>
fastexcel:            <not installed>
fsspec:               <not installed>
gevent:               <not installed>
great_tables:         <not installed>
hvplot:               <not installed>
matplotlib:           <not installed>
nest_asyncio:         <not installed>
numpy:                2.1.0
openpyxl:             <not installed>
pandas:               2.2.2
pyarrow:              <not installed>
pydantic:             <not installed>
pyiceberg:            <not installed>
sqlalchemy:           <not installed>
torch:                <not installed>
xlsx2csv:             <not installed>
xlsxwriter:           <not installed>
@sr379-xyt sr379-xyt added bug Something isn't working needs triage Awaiting prioritization by a maintainer python Related to Python Polars labels Aug 27, 2024
@ritchie46
Member

Maybe you are swapping locally and you cannot swap in docker?

In any case, going out of memory is not a bug we can fix. You need to get more memory for the job.
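One way to test the swap hypothesis is to look at `SwapTotal` in `/proc/meminfo` from inside the container. A minimal sketch, assuming a Linux container; note that `/proc/meminfo` in a container normally reports the Docker host's (or VM's) memory rather than any per-container limit. `parse_meminfo` and `swap_total_kib` are hypothetical helpers.

```python
from __future__ import annotations

from pathlib import Path


def parse_meminfo(text: str) -> dict[str, int]:
    """Parse /proc/meminfo-style text into {field name: value}.

    Most values are in kB; a few (e.g. HugePages_Total) are plain counts.
    """
    fields: dict[str, int] = {}
    for line in text.splitlines():
        name, _, rest = line.partition(":")
        parts = rest.split()
        if parts:
            fields[name.strip()] = int(parts[0])
    return fields


def swap_total_kib(path: Path = Path("/proc/meminfo")) -> int | None:
    """Return SwapTotal in KiB, or None if the file or field is unavailable."""
    if not path.is_file():
        return None
    return parse_meminfo(path.read_text()).get("SwapTotal")


if __name__ == "__main__":
    swap = swap_total_kib()
    print("meminfo unavailable" if swap is None else f"SwapTotal: {swap / 2**20:.2f} GiB")
```

A `SwapTotal` of 0 inside the container would mean the host the container runs on has no swap to fall back on.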

@ritchie46 ritchie46 added invalid A bug report that is not actually a bug and removed bug Something isn't working needs triage Awaiting prioritization by a maintainer labels Aug 28, 2024
@sr379-xyt
Author

Thanks for the response. Is there any way Polars can raise an exception when there is not enough memory, or is an OOM kill the only possible outcome?

@ritchie46
Member

No, restoring from OOM is not really feasible.
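Since an out-of-memory process is typically terminated by the kernel's OOM killer rather than receiving an error it could catch, any early, catchable failure has to come from the calling code. The sketch below is a hypothetical user-side pre-flight check, not a Polars feature: it compares a caller-supplied size estimate against a known limit (for example one read from the cgroup filesystem) and raises `MemoryError` before any work starts. The 50% default headroom is an arbitrary assumption meant to leave room for intermediates the estimate does not cover.

```python
from __future__ import annotations


def ensure_fits(estimated_bytes: int, limit_bytes: int | None, headroom: float = 0.5) -> None:
    """Raise MemoryError before starting work that is unlikely to fit.

    `headroom` is the fraction of the limit the estimate may use; the rest is
    reserved for intermediate results (hash tables, copies) the estimate omits.
    """
    if not 0 < headroom <= 1:
        raise ValueError("headroom must be in (0, 1]")
    if limit_bytes is None:
        return  # no known limit: nothing to check against
    budget = int(limit_bytes * headroom)
    if estimated_bytes > budget:
        raise MemoryError(
            f"estimated {estimated_bytes:,} bytes exceeds budget of {budget:,} bytes "
            f"({headroom:.0%} of a {limit_bytes:,}-byte limit)"
        )
```

For example, with a 12 GB limit as in the Kubernetes case, four copies of the ~288M-row timestamp column, `ensure_fits(4 * 288_000_001 * 8, 12 * 10**9)`, raise, whereas a single copy, `ensure_fits(288_000_001 * 8, 12 * 10**9)`, passes. The check is only as good as the estimate passed in.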
