
Memory consumption of fitting EBMs #630

@DerWeh

Description


I'm running out of memory trying to fit an ExplainableBoostingClassifier.

The number of features is on the order of 10, the number of samples on the order of $10^5$, and the number of classes on the order of 100. In the recent release, the method ExplainableBoostingClassifier.estimate_mem was added. Its docstring is somewhat vague:

Estimate memory usage of the model.

Is this the memory necessary to store the model (unlikely, as X and y are arguments), the memory necessary to fit the model (most likely, as y is an argument), or the memory necessary to make predictions (unlikely, as predictions aren't memory intensive)?
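
For context, I call it roughly like this (the exact call pattern is my assumption, based only on X and y being arguments):

ebm = ExplainableBoostingClassifier(interactions=0)
print(ebm.estimate_mem(X, y))  # reports a tiny value for my data (see below)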

However, ExplainableBoostingClassifier.estimate_mem indicates tiny memory usage, while in reality I run out of memory (more than 100 GiB needed). The function reports a memory usage that is independent of the number of classes, whereas for 'small' numbers of 1 to 100 classes I observe the memory consumption to be roughly an affine function of the number of classes. For larger numbers of classes it seems to 'saturate'; I guess the system starts swapping to disk.

So far, I have used resource.getrusage(resource.RUSAGE_SELF).ru_maxrss to measure the memory consumption; I am not really experienced in measuring memory. For one, I am not sure whether this includes the memory of the parallel processes that are used by default for each bag.
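
To rule out that the loky worker processes are simply not counted, here is a sketch of a more inclusive measurement (assuming psutil is available): a background thread polls the RSS of the parent plus all child processes and keeps the maximum seen.

import threading
import time

import numpy as np
import psutil
from interpret.glassbox import ExplainableBoostingClassifier


def track_peak_rss(stop_event, result, interval=0.2):
    """Poll RSS of this process plus all descendants; record the maximum seen."""
    proc = psutil.Process()
    peak = 0
    while not stop_event.is_set():
        rss = proc.memory_info().rss
        for child in proc.children(recursive=True):
            try:
                rss += child.memory_info().rss
            except psutil.NoSuchProcess:
                pass
        peak = max(peak, rss)
        time.sleep(interval)
    result["peak_bytes"] = peak


rng = np.random.default_rng(42)
X = rng.random([10_000, 10], dtype=np.float32)
y = rng.integers(100, size=[10_000], dtype=np.int32)

stop, result = threading.Event(), {}
tracker = threading.Thread(target=track_peak_rss, args=(stop, result))
tracker.start()

ExplainableBoostingClassifier(interactions=0, max_rounds=3).fit(X, y)

stop.set()
tracker.join()
print(f"Peak RSS (parent + children): {result['peak_bytes'] / 1024**3:.2f} GiB")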

But for both variants

ebm = ExplainableBoostingClassifier(interactions=0)
with joblib.parallel_config(backend="loki"):
    ebm.fit(X, y)
ebm = ExplainableBoostingClassifier(interactions=0)
with joblib.parallel_config(backend="threading"):
    ebm.fit(X, y)

I observed a rather linear increase of memory consumption with the number of classes.
This does not surprise me, since a separate shape function is fit for each class.
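
As a back-of-envelope (the shapes below are my guesses, not knowledge of the internals): both the final per-class shape functions and any per-sample, per-class boosting buffers grow linearly with the number of classes.

# Rough estimate; assumptions: float64 values, ~1024 bins per feature, and
# per-sample/per-class gradient+hessian buffers kept per outer bag (default 14).
n_features, n_samples, n_classes = 10, 100_000, 100
n_bins, n_outer_bags, value_bytes = 1024, 14, 8

shape_functions = n_features * n_bins * n_classes * value_bytes
boost_buffers = n_outer_bags * n_samples * n_classes * 2 * value_bytes

print(f"final shape functions: {shape_functions / 1024**2:.1f} MiB")  # ~7.8 MiB
print(f"boosting buffers:      {boost_buffers / 1024**3:.2f} GiB")    # ~2.1 GiB

Both terms are affine in the number of classes, which matches the trend I see, though not the absolute 100 GiB, so presumably something else scales as well.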

My question is threefold: Is estimate_mem correct? How much memory is required to fit such a classifier? And what can I do to fit the model with limited memory (the largest machine I can get has around 512 GiB)?

Is it perhaps best to use threading instead of the default multiprocessing backend, or do you make use of shared memory or memory-map the data from disk?


Example script for minimal tests:

import resource

import joblib
import numpy as np
from interpret.glassbox import ExplainableBoostingClassifier

backend = "loky"
n_samples = 10_000
n_features = 10
n_classes = 100
rng = np.random.default_rng(42)
X = rng.random([n_samples, n_features], dtype=np.float32)
y = rng.integers(n_classes, size=[n_samples], dtype=np.int32)
ebm = ExplainableBoostingClassifier(interactions=0, random_state=42, max_rounds=3, n_jobs=16)  # small max rounds for fast testing
print(f"Pre: {resource.getrusage(resource.RUSAGE_SELF).ru_maxrss * 1024**-2:.3f} GiB")
with joblib.parallel_config(backend=backend):
    ebm.fit(X, y)
print(f"Post: {resource.getrusage(resource.RUSAGE_SELF).ru_maxrss * 1024**-2:.3f} GiB")
