
Memory consumption of fitting EBMs #630

@DerWeh

Description


I'm running out of memory trying to fit an ExplainableBoostingClassifier.

The number of features is on the order of 10, the number of samples on the order of $10^5$, and the number of classes on the order of 100. In the recent release, the method ExplainableBoostingClassifier.estimate_mem was added. Its docstring is somewhat vague:

Estimate memory usage of the model.

Is this the memory necessary to store the model (unlikely, as X and y are arguments), the memory necessary to fit the model (most likely, as y is an argument), or the memory necessary to make predictions (unlikely, as predictions aren't memory intensive)?
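
For context, I call it roughly like this (the exact call pattern is my assumption, based only on X and y being arguments):

ebm = ExplainableBoostingClassifier(interactions=0)
print(ebm.estimate_mem(X, y))  # reports a tiny value for my data (see below)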

However, ExplainableBoostingClassifier.estimate_mem indicates tiny memory usage, while in reality I run out of memory (more than 100 GiB needed). The function reports a memory usage that is independent of the number of classes, whereas for 'small' numbers of 1 to 100 classes I observe the memory consumption to be roughly an affine function of the number of classes. For larger numbers of classes it seems to 'saturate'; I guess the system starts swapping to disk.

So far, I have used resource.getrusage(resource.RUSAGE_SELF).ru_maxrss to measure the memory consumption; I am not really experienced in measuring memory. For one, I am not sure whether this includes the memory of the parallel processes that are used by default for each bag.
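
To rule out that the loky worker processes are simply not counted, here is a sketch of a more inclusive measurement (assuming psutil is available): a background thread polls the RSS of the parent plus all child processes and keeps the maximum seen.

import threading
import time

import numpy as np
import psutil
from interpret.glassbox import ExplainableBoostingClassifier


def track_peak_rss(stop_event, result, interval=0.2):
    """Poll RSS of this process plus all descendants; record the maximum seen."""
    proc = psutil.Process()
    peak = 0
    while not stop_event.is_set():
        rss = proc.memory_info().rss
        for child in proc.children(recursive=True):
            try:
                rss += child.memory_info().rss
            except psutil.NoSuchProcess:
                pass
        peak = max(peak, rss)
        time.sleep(interval)
    result["peak_bytes"] = peak


rng = np.random.default_rng(42)
X = rng.random([10_000, 10], dtype=np.float32)
y = rng.integers(100, size=[10_000], dtype=np.int32)

stop, result = threading.Event(), {}
tracker = threading.Thread(target=track_peak_rss, args=(stop, result))
tracker.start()

ExplainableBoostingClassifier(interactions=0, max_rounds=3).fit(X, y)

stop.set()
tracker.join()
print(f"Peak RSS (parent + children): {result['peak_bytes'] / 1024**3:.2f} GiB")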

But for both variants

ebm = ExplainableBoostingClassifier(interactions=0)
with joblib.parallel_config(backend="loki"):
    ebm.fit(X, y)
ebm = ExplainableBoostingClassifier(interactions=0)
with joblib.parallel_config(backend="threading"):
    ebm.fit(X, y)

I observed a rather linear increase of memory consumption with the number of classes.
This does not surprise me, since a separate shape function is fit for each class.
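
As a back-of-envelope (the shapes below are my guesses, not knowledge of the internals): both the final per-class shape functions and any per-sample, per-class boosting buffers grow linearly with the number of classes.

# Rough estimate; assumptions: float64 values, ~1024 bins per feature, and
# per-sample/per-class gradient+hessian buffers kept per outer bag (default 14).
n_features, n_samples, n_classes = 10, 100_000, 100
n_bins, n_outer_bags, value_bytes = 1024, 14, 8

shape_functions = n_features * n_bins * n_classes * value_bytes
boost_buffers = n_outer_bags * n_samples * n_classes * 2 * value_bytes

print(f"final shape functions: {shape_functions / 1024**2:.1f} MiB")  # ~7.8 MiB
print(f"boosting buffers:      {boost_buffers / 1024**3:.2f} GiB")    # ~2.1 GiB

Both terms are affine in the number of classes, which matches the trend I see, though not the absolute 100 GiB, so presumably something else scales as well.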

My question is threefold: Is estimate_mem correct? How much memory is required to fit such a classifier? And what can I do to fit the model with limited memory (the largest machine I can get has around 512 GiB)?

Is it perhaps best to use threading instead of the default multiprocessing backend, or do you make use of shared memory or memory-map the data from disk?


Example script for minimal tests:

import resource

import joblib
import numpy as np
from interpret.glassbox import ExplainableBoostingClassifier

backend = "loky"
n_samples = 10_000
n_features = 10
n_classes = 100
rng = np.random.default_rng(42)
X = rng.random([n_samples, n_features], dtype=np.float32)
y = rng.integers(n_classes, size=[n_samples], dtype=np.int32)
ebm = ExplainableBoostingClassifier(interactions=0, random_state=42, max_rounds=3, n_jobs=16)  # small max rounds for fast testing
print(f"Pre: {resource.getrusage(resource.RUSAGE_SELF).ru_maxrss * 1024**-2:.3f} GiB")
with joblib.parallel_config(backend=backend):
    ebm.fit(X, y)
print(f"Post: {resource.getrusage(resource.RUSAGE_SELF).ru_maxrss * 1024**-2:.3f} GiB")
