
[memo] High memory consumption and the places of doubt #180

Open

@nabenabe0928

I am writing down the current memory usage as a memo, in case we encounter memory leak issues in the future.
This post is based on the current implementation.

When we run a dataset with a size of 300 B, AutoPyTorch consumes ~1.5 GB, and the following are the major sources of the memory consumption:

| Source | Consumption [GB] |
| --- | --- |
| Import modules | 0.35 |
| Dask Client | 0.35 |
| Logger (thread safe) | 0.4 |
| Running of `context.Process` in the multiprocessing module | 0.4 |
| Model | 0 ~ inf |
| **Total** | **1.5 ~ inf** |

When we run a dataset with a size of 300 MB (400,000 instances x 80 features), such as Albert, AutoPyTorch consumes ~2.5 GB, and the following are the major sources of the memory consumption:

| Source | Consumption [GB] |
| --- | --- |
| Import modules | 0.35 |
| Dask Client | 0.35 |
| Logger (thread safe) | 0.4 |
| Dataset itself | 0.3 |
| `self.categories` in InputValidator | 0.3 |
| Running of `context.Process` in the multiprocessing module | 0.4 |
| Model (e.g. LightGBM) | 0.4 ~ inf |
| **Total** | **2.5 ~ inf** |
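As a cross-check for the per-object contributions above (the dataset itself and `self.categories`), deep object sizes can also be measured directly. The sketch below is only illustrative; `pympler` and the shape of the categories structure are assumptions for the example, not part of AutoPyTorch:

```python
import numpy as np
import pandas as pd
from pympler import asizeof  # third-party; assumed installed for this check

# Stand-in for an Albert-sized table: 400,000 instances x 80 features.
data = pd.DataFrame(np.random.rand(400_000, 80))

# Deep memory usage of the DataFrame itself, in GB.
df_gb = data.memory_usage(deep=True).sum() / 1024 ** 3
print(f"Dataset itself: {df_gb:.2f} GB")

# Deep size of an arbitrary Python object, e.g. the per-column category
# lists kept by the InputValidator (the structure here is hypothetical).
categories = [list(range(1000)) for _ in range(80)]
cat_gb = asizeof.asizeof(categories) / 1024 ** 3
print(f"self.categories: {cat_gb:.4f} GB")
```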

All the information was obtained by:

```
$ mprof run --include-children python -m examples.tabular.20_basics.example_tabular_classification
```

and the logger that I set up for debugging. Note that I also added `time.sleep(0.5)` before and after each line of interest to eliminate possible influence from other elements, and checked each line in detail.
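For reference, the bracketing looked roughly like the sketch below. It is a minimal illustration of the approach rather than the actual debugging code; `log_rss`, the logger setup, and `allocate_something` are made up for the example, and `psutil` (which memory_profiler itself depends on) is assumed to be available:

```python
import logging
import time

import psutil  # mprof/memory_profiler relies on psutil, so it should be present

logging.basicConfig(level=logging.DEBUG)
logger = logging.getLogger("memory-debug")


def log_rss(tag: str) -> None:
    """Log the resident set size (RSS) of the current process in GB."""
    rss_gb = psutil.Process().memory_info().rss / 1024 ** 3
    logger.debug("%s: RSS = %.2f GB", tag, rss_gb)


def allocate_something() -> list:
    """Placeholder for the line under investigation (hypothetical)."""
    return [0.0] * 10_000_000  # roughly 80 MB, enough to show up in the plot


# Sleeping before and after the line of interest separates its memory step
# from neighbouring allocations in the mprof timeline.
log_rss("before")
time.sleep(0.5)

result = allocate_something()

time.sleep(0.5)
log_rss("after")
```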
