Description
I am writing down the current memory usage as a memo, in case we run into memory leak issues in the future.
This post is based on the current implementation.
When we run a dataset with a size of 300B, AutoPyTorch consumes ~1.5GB, and the following are the major sources of memory consumption:
Source | Consumption [GB] |
---|---|
Import modules | 0.35 |
Dask Client | 0.35 |
Logger (Thread safe) | 0.4 |
Running of context.Process in multiprocessing module | 0.4 |
Model | 0 ~ inf |
Total | 1.5 ~ inf |
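
For reference, this is roughly how a single contribution can be isolated by reading the process RSS before and after a line of interest. It is only a minimal sketch: `psutil` is assumed to be installed (it is not part of AutoPyTorch), and the Dask client creation here is just an illustrative example that will not reproduce the in-framework numbers exactly.

```python
import os
import time

import psutil  # assumed to be installed; not a dependency of AutoPyTorch itself


def rss_gb() -> float:
    """Resident set size of the current process in GB."""
    return psutil.Process(os.getpid()).memory_info().rss / 1024 ** 3


before = rss_gb()
time.sleep(0.5)  # settle before the line, as described in the methodology below

# One example "line of interest": creating a Dask client.
from dask.distributed import Client  # noqa: E402
client = Client(processes=False)

time.sleep(0.5)  # settle after the line
after = rss_gb()
print(f"Dask Client: +{after - before:.2f} GB")
```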
When we run a 300MB dataset (400,000 instances x 80 features) such as Albert, AutoPyTorch consumes ~2.5GB, and the following are the major sources of memory consumption:
Source | Consumption [GB] |
---|---|
Import modules | 0.35 |
Dask Client | 0.35 |
Logger (Thread safe) | 0.4 |
Dataset itself | 0.3 |
self.categories in InputValidator | 0.3 |
Running of context.Process in multiprocessing module | 0.4 |
Model (e.g. LightGBM) | 0.4 ~ inf |
Total | 2.5 ~ inf |
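
As a sanity check on the "Dataset itself" row, the raw values of a 400,000 x 80 float64 matrix alone already account for roughly a quarter of a gigabyte. This is a sketch with synthetic data of the same shape; the real Albert data also contains categorical columns, which presumably pushes the footprint towards the reported 0.3GB.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in with the same shape as Albert: 400,000 instances x 80 features.
X = np.random.rand(400_000, 80)                       # float64 values
print(f"raw ndarray: {X.nbytes / 1024 ** 3:.2f} GB")  # 400,000 * 80 * 8 bytes ~= 0.24 GB

df = pd.DataFrame(X)
print(f"DataFrame:   {df.memory_usage(deep=True).sum() / 1024 ** 3:.2f} GB")
```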
All the information was obtained by:
$ mprof run --include-children python -m examples.tabular.20_basics.example_tabular_classification
and the logger I set up for debugging. Note that I also added `time.sleep(0.5)` before and after each line of interest to rule out interference from other parts of the code, and checked each line in detail.
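
Concretely, the bracketing pattern looks like the following. This is a sketch only: the logger name and the measured line are placeholders, and the script is run under `mprof run --include-children` as shown above, so the two sleeps produce flat regions in the memory curve and the jump in between can be attributed to the bracketed line.

```python
import logging
import time

# Placeholder name; in practice this is the debug logger mentioned above.
logger = logging.getLogger("memory_debug")
logging.basicConfig(level=logging.DEBUG)

time.sleep(0.5)                # flat region in the mprof plot before the line
logger.debug("before line of interest")

data = [0.0] * 10_000_000      # placeholder for the line of interest

logger.debug("after line of interest")
time.sleep(0.5)                # flat region in the mprof plot after the line
```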