Make better use of data by devising subsampling #426

Open
nabenabe0928 opened this issue Apr 1, 2022 · 2 comments

@nabenabe0928 (Contributor)

When we use a certain memory_allocation [1] in subsampling, we reduce the number of samples until we reach the memory limit.
However, we need to come up with an appropriate value for it: if we set it too high, training fails with a memory error, while if we set it too low, we waste memory.

For now, we circumvent this issue by measuring the memory consumption when using the default config.

Footnotes

[1] The definition of memory_allocation is the following:
    Absolute memory in MB, e.g. 10 MB is "memory_allocation": 10.
    The memory used by the dataset is checked after each reduction method is performed.
    If the dataset fits into the allocated memory, any further methods listed in "methods" will not be performed.
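
For reference, a hypothetical sketch of such a config in Python, using the "memory_allocation" and "methods" keys the footnote describes (the method names and the surrounding structure are assumptions for illustration, not necessarily APT's exact API):

# Hypothetical dataset-compression config following the footnote's description.
# "memory_allocation": absolute budget in MB for the dataset.
# "methods": reduction steps applied in order until the dataset fits the budget.
dataset_compression = {
    "memory_allocation": 10,                # stop reducing once the data fits in 10 MB
    "methods": ["precision", "subsample"],  # assumed method names, for illustration
}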

@nabenabe0928 (Contributor Author)

nabenabe0928 commented Apr 1, 2022

The information below is based on runs on the dev branch.
These settings passed when I ran the tabular classification example.

mem_limit, N, D = 3000, 500, 10000  # 3GB
mem_limit, N, D = 4000, 2500, 10000  # 4GB
mem_limit, N, D = 5000, 4500, 10000  # 5GB
mem_limit, N, D = 6000, 6500, 10000  # 6GB
mem_limit, N, D = 7000, 8500, 10000  # 7GB
mem_limit, N, D = 8000, 11500, 10000  # 8GB
mem_limit, N, D = 9000, 15000, 10000  # 9GB

Note that since neural networks grow with the input size, I used a large fixed D; this D was determined from the feature sizes in automlbenchmark.

Since APT does not run with memory_limit=2000, we use the following equation to calculate memory_allocation and raise an error when memory_allocation < 0:

memory_allocation = (memory_limit - 3000) / 1000.0 * 160 + 40
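
As a sanity check on the coefficients (my own back-of-the-envelope, assuming float64, i.e. 8 bytes per value): the table above gains roughly 2000 samples per extra GB, and 2000 * 10000 * 8 bytes ≈ 160 MB, while the 3 GB baseline of N = 500 gives 500 * 10000 * 8 bytes ≈ 40 MB. A minimal sketch of the calculation with the negative check (the function name is mine, not an actual APT helper):

def calc_memory_allocation(memory_limit: float) -> float:
    """Map memory_limit (MB) to memory_allocation (MB); hypothetical helper."""
    memory_allocation = (memory_limit - 3000) / 1000.0 * 160 + 40
    if memory_allocation < 0:
        # e.g. memory_limit=2000 gives -120, and APT does not run there anyway.
        raise ValueError(f"memory_limit={memory_limit}MB is too small for subsampling")
    return memory_allocation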

@nabenabe0928 (Contributor Author)

nabenabe0928 commented Apr 5, 2022

The information below is based on runs on the common modification branch.

mem_limit, N, D = 3000, 500, 10000  # 3GB
mem_limit, N, D = 4000, 3000, 10000  # 4GB
mem_limit, N, D = 5000, 5500, 10000  # 5GB
mem_limit, N, D = 6000, 8500, 10000  # 6GB
mem_limit, N, D = 7000, 11500, 10000  # 7GB
mem_limit, N, D = 8000, 15500, 10000  # 8GB

The training ratio was 0.75, so we might need to take 75% of those values.
Then the 200 MB increments become 150 MB increments, and the 40 MB for the 3 GB case becomes 30 MB.

memory_allocation = (memory_limit - 3000) / 1000.0 * 150 + 30
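
A quick sketch of the adjusted mapping, applying the training-ratio scaling (0.75 * 200 MB/GB = 150 MB/GB and 0.75 * 40 MB = 30 MB; again a hypothetical helper mirroring the one above, not APT's actual code):

def calc_memory_allocation_v2(memory_limit: float) -> float:
    """Adjusted mapping for the common modification branch; hypothetical helper."""
    memory_allocation = (memory_limit - 3000) / 1000.0 * 150 + 30
    if memory_allocation < 0:
        raise ValueError(f"memory_limit={memory_limit}MB is too small for subsampling")
    return memory_allocation

# e.g. memory_limit=3000 -> 30 MB, 4000 -> 180 MB, 8000 -> 780 MB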
