Make better use of data by devising subsampling #426

Open
nabenabe0928 opened this issue Apr 1, 2022 · 2 comments

@nabenabe0928 (Contributor)

When we use a certain memory_allocation [1] in subsampling, we reduce the number of samples until we reach the memory limit.
However, we need to come up with an appropriate value for it: if we set it too high, training fails with a memory error, while if we set it too low, we waste memory.

For now, we circumvent this issue by measuring the memory consumption when using the default config.

Footnotes

[1] The definition of memory_allocation is the following:
    Absolute memory in MB, e.g. 10 MB is "memory_allocation": 10.
    The memory used by the dataset is checked after each reduction method is performed.
    If the dataset fits into the allocated memory, any further methods listed in "methods" will not be performed.
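
For reference, a hypothetical sketch of such a config in Python, using the "memory_allocation" and "methods" keys the footnote describes (the method names and the surrounding structure are assumptions for illustration, not necessarily APT's exact API):

# Hypothetical dataset-compression config following the footnote's description.
# "memory_allocation": absolute budget in MB for the dataset.
# "methods": reduction steps applied in order until the dataset fits the budget.
dataset_compression = {
    "memory_allocation": 10,                # stop reducing once the data fits in 10 MB
    "methods": ["precision", "subsample"],  # assumed method names, for illustration
}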

@nabenabe0928 (Contributor Author)

nabenabe0928 commented Apr 1, 2022

The information below is based on runs on the dev branch.
These settings passed when I ran the tabular classification example.

mem_limit, N, D = 3000, 500, 10000  # 3GB
mem_limit, N, D = 4000, 2500, 10000  # 4GB
mem_limit, N, D = 5000, 4500, 10000  # 5GB
mem_limit, N, D = 6000, 6500, 10000  # 6GB
mem_limit, N, D = 7000, 8500, 10000  # 7GB
mem_limit, N, D = 8000, 11500, 10000  # 8GB
mem_limit, N, D = 9000, 15000, 10000  # 9GB

Note that since neural networks grow with the input size, I used a large fixed D; this D was determined from the feature sizes in automlbenchmark.

Since APT does not run with memory_limit=2000, we use the following equation to calculate memory_allocation and raise an error when memory_allocation < 0:

memory_allocation = (memory_limit - 3000) / 1000.0 * 160 + 40
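
As a sanity check on the coefficients (my own back-of-the-envelope, assuming float64, i.e. 8 bytes per value): the table above gains roughly 2000 samples per extra GB, and 2000 * 10000 * 8 bytes ≈ 160 MB, while the 3 GB baseline of N = 500 gives 500 * 10000 * 8 bytes ≈ 40 MB. A minimal sketch of the calculation with the negative check (the function name is mine, not an actual APT helper):

def calc_memory_allocation(memory_limit: float) -> float:
    """Map memory_limit (MB) to memory_allocation (MB); hypothetical helper."""
    memory_allocation = (memory_limit - 3000) / 1000.0 * 160 + 40
    if memory_allocation < 0:
        # e.g. memory_limit=2000 gives -120, and APT does not run there anyway.
        raise ValueError(f"memory_limit={memory_limit}MB is too small for subsampling")
    return memory_allocation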

@nabenabe0928 (Contributor Author)

nabenabe0928 commented Apr 5, 2022

The information below is based on runs on the common modification branch.

mem_limit, N, D = 3000, 500, 10000  # 3GB
mem_limit, N, D = 4000, 3000, 10000  # 4GB
mem_limit, N, D = 5000, 5500, 10000  # 5GB
mem_limit, N, D = 6000, 8500, 10000  # 6GB
mem_limit, N, D = 7000, 11500, 10000  # 7GB
mem_limit, N, D = 8000, 15500, 10000  # 8GB

The training ratio was 0.75, so we might need to take 75% of those values.
Then the 200 MB increments become 150 MB increments, and the 40 MB for the 3 GB case becomes 30 MB.

memory_allocation = (memory_limit - 3000) / 1000.0 * 150 + 30
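
A quick sketch of the adjusted mapping, applying the training-ratio scaling (0.75 * 200 MB/GB = 150 MB/GB and 0.75 * 40 MB = 30 MB; again a hypothetical helper mirroring the one above, not APT's actual code):

def calc_memory_allocation_v2(memory_limit: float) -> float:
    """Adjusted mapping for the common modification branch; hypothetical helper."""
    memory_allocation = (memory_limit - 3000) / 1000.0 * 150 + 30
    if memory_allocation < 0:
        raise ValueError(f"memory_limit={memory_limit}MB is too small for subsampling")
    return memory_allocation

# e.g. memory_limit=3000 -> 30 MB, 4000 -> 180 MB, 8000 -> 780 MB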
