Find spec/node_type of Kepler node for model selection #231
Comments
Now working on adding simple logic to the estimator to discover the core count and find the candidate models that were built on machines with the same number of cores. If none exist, it lists the candidates that have the largest number of cores. The ModelRequest will also need to be extended accordingly.
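The selection logic described above can be sketched as follows. This is a minimal illustration, not the actual estimator code; the model-record shape (`name`, `cores`) is an assumption.

```python
# Hypothetical sketch of the proposed estimator logic: prefer models trained
# on machines with the same core count; otherwise fall back to the models
# trained with the largest core count available in the model DB.

def select_candidates(models, core_count):
    """models: list of dicts like {"name": ..., "cores": ...} (illustrative schema)."""
    same = [m for m in models if m["cores"] == core_count]
    if same:
        return same
    # No exact match: fall back to the largest core count seen among the models.
    max_cores = max(m["cores"] for m in models)
    return [m for m in models if m["cores"] == max_cores]

models = [
    {"name": "trainer_A_1", "cores": 8},
    {"name": "trainer_A_2", "cores": 96},
    {"name": "trainer_B_1", "cores": 96},
]
print([m["name"] for m in select_candidates(models, 16)])  # → ['trainer_A_2', 'trainer_B_1']
```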
@vimalk78 @sthaha Let's summarize and discuss the design here.
- Objective:
- Use case:
- Who sends the model request:
- Who generates the spec (for the BM case):
- How to pass the spec file (for the VM case):

```yaml
volumeMounts:
  - name: config-machine
    mountPath: /etc/kepler/models/machine
    readOnly: true
volumes:
  - name: config-machine
    configMap:
      name: kepler-machine-spec
      items:
        - key: m5.metal
          path: spec.json
```
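On the consuming side, the estimator could load the spec file mounted from that ConfigMap. This is a hedged sketch; the mount path matches the volumeMount above, but the function name and fallback behavior are assumptions, not the actual kepler-model-server API.

```python
# Sketch: load the machine spec mounted from the ConfigMap at the path used
# in the volumeMounts example. Returns None when no spec was mounted, so the
# caller can fall back to default model selection.
import json
import os

SPEC_PATH = "/etc/kepler/models/machine/spec.json"

def load_machine_spec(path=SPEC_PATH):
    if not os.path.exists(path):
        return None  # no spec mounted: fall back to default selection
    with open(path) as f:
        return json.load(f)
```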
We don't use the sidecar estimator any more.
What would you like to be added?
A flow to link the Kepler-deploying node's specification to model selection from the Kepler model DB.
Why is this needed?
Problem description
Previously, we had only a single node_type in the pipeline: we always appended _1 to the trainer name to get the model name. However, with SPECPower and AWS instances, we can now train multiple node_types.
Currently, we have a function generate_spec, implemented in Python in kepler-model-server, that generates the machine spec.
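As an illustration of what a generate_spec-style function collects, a minimal sketch is below. The real generate_spec in kepler-model-server may gather different or additional fields; the field names here are assumptions.

```python
# Illustrative sketch of collecting a machine spec from the local host
# using only the standard library. Field names are hypothetical.
import os
import platform

def generate_spec():
    return {
        "processor": platform.processor(),  # may be empty on some platforms
        "cores": os.cpu_count(),
        "arch": platform.machine(),
    }
```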
Idea
What we need is to let Kepler determine its node_type.
The logic of generate_spec may not need to be merged into Kepler itself.
It can run in an init container to generate the spec and save it to a file on a mounted volume. The server API may need to be updated to allow adding the machine spec to the request used to select the model.
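The proposed API change could look like the sketch below: a model request that optionally carries the machine spec. The field names ("machine_spec", "metrics", "output_type") are assumptions about the proposed change, not the current ModelRequest schema.

```python
# Sketch of a model-selection request extended with an optional machine spec.
# When no spec is available, the request stays backward compatible.
import json

def build_model_request(metrics, output_type, machine_spec=None):
    request = {"metrics": metrics, "output_type": output_type}
    if machine_spec is not None:
        request["machine_spec"] = machine_spec
    return json.dumps(request)
```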
Note the node_type_index.json file inside the pipeline folder.
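A possible way to use that index is to match the incoming machine spec against the attributes recorded per node_type. This is a hedged sketch; the actual schema of node_type_index.json may differ, and both the index contents and the matching rule here are assumptions.

```python
# Sketch: resolve a machine spec to a node_type by matching against a
# hypothetical node_type index (keys and attributes are illustrative).
node_type_index = {
    "0": {"cores": 8},
    "1": {"cores": 96},
}

def find_node_type(spec, index):
    for node_type, attrs in index.items():
        if all(spec.get(k) == v for k, v in attrs.items()):
            return node_type
    return None  # unknown machine: caller falls back to default selection

print(find_node_type({"cores": 96}, node_type_index))  # → 1
```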