Inference compile cache script #504
Conversation
Since it partially duplicates auto_fill_neuronx_inference_cache.py, could you transform this into a simple configuration JSON file for your own script?
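To illustrate the suggestion, here is a minimal sketch of what such a configuration could look like, written as a Python dict dumped to JSON. Only `num_cores` and `auto_cast_type` are visible in the diff below; every other field name and the overall structure are assumptions for illustration, not the actual schema of the cache configs.

```python
import json

# Hypothetical cache configuration: a list of entries, one per model to compile.
# Only "num_cores" and "auto_cast_type" appear in the diff below; "model_id" and
# the list structure are assumptions, not the real schema.
example_config = [
    {
        "model_id": "gpt2",        # assumed field name
        "num_cores": 2,            # visible in the diff below
        "auto_cast_type": "fp16",  # visible in the diff below
    },
]

with open("gpt2.json", "w") as f:
    json.dump(example_config, f, indent=2)
```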
tools/auto_fill_inference_cache.py (Outdated)
num_cores=model_config["num_cores"],
auto_cast_type=model_config["auto_cast_type"],
)
# Check if all arguments are provided if a config file is not used
This will fail every time you pass a config_file without a model_id if you don't return here or use an elif.
This is not about the schema. Here you will raise an error because the user did pass a config_file that you already parsed, and you nevertheless expect them to have passed a model_id.
I think I got what you meant. I made it a hard-coded "if / elif / else" without fall-through.
What I meant is that if the input was a config file, then the script should end immediately after having processed it, and not try to also parse the other arguments.
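To make the intended control flow concrete, here is a minimal sketch, assuming argparse-style options named --config_file, --model_id, --num_cores and --auto_cast_type (only config_file and model_id are named in this thread; the other options, the config structure, and the compile_model helper are assumptions, and remote config URLs are not handled). The point is that the config-file branch returns before the single-model arguments are ever required.

```python
import argparse
import json


def compile_model(model_id: str, num_cores: int, auto_cast_type: str) -> None:
    # Hypothetical placeholder for the actual export/compile step.
    print(f"Compiling {model_id} (num_cores={num_cores}, auto_cast_type={auto_cast_type})")


def main() -> None:
    parser = argparse.ArgumentParser()
    parser.add_argument("--config_file", type=str, default=None)
    parser.add_argument("--model_id", type=str, default=None)
    parser.add_argument("--num_cores", type=int, default=None)
    parser.add_argument("--auto_cast_type", type=str, default=None)
    args = parser.parse_args()

    if args.config_file:
        # Config-file mode: process every entry, then stop.
        # Returning here means --model_id is never required in this branch.
        with open(args.config_file) as f:
            config = json.load(f)
        for model_config in config:  # assumed structure: a list of entries
            compile_model(
                model_id=model_config["model_id"],  # assumed field name
                num_cores=model_config["num_cores"],
                auto_cast_type=model_config["auto_cast_type"],
            )
        return
    elif args.model_id:
        # Single-compilation mode: everything comes from the command line.
        compile_model(
            model_id=args.model_id,
            num_cores=args.num_cores,
            auto_cast_type=args.auto_cast_type,
        )
    else:
        parser.error("Either --config_file or --model_id must be provided.")


if __name__ == "__main__":
    main()
```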
OK, so your last commit. About the schema, naaah, this is overkill.
LGTM, thanks!
Tested with python tools/auto_fill_inference_cache.py --config_file https://huggingface.co/aws-neuron/optimum-neuron-cache/raw/main/inference-cache-config/gpt2.json
Co-authored-by: David Corvoysier <david@huggingface.co>
What does this PR do?
This PR adds a tools script to easily add LLMs to the public cache. It leverages the optimum-cli behind the scenes and checks that the version matches. There are currently two ways to use it (see the usage sketch below):
- Single compilation
- File-based compilation, either with a local file or a remote file-based config

The configs can be found in the aws-neuron/optimum-neuron-cache repository under inference-cache-config.
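As a hedged usage sketch of both modes: the remote-config command is the one tested in the review thread above, while the local-file and single-compilation invocations use flag names (--model_id, --num_cores, --auto_cast_type) that are assumptions based on the arguments discussed in the review, not confirmed options of the script.

```bash
# Remote file-based config (command taken from the review thread above)
python tools/auto_fill_inference_cache.py --config_file https://huggingface.co/aws-neuron/optimum-neuron-cache/raw/main/inference-cache-config/gpt2.json

# File-based compilation with a local file (assumed to reuse --config_file)
python tools/auto_fill_inference_cache.py --config_file my-cache-config.json

# Single compilation (flag names are assumptions based on the review discussion)
python tools/auto_fill_inference_cache.py --model_id gpt2 --num_cores 2 --auto_cast_type fp16
```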