
Inference compile cache script #504

Merged
merged 10 commits into main from add-compile-script on Mar 5, 2024

Conversation

@philschmid (Contributor) commented on Mar 5, 2024

What does this PR do?

This PR adds a script under tools/ to easily add LLMs to the public cache. It leverages optimum-cli behind the scenes and checks whether the version matches. It can currently be used in the following ways:

Single compilation:

huggingface-cli login --token hf_xxx # access to cache repo
python tools/cache_model_for_inference.py --hf_model_id "HuggingFaceH4/zephyr-7b-beta" --batch_size 1 --sequence_length 2048 --num_cores 2 --auto_cast_type fp16

File-based compilation:

python tools/auto_fill_inference_cache.py --config_file test.json

with a config file such as:

{
  "openai-community/gpt2": [
    {
      "batch_size": 1,
      "sequence_length": 1024,
      "num_cores": 1,
      "auto_cast_type": "fp16"
    }
  ],
  "meta-llama/Llama-2-7b-chat-hf": [
    {
      "batch_size": 1,
      "sequence_length": 4096,
      "num_cores": 2,
      "auto_cast_type": "fp16"
    },
    {
      "batch_size": 1,
      "sequence_length": 4096,
      "num_cores": 8,
      "auto_cast_type": "fp16"
    }
  ]
}

Remote file-based config:

python tools/auto_fill_inference_cache.py --config_file https://huggingface.co/aws-neuron/optimum-neuron-cache/raw/main/inference-cache-config/gpt2.json

The configs can be found in the aws-neuron/optimum-neuron-cache repository under inference-cache-config.
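
For illustration, the file-based flow boils down to loading the JSON config (from a local path or a raw Hub URL) and invoking optimum-cli once per model/configuration pair. Below is a minimal sketch under that assumption; load_config and compile_and_cache are hypothetical helpers, not the actual functions in the script, and the exact optimum-cli export neuron flags should be verified against the installed optimum-neuron version.

# Hedged sketch only; this is not the actual tools/auto_fill_inference_cache.py implementation.
import json
import subprocess
import tempfile

import requests


def load_config(path_or_url):
    # Accept either a local path or a raw Hub URL (the remote file-based mode above).
    if path_or_url.startswith(("http://", "https://")):
        return requests.get(path_or_url, timeout=30).json()
    with open(path_or_url) as f:
        return json.load(f)


def compile_and_cache(model_id, entry):
    # Hypothetical helper: compile one model/config pair via optimum-cli so the
    # artifacts land in the Neuron compile cache.
    with tempfile.TemporaryDirectory() as output_dir:
        subprocess.run(
            [
                "optimum-cli", "export", "neuron",
                "--model", model_id,
                "--batch_size", str(entry["batch_size"]),
                "--sequence_length", str(entry["sequence_length"]),
                "--num_cores", str(entry["num_cores"]),
                "--auto_cast_type", entry["auto_cast_type"],
                output_dir,
            ],
            check=True,
        )


for model_id, entries in load_config("test.json").items():
    for entry in entries:
        compile_and_cache(model_id, entry)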

philschmid requested a review from dacorvo on March 5, 2024, 09:41
@dacorvo (Collaborator) left a comment:

Since it partially duplicates auto_fill_neuronx_inference_cache.py, could you transform this into a simple configuration JSON file for your own script?

philschmid requested a review from dacorvo on March 5, 2024, 13:04
Inline review thread on the argument-handling code:

num_cores=model_config["num_cores"],
auto_cast_type=model_config["auto_cast_type"],
)
# Check if all arguments are provided if a config file is not used
Collaborator:
This will fail every time you pass a config_file without a model_id if you don't return here or use an elif.

@dacorvo (Collaborator), Mar 5, 2024:

This is not about the schema. Here the script will error out because the user did pass a config_file that you already parsed, yet you still expect them to have passed a model_id.

@philschmid (Contributor, PR author):

I think I got what you meant. I made it a hard-coded if/elif/else without fall-through.

Collaborator:

What I meant is that if the input was a config file, then the script should end immediately after having processed it, and not try to also parse the other arguments.

Collaborator:
OK, so your last commit. About the schema, naaah, this is overkill.
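
To make the resolved behaviour concrete, here is a minimal sketch of the control flow being discussed, assuming hypothetical stand-in helpers (process_config_file, compile_single_model) rather than the actual functions from the PR:

import argparse

def process_config_file(path):
    # Hypothetical stand-in for the config-file handling in the script.
    print(f"would compile every entry listed in {path}")

def compile_single_model(args):
    # Hypothetical stand-in for the single-model compilation path.
    print(f"would compile {args.hf_model_id}")

parser = argparse.ArgumentParser()
parser.add_argument("--config_file")
parser.add_argument("--hf_model_id")
args = parser.parse_args()

if args.config_file:
    # Config-file mode: handle it and stop; the single-model arguments
    # below are never required on this branch.
    process_config_file(args.config_file)
elif args.hf_model_id is None:
    parser.error("either --config_file or --hf_model_id must be provided")
else:
    compile_single_model(args)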

@dacorvo (Collaborator) left a comment:

LGTM, thanks!

@philschmid (Contributor, PR author):

Tested with

python tools/auto_fill_inference_cache.py --config_file https://huggingface.co/aws-neuron/optimum-neuron-cache/raw/main/inference-cache-config/gpt2.json

Co-authored-by: David Corvoysier <david@huggingface.co>
philschmid merged commit 7649e6c into main on Mar 5, 2024
2 checks passed
philschmid deleted the add-compile-script branch on March 5, 2024, 14:11