
Inference compile cache script #504

Merged
merged 10 commits into main from add-compile-script on Mar 5, 2024

Conversation

@philschmid (Contributor) commented on Mar 5, 2024

What does this PR do?

This PR adds a script under tools/ to easily add LLMs to the public cache. It leverages optimum-cli behind the scenes and checks whether the version matches. It can currently be used in the following ways:

Single compilation:

huggingface-cli login --token hf_xxx # access to cache repo
python tools/cache_model_for_inference.py --hf_model_id "HuggingFaceH4/zephyr-7b-beta" --batch_size 1 --sequence_length 2048 --num_cores 2 --auto_cast_type fp16

File-based compilation:

python tools/auto_fill_inference_cache.py --config_file test.json

with a config file such as:

{
  "openai-community/gpt2": [
    {
      "batch_size": 1,
      "sequence_length": 1024,
      "num_cores": 1,
      "auto_cast_type": "fp16"
    }
  ],
  "meta-llama/Llama-2-7b-chat-hf": [
    {
      "batch_size": 1,
      "sequence_length": 4096,
      "num_cores": 2,
      "auto_cast_type": "fp16"
    },
    {
      "batch_size": 1,
      "sequence_length": 4096,
      "num_cores": 8,
      "auto_cast_type": "fp16"
    }
  ]
}

Remote file-based config:

python tools/auto_fill_inference_cache.py --config_file https://huggingface.co/aws-neuron/optimum-neuron-cache/raw/main/inference-cache-config/gpt2.json

The configs can be found in the aws-neuron/optimum-neuron-cache repository under inference-cache-config.
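
For illustration, the file-based flow boils down to loading the JSON config (from a local path or a raw Hub URL) and invoking optimum-cli once per model/configuration pair. Below is a minimal sketch under that assumption; load_config and compile_and_cache are hypothetical helpers, not the actual functions in the script, and the exact optimum-cli export neuron flags should be verified against the installed optimum-neuron version.

# Hedged sketch only; this is not the actual tools/auto_fill_inference_cache.py implementation.
import json
import subprocess
import tempfile

import requests


def load_config(path_or_url):
    # Accept either a local path or a raw Hub URL (the remote file-based mode above).
    if path_or_url.startswith(("http://", "https://")):
        return requests.get(path_or_url, timeout=30).json()
    with open(path_or_url) as f:
        return json.load(f)


def compile_and_cache(model_id, entry):
    # Hypothetical helper: compile one model/config pair via optimum-cli so the
    # artifacts land in the Neuron compile cache.
    with tempfile.TemporaryDirectory() as output_dir:
        subprocess.run(
            [
                "optimum-cli", "export", "neuron",
                "--model", model_id,
                "--batch_size", str(entry["batch_size"]),
                "--sequence_length", str(entry["sequence_length"]),
                "--num_cores", str(entry["num_cores"]),
                "--auto_cast_type", entry["auto_cast_type"],
                output_dir,
            ],
            check=True,
        )


for model_id, entries in load_config("test.json").items():
    for entry in entries:
        compile_and_cache(model_id, entry)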

philschmid requested a review from dacorvo on March 5, 2024, 09:41
@dacorvo (Collaborator) left a comment:

Since it partially duplicates auto_fill_neuronx_inference_cache.py, could you transform this into a simple configuration JSON file for your own script?

philschmid requested a review from dacorvo on March 5, 2024, 13:04
Inline review thread on the argument-handling code:

num_cores=model_config["num_cores"],
auto_cast_type=model_config["auto_cast_type"],
)
# Check if all arguments are provided if a config file is not used
Collaborator:
This will fail every time you pass a config_file without a model_id if you don't return here or use an elif.

@dacorvo (Collaborator), Mar 5, 2024:

This is not about the schema. Here the script will error out because the user did pass a config_file that you already parsed, yet you still expect them to have passed a model_id.

@philschmid (Contributor, PR author):

I think I got what you meant. I made it a hard-coded if/elif/else without fall-through.

Collaborator:

What I meant is that if the input was a config file, then the script should end immediately after having processed it, and not try to also parse the other arguments.

Collaborator:
OK, so your last commit. About the schema, naaah, this is overkill.
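
To make the resolved behaviour concrete, here is a minimal sketch of the control flow being discussed, assuming hypothetical stand-in helpers (process_config_file, compile_single_model) rather than the actual functions from the PR:

import argparse

def process_config_file(path):
    # Hypothetical stand-in for the config-file handling in the script.
    print(f"would compile every entry listed in {path}")

def compile_single_model(args):
    # Hypothetical stand-in for the single-model compilation path.
    print(f"would compile {args.hf_model_id}")

parser = argparse.ArgumentParser()
parser.add_argument("--config_file")
parser.add_argument("--hf_model_id")
args = parser.parse_args()

if args.config_file:
    # Config-file mode: handle it and stop; the single-model arguments
    # below are never required on this branch.
    process_config_file(args.config_file)
elif args.hf_model_id is None:
    parser.error("either --config_file or --hf_model_id must be provided")
else:
    compile_single_model(args)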

@dacorvo (Collaborator) left a comment:

LGTM, thanks!

@philschmid (Contributor, PR author):

Tested with

python tools/auto_fill_inference_cache.py --config_file https://huggingface.co/aws-neuron/optimum-neuron-cache/raw/main/inference-cache-config/gpt2.json

Co-authored-by: David Corvoysier <david@huggingface.co>
philschmid merged commit 7649e6c into main on Mar 5, 2024
2 checks passed
philschmid deleted the add-compile-script branch on March 5, 2024, 14:11