Conversation

borzunov commented on Jun 8, 2023

This PR:

  1. Abolishes the model conversion procedure. Now, models are downloaded directly from original repositories like https://huggingface.co/bigscience/bloom. Servers download only shards with blocks to be hosted, and clients download only shards with input/output embeddings and layernorms.

    • BLOOM is loaded from bigscience/bloom, but we use the DHT prefix bigscience/bloom-petals for backward compatibility. Same with smaller BLOOMs and BLOOMZ.
    • LLaMA can be loaded from any repo like username/llama-65b-hf, but we use the DHT prefix llama-65b-hf (without the username) to accommodate blocks from different repos (there are a few of them with minor differences, such as Llama vs. LLaMA in the class name). A hypothetical shard-download sketch is shown after this list.
  2. Refactors the client to generalize it for multiple models. Now, we have a petals.models package whose subpackages contain model-specific code (e.g. petals.models.bloom, petals.models.llama). General code (e.g. CPU-efficient LM head, p-tuning) is kept in petals.client.

  3. Introduces WrappedLlamaBlock, DistributedLlamaConfig, DistributedLlamaForCausalLM, DistributedLlamaForSequenceClassification, and DistributedLlamaModel compatible with Petals functionality (p-tuning, adapters, etc.).

  4. Introduces AutoDistributedConfig that automatically chooses the correct config class (DistributedLlamaConfig or DistributedBloomConfig). The refactored configs contain all model-specific info for both clients and servers.
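
To make item 4 concrete, here is a simplified sketch of how such a dispatcher can work: read `model_type` from the repo's config and pick the matching Distributed*Config. The registry, method body, and exact import paths below are illustrative assumptions, not the PR's actual code.

```python
from transformers import AutoConfig

# Import paths follow the petals.models layout from item 2; exact re-exports may differ.
from petals.models.bloom import DistributedBloomConfig
from petals.models.llama import DistributedLlamaConfig

_CONFIG_REGISTRY = {"bloom": DistributedBloomConfig, "llama": DistributedLlamaConfig}


class AutoDistributedConfig:
    @classmethod
    def from_pretrained(cls, model_name_or_path, **kwargs):
        # Peek at the original config to learn the model family...
        model_type = AutoConfig.from_pretrained(model_name_or_path, **kwargs).model_type
        # ...then delegate to the matching distributed config class.
        return _CONFIG_REGISTRY[model_type].from_pretrained(model_name_or_path, **kwargs)
```

Usage would then be uniform across models, e.g. `AutoDistributedConfig.from_pretrained("bigscience/bloom")` or `AutoDistributedConfig.from_pretrained("username/llama-65b-hf")`.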
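
For item 1, a hypothetical sketch of selective shard downloading: the checkpoint's weight index maps each parameter to its shard file, so a server can fetch only the shards containing the blocks it hosts. The function name and the index filename below are my assumptions for illustration, not Petals' actual loader.

```python
import json

from huggingface_hub import hf_hub_download


def download_shards_for_block(repo_id: str, block_idx: int) -> list[str]:
    # The index file maps parameter names to the shard files that store them.
    index_path = hf_hub_download(repo_id, "pytorch_model.bin.index.json")
    with open(index_path) as f:
        weight_map = json.load(f)["weight_map"]

    # LLaMA-style parameter naming; other architectures (e.g. BLOOM) use different prefixes.
    prefix = f"model.layers.{block_idx}."
    shard_names = {shard for name, shard in weight_map.items() if name.startswith(prefix)}

    # Download only those shards, skipping the rest of the (possibly huge) checkpoint.
    return [hf_hub_download(repo_id, shard) for shard in sorted(shard_names)]
```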

Upgrade instructions:

  • Remove disk caches for blocks in old (converted) format to save disk space. That is, remove ~/.cache/petals/model--bigscience--bloom-petals and ~/.cache/petals/model--bigscience--bloomz-petals directories (if present).
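
The upgrade step above amounts to deleting two directories; a minimal cleanup sketch (paths taken from the instruction, everything else assumed):

```python
import shutil
from pathlib import Path

cache_dir = Path.home() / ".cache" / "petals"
for name in ("model--bigscience--bloom-petals", "model--bigscience--bloomz-petals"):
    path = cache_dir / name
    if path.exists():
        shutil.rmtree(path)  # drop the old converted-format block cache
        print(f"Removed {path}")
```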

Tested:

  • Servers hosting BLOOM and LLaMA
  • Clients running inference, p-tuning and adapter tuning for BLOOM and LLaMA

Expected additions to this PR:

  • Fix NaNs in prompt embeddings during LLaMA p-tuning
  • Add AutoDistributedModel, AutoDistributedModelForCausalLM, AutoDistributedModelForSequenceClassification (so that we have Colab notebooks where it's enough to replace only the model name)

Future work for other PRs:

  • Add log messages regarding model terms of use
  • Cover llama with tests
  • Decide on cache reordering code
  • Add guanaco
  • Add falcon-40b and falcon-40b-instruct
  • Update the "Host your own model" guide
  • Update http://health.petals.ml and http://chat.petals.ml
  • Upgrade example notebooks
  • Add speed measurements vs. llama.cpp

borzunov changed the title from "Support loading LLaMA and BLOOM blocks from existing repos" to "Add LLaMA support" on Jun 8, 2023
borzunov force-pushed the llama branch 5 times, most recently from b094f17 to 89aba9d on June 8, 2023 16:38
borzunov force-pushed the llama branch 9 times, most recently from 7a4e801 to 6367fb8 on June 10, 2023 02:47
  run: |
-   export HF_TAG=${{ hashFiles('setup.cfg', 'src/petals/cli/convert_model.py') }}
-   export MODEL_NAME=bloom-testing/test-bloomd-560m-$HF_TAG
+   export MODEL_NAME=bigscience/bloom-560m
borzunov (Collaborator Author) commented:

Loading the entire 560m model takes only 5 sec, so I'd stick to using it.

However, it takes a little more RAM, so we run 4 servers instead of 5 below.

borzunov requested a review from justheuristic on June 23, 2023 00:48
borzunov marked this pull request as ready for review on June 23, 2023 00:48
  - name: Delete any test models older than 1 week
    if: steps.cache-model.outputs.cache-hit != 'true'
    run: |
      python tests/scripts/remove_old_models.py --author bloom-testing --use_auth_token $BLOOM_TESTING_WRITE_TOKEN
A collaborator commented:

let's not forget to manually delete them in a week or so

value_states = value_states.view(batch_size * self.self_attn.num_heads, seq_length, self.self_attn.head_dim)
key_states = key_states.view(*value_states.shape)
key_states = key_states.permute(0, 2, 1)
return (key_states, value_states)
A collaborator commented:

Might be better to allow transformer blocks to define these methods rather than hard-coding the reordering from bloom.
If you agree, it's definitely okay to do it in a separate PR.
If you don't, please explain why.
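
A hypothetical sketch of what this suggestion could look like, with each block owning its cache-layout conversions instead of the backend assuming BLOOM's order. The method names and signatures are illustrative, not an existing Petals API; the forward conversion just restates the hunk above.

```python
import torch


class WrappedLlamaBlock:  # illustrative excerpt only, not the actual class body
    num_heads: int
    head_dim: int

    def cache_to_backend_layout(self, key_states, value_states, batch_size, seq_length):
        # Same reordering as the hunk above: values as [batch*heads, seq_len, head_dim],
        # keys as [batch*heads, head_dim, seq_len] (the backend's BLOOM-style layout).
        value_states = value_states.reshape(batch_size * self.num_heads, seq_length, self.head_dim)
        key_states = key_states.reshape(*value_states.shape).permute(0, 2, 1)
        return key_states, value_states

    def cache_from_backend_layout(self, key_states, value_states, batch_size, seq_length):
        # Inverse conversion; the backend would call these hooks instead of
        # hard-coding one model's layout.
        key_states = key_states.permute(0, 2, 1)
        key_states = key_states.reshape(batch_size, self.num_heads, seq_length, self.head_dim)
        value_states = value_states.reshape(batch_size, self.num_heads, seq_length, self.head_dim)
        return key_states, value_states
```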

    initial_peers=initial_peers,
    start=True,
-   num_workers=self.block_config.n_layer,
+   num_workers=self.block_config.num_hidden_layers,
A collaborator commented:

sanity check: is this field guaranteed for all models or only for bloom and llama? couldn't find it

borzunov (Collaborator Author) replied:

I think this field is standard: models like BLOOM remap their model-specific config keys (e.g. n_layer) to num_hidden_layers for compatibility, not vice versa: https://github.com/huggingface/transformers/blob/6ab045d6fe7a859ddc219cd144e638bb4d8ab2fe/src/transformers/models/bloom/configuration_bloom.py#L108
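
For reference, the remapping in the linked configuration_bloom.py is done via `attribute_map`, roughly like this (paraphrased excerpt from transformers, not part of this PR):

```python
from transformers import PretrainedConfig


class BloomConfig(PretrainedConfig):
    model_type = "bloom"
    # Generic names resolve to BLOOM's model-specific keys, so both
    # config.n_layer and config.num_hidden_layers work.
    attribute_map = {
        "num_hidden_layers": "n_layer",
        "num_attention_heads": "n_head",
    }
```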

justheuristic left a comment:

Wow, that's probably the largest petals PR of all time)

The new structure appears sound. The only real concern is that we hard-code bloom caching order on the backend side -- instead of allowing each block to define its own cache order as that block's methods or some other cunning plan. If you plan to do something similar, it's perfectly okay to do that later. If not, let's quickly discuss it, I might be missing something.

Another non-urgent point is covering LLaMA with tests. We can randomly initialize a llama-pattern 0.2B model and use it for testing.
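
A possible sketch of that test fixture, assuming a randomly initialized llama-pattern model built with transformers (all sizes below are made-up values that come out to roughly 0.2B parameters):

```python
from transformers import LlamaConfig, LlamaForCausalLM

config = LlamaConfig(
    vocab_size=32000,
    hidden_size=1024,
    intermediate_size=2816,
    num_hidden_layers=12,
    num_attention_heads=16,
)
model = LlamaForCausalLM(config)            # random weights, small enough for CI
model.save_pretrained("tiny-random-llama")  # hypothetical local/test-repo name
```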

borzunov merged commit cb3f018 into main on Jun 23, 2023
borzunov deleted the llama branch on June 23, 2023 11:46
borzunov added a commit that referenced this pull request on Jul 5, 2023
…339)

Before this PR, `free_disk_space_for()` was able to remove **(a)** only entire cached revisions (= git commits/branches) and **(b)** only from the repository we're loading right now.

This PR allows this function to remove arbitrary files individually, from any repository.

This is useful for the transition to Petals 1.2.0+, since it now uses the original repos instead of the ones with converted models (see #323). In particular, the cache for `bigscience/bloom-petals` is now deprecated and should be removed in favor of `bigscience/bloom`. It is also useful as a way to free space before loading LoRA adapters (#335).
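
A rough sketch of the behavior described above, assuming the huggingface_hub cache-scanning API (this is not the actual Petals implementation, and the function body is mine):

```python
from huggingface_hub import scan_cache_dir


def free_disk_space_for(bytes_needed: int) -> None:
    # Collect every cached file across all repos and revisions, oldest-accessed first.
    cache = scan_cache_dir()
    files = [f for repo in cache.repos for rev in repo.revisions for f in rev.files]
    files.sort(key=lambda f: f.blob_last_accessed)

    freed = 0
    for f in files:
        if freed >= bytes_needed:
            break
        f.blob_path.unlink(missing_ok=True)  # delete the underlying blob file
        freed += f.size_on_disk
```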