Please also support the official Falcon-rw-1b and Falcon-rw-7b model variants #2868

Open
maddes8cht opened this issue Aug 29, 2023 · 6 comments
Labels
good first issue, model

Comments

@maddes8cht
Contributor

Falcon comes not only in the 7B and 40B versions, but also in the two RefinedWeb variants Falcon-RW-1B and Falcon-RW-7B.
These are official versions, as can be seen at https://huggingface.co/tiiuae.

I have successfully converted and quantized the 7B models with convert-falcon-hf-to-gguf.py, but the RefinedWeb variants result in the following abort messages:

python convert-falcon-hf-to-gguf.py
gguf: loading model falcon-rw-1b
Model architecture not supported: FalconForCausalLM
Basename: tiiuae-falcon-rw-1b

The message for the rw-7b model is identical except for the filename.
Do you want to support these models as well, or are there special difficulties?

A Falcon 1.3B model would be an incredibly fast model for small and easy tasks. It would be great to have it.

@ggerganov
Owner

Should be easy to support - PRs welcome

ggerganov added the good first issue and model labels on Aug 29, 2023
@KerfuffleV2
Collaborator

@maddes8cht

You could try changing:

if hparams["architectures"][0] != "RWForCausalLM":
    print("Model architecture not supported: " + hparams["architectures"][0])

    sys.exit(1)

to

if hparams["architectures"][0] not in ("RWForCausalLM", "FalconForCausalLM"):
    print("Model architecture not supported: " + hparams["architectures"][0])

    sys.exit(1)

This is assuming there are no other changes in the actual model architecture, etc.
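
For what it's worth, one quick way to check that assumption is to compare what a given config.json actually contains against the keys the convert script reads. A throwaway sketch (the key list below is only illustrative, not the script's exact requirements):

import json
import sys

# Path to the model directory is taken from the command line, e.g.
# python check-config.py ./models/falcon-rw-1b
with open(sys.argv[1] + "/config.json", "r", encoding="utf-8") as f:
    hparams = json.load(f)

print("architectures:", hparams.get("architectures"))

# Illustrative list of keys the converter looks at; adjust to the real script.
for key in ("n_layer", "n_head", "n_head_kv", "hidden_size", "vocab_size"):
    print(key, "->", hparams.get(key, "MISSING"))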

@maddes8cht
Contributor Author

When doing so, I'm getting the following error with the rw-1b model:

gguf: loading model falcon-rw-1b
gguf: get model metadata
Traceback (most recent call last):
  File "c:\Users\WaWiAdm\Documents\Github\ggllm.cpp.pr\convert-falcon-hf-to-gguf.py", line 96, in <module>
    block_count = hparams["n_layer"]
KeyError: 'n_layer'

With the rw-7b model it's almost the same:

gguf: loading model falcon-rw-7b
gguf: found 2 model parts
gguf: get model metadata
Traceback (most recent call last):
  File "c:\Users\WaWiAdm\Documents\Github\ggllm.cpp.pr\convert-falcon-hf-to-gguf.py", line 96, in <module>
    block_count = hparams["n_layer"]
KeyError: 'n_layer'

The original 7B model has 32 layers,
the rw-1b only 24,
and the rw-7b actually has 36 layers (which is more than the regular 7B, as the regular 7B has 6.7B parameters while the rw-7b has 7.5B).

@KerfuffleV2
Collaborator

When doing so, I'm getting the following error with the rw-1b model:

I guess there are actual differences. I looked more closely at the config.json, and it seems like the base version uses parallel attention and multi-query attention while the RW one doesn't.

This is very, very unlikely to work, but if you want you can try changing n_layer to num_hidden_layers on the line that failed. It will get you past that particular problem, but I'd be very surprised if it could complete successfully.
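
For reference, the failing line could also be written to accept either key name, something like this (untested sketch; as noted, the key name is probably not the only difference):

# Prefer the RW-style key, fall back to the original Falcon key.
if "num_hidden_layers" in hparams:
    block_count = hparams["num_hidden_layers"]  # falcon-rw-1b / falcon-rw-7b
else:
    block_count = hparams["n_layer"]            # falcon-7b / falcon-40b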

@maddes8cht
Contributor Author

The config.json in the repository is quite different between the regular Falcon 7B / 40B and the RefinedWeb variants.
In this case there is no "n_layer" entry in the RefinedWeb variants; this information seems to be in "num_hidden_layers".

Here is config.json of rw-1b

{
  "alibi": true,
  "apply_residual_connection_post_layernorm": false,
  "architectures": [
    "FalconForCausalLM"
  ],
  "attention_dropout": 0.0,
  "auto_map": {
    "AutoConfig": "configuration_falcon.FalconConfig",
    "AutoModel": "modeling_falcon.FalconModel",
    "AutoModelForSequenceClassification": "modeling_falcon.FalconForSequenceClassification",
    "AutoModelForTokenClassification": "modeling_falcon.FalconForTokenClassification",
    "AutoModelForQuestionAnswering": "modeling_falcon.FalconForQuestionAnswering",
    "AutoModelForCausalLM": "modeling_falcon.FalconForCausalLM"
  },
  "bias": true,
  "bos_token_id": 1,
  "eos_token_id": 2,
  "hidden_dropout": 0.0,
  "hidden_size": 2048,
  "initializer_range": 0.02,
  "layer_norm_epsilon": 1e-05,
  "model_type": "falcon",
  "multi_query": false,
  "new_decoder_architecture": false,
  "num_attention_heads": 32,
  "num_hidden_layers": 24,
  "parallel_attn": false,
  "torch_dtype": "bfloat16",
  "transformers_version": "4.27.4",
  "use_cache": true,
  "vocab_size": 50304
}

and here is config.json of the regular 7b model:

{
  "alibi": false,
  "apply_residual_connection_post_layernorm": false,
  "architectures": [
    "RWForCausalLM"
  ],
  "attention_dropout": 0.0,
  "auto_map": {
    "AutoConfig": "configuration_RW.RWConfig",
    "AutoModel": "modelling_RW.RWModel",
    "AutoModelForSequenceClassification": "modelling_RW.RWForSequenceClassification",
    "AutoModelForTokenClassification": "modelling_RW.RWForTokenClassification",
    "AutoModelForQuestionAnswering": "modelling_RW.RWForQuestionAnswering",
    "AutoModelForCausalLM": "modelling_RW.RWForCausalLM"
  },
  "bias": false,
  "bos_token_id": 11,
  "eos_token_id": 11,
  "hidden_dropout": 0.0,
  "hidden_size": 4544,
  "initializer_range": 0.02,
  "layer_norm_epsilon": 1e-05,
  "model_type": "RefinedWebModel",
  "multi_query": true,
  "n_head": 71,
  "n_layer": 32,
  "parallel_attn": true,
  "torch_dtype": "bfloat16",
  "transformers_version": "4.27.4",
  "use_cache": true,
  "vocab_size": 65024
}
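
For a quick side-by-side of what actually differs, a few lines of Python are enough (throwaway sketch; the paths are placeholders for wherever the two downloaded model directories live):

import json

# Placeholder paths - point these at the downloaded model directories.
with open("falcon-rw-1b/config.json") as f:
    rw = json.load(f)
with open("falcon-7b/config.json") as f:
    base = json.load(f)

# Print every key whose value differs (or that only one file has).
for key in sorted(set(rw) | set(base)):
    if rw.get(key) != base.get(key):
        print(f"{key}: rw-1b={rw.get(key)!r}  7b={base.get(key)!r}")

Against the two files above, that flags the renamed layer/head keys as well as genuine architectural switches like alibi, multi_query, parallel_attn and bias.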

@akawrykow
Contributor

akawrykow commented Aug 29, 2023

Even fixing these key errors with the differences in the config, I later run into this issue:

PS C:\llama.cpp> python3 .\convert-falcon-hf-to-gguf.py .\models\falcon-rw-1b\ 1
gguf: loading model falcon-rw-1b
gguf: get model metadata
gguf: get tokenizer metadata
gguf: get gpt2 tokenizer merges
gguf: get gpt2 tokenizer vocab
gguf: get special token ids
gguf: get tensor metadata
gguf: loading model part 'pytorch_model.bin'
token_embd.weight, n_dims = 2, torch.bfloat16 --> float16
blk.0.attn_norm.weight, n_dims = 1, torch.bfloat16 --> float32
blk.0.attn_norm.bias, n_dims = 1, torch.bfloat16 --> float32
Traceback (most recent call last):
  File ".\convert-falcon-hf-to-gguf.py", line 244, in <module>
    qkv = data.view(n_head_kv, n_head // n_head_kv + 2, head_dim, head_dim * n_head)
RuntimeError: shape '[1, 34, 64, 2048]' is invalid for input of size 12582912

So it seems like the shape of the weights also differs for this model.
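
For what it's worth, the size in the error is consistent with the non-multi-query layout: 3 * 2048 * 2048 = 12582912, i.e. a plain [3 * hidden, hidden] fused QKV. If the rw checkpoints interleave it per head as (q, k, v) blocks of head_dim rows, the way the Hugging Face Falcon modeling code splits it, the reshape for that case would look roughly like this (untested sketch; it only covers the tensor layout, not the other graph differences such as ALiBi instead of rotary embeddings and no parallel attention):

# Untested sketch for multi_query == false (falcon-rw-*): assume the fused QKV
# weight of shape [3 * hidden, hidden] is interleaved per head as (q, k, v)
# blocks of head_dim rows each, and unfuse it into contiguous q, k, v.
qkv = data.view(n_head, 3, head_dim, head_dim * n_head)  # [32, 3, 64, 2048] for rw-1b
q = qkv[:, 0].reshape(n_head * head_dim, head_dim * n_head)
k = qkv[:, 1].reshape(n_head * head_dim, head_dim * n_head)
v = qkv[:, 2].reshape(n_head * head_dim, head_dim * n_head)
data = torch.cat((q, k, v)).reshape_as(data)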

See draft PR: #2887
