Description
🚀 The feature, motivation and pitch
In the `transformers` implementation of llama, there are optional bias tensors for the `LlamaMLP` and `LlamaAttention` modules. Several additional models (specifically Granite Code 3B and 8B) use the llama architecture and have these separate bias tensors.

The proposal here is to add the ability to indicate the presence of bias tensors in `TransformerArgs` and then support loading them in `Attention` and `FeedForward`.
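Roughly, the idea is to thread a pair of boolean flags from the model config down to the `nn.Linear` constructors. A minimal sketch of the `FeedForward` side, assuming illustrative field names (`attention_bias`, `feed_forward_bias`) and a gpt-fast-style `w1`/`w2`/`w3` layout rather than torchchat's exact code:

```python
from dataclasses import dataclass

import torch
import torch.nn as nn
import torch.nn.functional as F


@dataclass
class TransformerArgs:
    # Existing fields elided; the names here are illustrative only
    dim: int = 4096
    hidden_dim: int = 11008
    # New flags, defaulting to False so existing Llama configs are unaffected
    attention_bias: bool = False
    feed_forward_bias: bool = False


class FeedForward(nn.Module):
    def __init__(self, config: TransformerArgs) -> None:
        super().__init__()
        # bias is enabled only when the checkpoint actually ships .bias tensors
        self.w1 = nn.Linear(config.dim, config.hidden_dim, bias=config.feed_forward_bias)
        self.w3 = nn.Linear(config.dim, config.hidden_dim, bias=config.feed_forward_bias)
        self.w2 = nn.Linear(config.hidden_dim, config.dim, bias=config.feed_forward_bias)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.w2(F.silu(self.w1(x)) * self.w3(x))
```

The `Attention` module would pass `config.attention_bias` to its q/k/v (and possibly output) projections in the same way.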
Alternatives
If this project is intended to be limited to the official Llama models, these bias tensors are not needed.
Additional context
This issue is a piece of the puzzle for adding support for Granite Code 3B/8B, which use the llama architecture in `transformers` but take advantage of several pieces of the architecture that are not currently supported by torchchat. The work-in-progress for Granite Code can be found on my fork: https://github.com/gabe-l-hart/torchchat/tree/GraniteCodeSupport
RFC (Optional)
I have a working implementation to support these optional bias tensors that I plan to submit as a PR. The changes are along the following lines:
- Add new parameters to `TransformerArgs` for attention and ffn bias
- Set the `bias` value based on these parameters in both the `Attention` and `FeedForward` modules
- Support mapping `.bias` tensor names in `convert_hf_checkpoint`
- Support permuting `.bias` tensors in `convert_hf_checkpoint` (a sketch of the permutation follows this list)
- Support loading permuted `.bias` tensors in `model.py`
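On the conversion side, the q/k projection weights in HF llama checkpoints are typically permuted into the interleaved rotary layout, so a `.bias` tensor on those projections needs the same regrouping applied to its single dimension. A hedged sketch, assuming a gpt-fast-style permute helper; the actual function name and signature in `convert_hf_checkpoint` may differ:

```python
import torch


def permute_weight(w: torch.Tensor, n_heads: int, head_dim: int, dim: int) -> torch.Tensor:
    # Existing-style 2-D q/k weight permutation: regroup the rotary halves per head
    return (
        w.view(n_heads, 2, head_dim // 2, dim)
        .transpose(1, 2)
        .reshape(n_heads * head_dim, dim)
    )


def permute_bias(b: torch.Tensor, n_heads: int, head_dim: int) -> torch.Tensor:
    # Proposed 1-D q/k bias permutation: same regrouping, minus the input-dim axis
    return (
        b.view(n_heads, 2, head_dim // 2)
        .transpose(1, 2)
        .reshape(n_heads * head_dim)
    )
```

The conversion loop would then apply the bias permutation to any mapped `q_proj.bias`/`k_proj.bias` entries, and `model.py` can load them through the normal `state_dict` path once the `nn.Linear` modules declare matching bias parameters.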