
Refactor convert.py and add support for Meta's official Llama 3 model #6819

Closed
@teleprint-me

Description

Prerequisites

Please answer the following questions for yourself before submitting an issue.

  • I am running the latest code. Development is very rapid so there are no tagged versions as of now.
  • I carefully followed the README.md.
  • I searched using keywords relevant to my issue to make sure that I am creating a new issue that is not already open (or closed).
  • I reviewed the Discussions, and have a new bug or useful enhancement to share.

Feature Description

Support the Official Llama 3 PyTorch model distributed by Meta.

Motivation

The convert.py script supports converting the raw Llama 1 and 2 torch models distributed by Meta (Facebook Research), but not the raw Llama 3 torch models. PR #6745 implemented the conversion process for Hugging Face's transformers and tokenizers framework implementations, but not for the raw torch models themselves.

The current convert.py implementation has accumulated feature creep from the desire to also support Hugging Face's formats, and those features now conflict with and block a clean implementation for Llama 3.

Possible Implementation

The official Llama 3 release ships with a plaintext BPE tokenizer.model file that follows the GPT-2 style BPE format used by OpenAI's tiktoken. This means tiktoken is required in order to convert the model to a compatible GGUF format.
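
Loading that file with tiktoken is straightforward. Here is a minimal sketch, assuming a hypothetical local path to the downloaded weights and borrowing the pre-tokenization regex from Meta's reference implementation (the empty special_tokens dict is a simplification; Meta reserves 256 special tokens on top of the base vocabulary):

```python
# Minimal sketch of loading Llama 3's plaintext tokenizer.model with tiktoken,
# modeled on Meta's reference Tokenizer class (https://github.com/meta-llama/llama3).
import tiktoken
from tiktoken.load import load_tiktoken_bpe

# Hypothetical local path to the downloaded model files.
model_path = "Meta-Llama-3-8B/tokenizer.model"

# Each line of tokenizer.model is "<base64 token> <rank>"; load_tiktoken_bpe
# decodes it into a dict mapping token bytes to merge rank.
mergeable_ranks = load_tiktoken_bpe(model_path)

# Pre-tokenization regex taken from Meta's reference implementation.
pat_str = (
    r"(?i:'s|'t|'re|'ve|'m|'ll|'d)|[^\r\n\p{L}\p{N}]?\p{L}+|\p{N}{1,3}"
    r"| ?[^\s\p{L}\p{N}]+[\r\n]*|\s*[\r\n]+|\s+(?!\S)|\s+"
)

enc = tiktoken.Encoding(
    name="llama3",
    pat_str=pat_str,
    mergeable_ranks=mergeable_ranks,
    special_tokens={},  # simplification: Meta reserves 256 special tokens
)

print(enc.encode("Hello, world!"))
```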

We would need to integrate this into the BpeVocab class, which currently only supports Hugging Face's tokenizers.
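
As a strawman, a tiktoken-backed counterpart to BpeVocab could look roughly like the sketch below; the class name and method are hypothetical and not part of convert.py's actual interface:

```python
from tiktoken.load import load_tiktoken_bpe

class TikTokenVocab:
    """Hypothetical vocab source for Llama 3's plaintext BPE tokenizer.model."""

    def __init__(self, fname_tokenizer: str) -> None:
        # rank -> token bytes; sorting by rank makes the list index the token id
        ranks = load_tiktoken_bpe(fname_tokenizer)
        self.tokens = [tok for tok, _ in sorted(ranks.items(), key=lambda kv: kv[1])]
        self.vocab_size = len(self.tokens)

    def all_tokens(self):
        # Yield (token_bytes, score) pairs; BPE vocabs carry no real scores,
        # so a dummy value stands in here.
        for tok in self.tokens:
            yield tok, 0.0
```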

Meta has already given us the implementation details by releasing the official source code in its meta-llama org repo; see https://github.com/meta-llama/llama3 for more information.

The Tokenizer class implementation is already fleshed out, but it needs to be refactored and integrated into the Vocab factory in a reasonable way. This is no small feat, because it breaks the existing pattern and deviates from previous releases.
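
One way the factory could distinguish the formats without breaking the existing paths is to sniff the tokenizer files themselves, since a SentencePiece tokenizer.model is a protobuf binary while Llama 3's is plaintext base64 lines. A rough sketch, with a hypothetical helper name:

```python
import base64
import binascii
from pathlib import Path

def detect_vocab_format(model_dir: Path) -> str:
    """Hypothetical helper: pick a vocab loader from the files on disk."""
    if (model_dir / "tokenizer.json").exists():
        return "hf-bpe"  # Hugging Face tokenizers (existing path)
    spm = model_dir / "tokenizer.model"
    if spm.exists():
        with spm.open("rb") as f:
            fields = f.readline().split()
        try:
            # tiktoken files are plaintext lines of "<base64 token> <rank>"
            base64.b64decode(fields[0], validate=True)
            int(fields[1])
            return "tiktoken-bpe"    # Llama 3
        except (IndexError, ValueError, binascii.Error):
            return "sentencepiece"   # Llama 1/2 protobuf model
    raise FileNotFoundError(f"no known tokenizer file in {model_dir}")
```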

We already have support for most of these models, and new vocabularies are few and far between, but there are enough abstractions and implementations that the complexity is increasing over time.

One idea I'm currently considering is to follow a series of steps over time to reduce the complexity of the convert.py script and make it easier to maintain and extend.

This means removing any unnecessary and unrelated code from the convert.py script and migrating all Hugging Face-specific source code to the convert-hf-to-gguf.py script. This is a long-term proposal that requires everyone to be on the same page in order to pull it off effectively and efficiently.

I outlined my rationale in the link above referencing PR #6745. A potentially related issue is #6690.

I'm open to any feedback and suggestions here. I'm in no rush to implement this, and I believe it's wise that we don't rush, as enough technical debt has piled up already. It might be better to discuss this first and determine the best steps to take before moving forward.

@cebtenzzre @ngxson @pcuenca
