
Refactor convert.py and add support for Meta's official Llama 3 model #6819

Closed
@teleprint-me

Description

Prerequisites

Please answer the following questions for yourself before submitting an issue.

  • I am running the latest code. Development is very rapid so there are no tagged versions as of now.
  • I carefully followed the README.md.
  • I searched using keywords relevant to my issue to make sure that I am creating a new issue that is not already open (or closed).
  • I reviewed the Discussions, and have a new bug or useful enhancement to share.

Feature Description

Support the Official Llama 3 PyTorch model distributed by Meta.

Motivation

The convert.py script supports converting the raw Llama 1 and 2 torch models distributed by Meta (Facebook Research), but not the raw Llama 3 torch models. PR #6745 implemented the conversion process for Hugging Face's transformers and tokenizers framework implementations, but not for the raw torch models themselves.

The current convert.py implementation has accumulated feature creep from the desire to also support Hugging Face's formats, and those features now conflict with and block a clean implementation for Llama 3.

Possible Implementation

The official Llama 3 release ships with a plaintext BPE tokenizer.model file that follows the GPT-2 style BPE format used by OpenAI's tiktoken. This means tiktoken is required in order to convert the model to a compatible GGUF format.
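
Loading that file with tiktoken is straightforward. Here is a minimal sketch, assuming a hypothetical local path to the downloaded weights and borrowing the pre-tokenization regex from Meta's reference implementation (the empty special_tokens dict is a simplification; Meta reserves 256 special tokens on top of the base vocabulary):

```python
# Minimal sketch of loading Llama 3's plaintext tokenizer.model with tiktoken,
# modeled on Meta's reference Tokenizer class (https://github.com/meta-llama/llama3).
import tiktoken
from tiktoken.load import load_tiktoken_bpe

# Hypothetical local path to the downloaded model files.
model_path = "Meta-Llama-3-8B/tokenizer.model"

# Each line of tokenizer.model is "<base64 token> <rank>"; load_tiktoken_bpe
# decodes it into a dict mapping token bytes to merge rank.
mergeable_ranks = load_tiktoken_bpe(model_path)

# Pre-tokenization regex taken from Meta's reference implementation.
pat_str = (
    r"(?i:'s|'t|'re|'ve|'m|'ll|'d)|[^\r\n\p{L}\p{N}]?\p{L}+|\p{N}{1,3}"
    r"| ?[^\s\p{L}\p{N}]+[\r\n]*|\s*[\r\n]+|\s+(?!\S)|\s+"
)

enc = tiktoken.Encoding(
    name="llama3",
    pat_str=pat_str,
    mergeable_ranks=mergeable_ranks,
    special_tokens={},  # simplification: Meta reserves 256 special tokens
)

print(enc.encode("Hello, world!"))
```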

We would need to integrate this into the BpeVocab class, which currently only supports Hugging Face's tokenizers.
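
As a strawman, a tiktoken-backed counterpart to BpeVocab could look roughly like the sketch below; the class name and method are hypothetical and not part of convert.py's actual interface:

```python
from tiktoken.load import load_tiktoken_bpe

class TikTokenVocab:
    """Hypothetical vocab source for Llama 3's plaintext BPE tokenizer.model."""

    def __init__(self, fname_tokenizer: str) -> None:
        # rank -> token bytes; sorting by rank makes the list index the token id
        ranks = load_tiktoken_bpe(fname_tokenizer)
        self.tokens = [tok for tok, _ in sorted(ranks.items(), key=lambda kv: kv[1])]
        self.vocab_size = len(self.tokens)

    def all_tokens(self):
        # Yield (token_bytes, score) pairs; BPE vocabs carry no real scores,
        # so a dummy value stands in here.
        for tok in self.tokens:
            yield tok, 0.0
```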

Meta has already given us the implementation details by releasing the official source code in its meta-llama org repo; see https://github.com/meta-llama/llama3 for more information.

The Tokenizer class implementation is already fleshed out, but it needs to be refactored and integrated into the Vocab factory in a reasonable way. This is no small feat, because it breaks the existing pattern and deviates from previous releases.
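
One way the factory could distinguish the formats without breaking the existing paths is to sniff the tokenizer files themselves, since a SentencePiece tokenizer.model is a protobuf binary while Llama 3's is plaintext base64 lines. A rough sketch, with a hypothetical helper name:

```python
import base64
import binascii
from pathlib import Path

def detect_vocab_format(model_dir: Path) -> str:
    """Hypothetical helper: pick a vocab loader from the files on disk."""
    if (model_dir / "tokenizer.json").exists():
        return "hf-bpe"  # Hugging Face tokenizers (existing path)
    spm = model_dir / "tokenizer.model"
    if spm.exists():
        with spm.open("rb") as f:
            fields = f.readline().split()
        try:
            # tiktoken files are plaintext lines of "<base64 token> <rank>"
            base64.b64decode(fields[0], validate=True)
            int(fields[1])
            return "tiktoken-bpe"    # Llama 3
        except (IndexError, ValueError, binascii.Error):
            return "sentencepiece"   # Llama 1/2 protobuf model
    raise FileNotFoundError(f"no known tokenizer file in {model_dir}")
```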

We already have support for most of these models, and new vocabularies are few and far between, but there are enough abstractions and implementations that the complexity is increasing over time.

One idea I'm currently considering is to follow a series of steps over time to reduce the complexity of the convert.py script and make it easier to maintain and extend.

This means removing any unnecessary and unrelated code from the convert.py script and migrating all Hugging Face-specific source code to the convert-hf-to-gguf.py script. This is a long-term proposal that requires everyone to be on the same page in order to pull it off effectively and efficiently.

I outlined my rationale in the link above referencing PR #6745. A potentially related issue is #6690.

I'm open to any feedback and suggestions here. I'm in no rush to implement this, and I believe it's wise that we don't rush, as enough technical debt has piled up already. It might be better to discuss this first and determine the best steps to take before moving forward.

@cebtenzzre @ngxson @pcuenca
