
Introduce enum llama_ftype #709

Merged (1 commit, Apr 11, 2023)

Conversation

@sw (Contributor) commented Apr 2, 2023

As discussed in #678, this introduces enum llama_ftype to llama.h which represents the various file types used by llama.cpp

The goal is to improve maintainability of the code by not having integer literals in various places. I hope I didn't miss one.

I used the names and comments from @comex's pending changes to the Python scripts in #545. I have not touched the scripts so as not to cause conflicts with their work.
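
For orientation, here is a sketch of what such an enum could look like in llama.h; apart from LLAMA_FTYPE_MOSTLY_F16, which appears in the diff below, the identifiers and comments are my own guesses based on the file type values discussed in this thread:

    enum llama_ftype {
        LLAMA_FTYPE_ALL_F32              = 0, // every tensor stored as F32
        LLAMA_FTYPE_MOSTLY_F16           = 1, // mostly F16, some F32
        LLAMA_FTYPE_MOSTLY_Q4_0          = 2, // mostly Q4_0, some F32
        LLAMA_FTYPE_MOSTLY_Q4_1          = 3, // mostly Q4_1, some F32
        LLAMA_FTYPE_MOSTLY_Q4_1_SOME_F16 = 4, // mix of Q4_1, F16 and F32 tensors
    };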

@sw requested a review from ggerganov on April 2, 2023 at 11:52
llama.cpp (outdated diff)

    @@ -100,7 +102,7 @@ struct llama_hparams {
         int32_t n_head = 32;
         int32_t n_layer = 32;
         int32_t n_rot = 64;
    -    int32_t f16 = 1;
    +    int32_t f16 = LLAMA_FTYPE_MOSTLY_F16;

A collaborator commented on this diff:

How hard is it to change the name of this field? This is a private struct, so it should be easy, no?

@sw (Contributor, Author) replied on Apr 2, 2023:

Shouldn't be a problem; I just didn't touch any variable names because I didn't know their intended meaning, e.g. the i in itype. Shall I rename f16 to ftype and print a user-friendly string?

fprintf(stderr, "%s: ftype   = %s\n", __func__, ftype_str[hparams.ftype]);

Hopefully no one tries to parse the stderr output, otherwise this might break that. I might do the same for the type 3 lines further down.
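
A minimal sketch of the kind of string table that fprintf assumes; the array below is my own illustration, not code from the PR:

    // Hypothetical lookup table indexed by the file type value (0..4);
    // the actual strings used in llama.cpp may differ.
    static const char * ftype_str[] = {
        "all f32",
        "mostly f16",
        "mostly q4_0",
        "mostly q4_1",
        "mostly q4_1, some f16",
    };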

@comex (Contributor) commented Apr 2, 2023

Thumbs-up from me on adding names. But I’d argue that the file type field (f16 in the file header) and tensor type field (ftype in the tensor header) should be two different enums. For example, file type 3 represents what I called ‘mostly’ Q4_1, but tensor type 3 means that that particular tensor is (entirely) Q4_1. A file with file type 3 has a mix of tensor types 0 (F32) and 3 (Q4_1). I made this dissonance somewhat worse by adding file type 4, which has no tensor type equivalent; those files have a mix of tensor types 0 (F32), 1 (F16), and 3 (Q4_1).

In the long run I’d like to see llama.cpp scrap the file type field entirely in favor of just using the tensor type and size fields to initialize the ggml tensor metadata.

@sw (Contributor, Author) commented Apr 2, 2023

@comex:

> But I’d argue that the file type field (f16 in the file header) and tensor type field (ftype in the tensor header) should be two different enums.

You're right. So actually ftype should be renamed to ttype or some such, and f16 should be renamed to ftype. No wonder I was confused...

So there would be another enum, but should that be public?

    enum llama_ttype {
        LLAMA_TTYPE_F32  = 0,
        LLAMA_TTYPE_F16  = 1,
        LLAMA_TTYPE_Q4_0 = 2,
        LLAMA_TTYPE_Q4_1 = 3,
    };

We might want to make enum ggml_type match this, assuming that one's not written to the file. Which it isn't AFAICT.

I'm not sure about the naming - the t and f are too visually similar, so maybe we should write it out.

@comex (Contributor) commented Apr 2, 2023

> We might want to make enum ggml_type match this, assuming that one's not written to the file. Which it isn't AFAICT.

It's not written to the file, so that might be a good idea. No strong opinions.

@sw (Contributor, Author) commented Apr 5, 2023

@comex: Will this conflict with your plans in #711 (comment)? You said you want to get rid of f16; I think that's a good idea, so I might close this to avoid conflicts.

@comex (Contributor) commented Apr 5, 2023

Even after my change makes the core loading path not care about file type, there will still be some uses, like in the quantization code (which has to accept a file type as an argument and then write it to the file header). So a change like this would still be useful, and I’ll deal with any merge conflicts. However, I still wouldn’t recommend merging this as-is, for the reason I mentioned before.
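
As a rough, assumed illustration of that remaining use (the function and file names here are hypothetical, not llama.cpp code): quantization picks a file type and writes it verbatim into the output header as a 32-bit field, while each tensor written afterwards still carries its own tensor type.

    #include <stdint.h>
    #include <stdio.h>

    // Hedged sketch, not the actual llama.cpp quantization code: store the
    // chosen file type as a 32-bit integer field in the output header.
    int write_file_type(FILE * fout, int32_t ftype) {
        return fwrite(&ftype, sizeof(ftype), 1, fout) == 1 ? 0 : -1;
    }

    int main(void) {
        FILE * fout = fopen("header-sketch.bin", "wb");
        if (!fout) return 1;
        int rc = write_file_type(fout, 3); // 3 = "mostly Q4_1" in this thread
        fclose(fout);
        return rc == 0 ? 0 : 1;
    }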

@sw marked this pull request as draft on April 5, 2023 at 17:29
@sw (Contributor, Author) commented Apr 5, 2023

I decided to just use ggml_type and change it to match the numbering in the files. I added designated initializers to the arrays in ggml.c that didn't already have them. This should avoid mix-ups in the future.
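
In case the term is unfamiliar, here is a self-contained sketch of how designated initializers tie array entries to enum values; the names are hypothetical stand-ins, not the actual ggml.c tables:

    #include <stdio.h>

    // Hypothetical stand-in for enum ggml_type after the renumbering.
    enum example_type {
        EXAMPLE_TYPE_F32  = 0,
        EXAMPLE_TYPE_F16  = 1,
        EXAMPLE_TYPE_Q4_0 = 2,
        EXAMPLE_TYPE_Q4_1 = 3,
        EXAMPLE_TYPE_COUNT,
    };

    // Each entry is tied to its enum constant, so reordering or renumbering
    // the enum cannot silently shift the table contents.
    static const char * example_type_name[EXAMPLE_TYPE_COUNT] = {
        [EXAMPLE_TYPE_F32]  = "f32",
        [EXAMPLE_TYPE_F16]  = "f16",
        [EXAMPLE_TYPE_Q4_0] = "q4_0",
        [EXAMPLE_TYPE_Q4_1] = "q4_1",
    };

    int main(void) {
        printf("type 3 is %s\n", example_type_name[EXAMPLE_TYPE_Q4_1]);
        return 0;
    }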

@sw marked this pull request as ready for review on April 5, 2023 at 20:27
@sw requested review from prusnak and ggerganov on April 6, 2023 at 08:15
@ggerganov added the high priority label on Apr 10, 2023