
How much inaccuracy/difference from pytorch is to be expected? #915

Closed
chavinlo opened this issue Aug 6, 2024 · 4 comments

@chavinlo

chavinlo commented Aug 6, 2024

Hello, a few weeks ago in #883 I was told that some slight difference from PyTorch is to be expected with ggml. However, as I have continued porting the model (HuBERT) to ggml, the difference has kept growing, and now, roughly 30 "blocks" in, it is getting a bit concerning...

On the first block, feature_extractor, which is just 7 Conv1D layers, 1 GroupNorm, and a few in-place GeLUs, the difference is very small and still acceptable:

GGML Output:

dims: 928 512 1 1 f32
First & Last 10 elements:
0.00082 0.00127 0.00052 0.00166 0.00194 0.00228 0.00051 0.00063 0.00001 -0.00076
0.02071 -0.01569 0.00515 0.01100 -0.03571 -0.02370 -0.01015 -0.02249 -0.01920 -0.00361
sum:  -459.135949

Pytorch Output:

dims: torch.Size([1, 512, 928]) torch.float32
First & Last 10 elements:
[0.00081, 0.00132, 0.00054, 0.0017, 0.00194, 0.00229, 0.00054, 0.00068, -1e-05, -0.00075]
[0.02066, -0.01545, 0.0053, 0.01141, -0.03561, -0.02357, -0.00955, -0.02235, -0.01908, -0.00314]
sum:  -459.71112060546875
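
For reference, each of those feature-extractor layers is just a plain (ungrouped) conv followed by GeLU, which maps directly onto stock ggml ops. A rough sketch of one layer (tensor names are placeholders, and the per-layer kernel/stride values come from the checkpoint config):

// Sketch of one feature-extractor layer with stock ggml ops.
// conv_w is the layer's conv kernel as loaded from GGUF; `signal` is the
// previous layer's output. Kernel size / stride differ per layer (the first
// HuBERT conv uses kernel 10, stride 5, if I read the config right).
ggml_tensor * cur = ggml_conv_1d(ctx, conv_w, signal,
                                 /*stride*/ 5, /*padding*/ 0, /*dilation*/ 1);
// (the group norm after the first conv is omitted in this sketch)
cur = ggml_gelu(ctx, cur);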

On the second block, layer_norm, which is a norm layer with weight and bias, the difference is a bit bigger than on the first block:

GGML Output:

dims: 512 928 1 1 f32
First & Last 10 elements:
-0.00148 -0.00114 -0.00298 -0.00375 -0.00304 -0.19735 -0.00158 -0.00227 0.15176 0.32282
-0.00708 -0.00236 -0.01410 0.02816 -0.00187 0.14543 -0.00118 -0.00185 -0.07328 0.10153
sum:  -11849.843736

Pytorch Output:

dims: torch.Size([1, 928, 512]) torch.float32
First & Last 10 elements:
[-0.00148, -0.00114, -0.00298, -0.00376, -0.00304, -0.2042, -0.00158, -0.00227, 0.13578, 0.31828]
[-0.00707, -0.00236, -0.0141, 0.032, -0.00187, 0.14823, -0.00118, -0.00185, -0.07328, 0.11176]
sum:  -11980.99609375
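
For reference, this block is just a ggml_norm with the PyTorch default eps = 1e-5 followed by the learned weight and bias; roughly (tensor names are placeholders for what gets loaded from GGUF):

// LayerNorm with affine parameters, sketched with stock ggml ops.
// `x` is the feature-extractor output (ne = [512, 928]); ln_w / ln_b are the
// learned weight and bias (ne = [512]).
ggml_tensor * cur = ggml_norm(ctx, x, 1e-5f);  // normalize along ne[0]
cur = ggml_mul(ctx, cur, ln_w);                // scale (ggml_repeat or broadcast)
cur = ggml_add(ctx, cur, ln_b);                // shift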

The third block, post_extract_proj, is nothing more than a linear layer with bias, but the difference once again gets bigger:

GGML Output:

dims: 768 928 1 1 f32
First & Last 10 elements:
0.23420 -0.04308 0.36195 -1.01714 -0.55842 -0.74043 -0.06722 0.16166 0.98925 0.90034
-3.91645 -0.41976 -0.30636 -0.56799 0.26271 -0.55689 0.38218 -0.14825 0.82683 -0.73491
sum:  -5578.222083

Pytorch Output:

dims: torch.Size([1, 928, 768]) torch.float32
First & Last 10 elements:
[0.25546, -0.04067, 0.3587, -1.05641, -0.57509, -0.75408, -0.07237, 0.14982, 0.99075, 0.9247]
[-3.87562, -0.4089, -0.25278, -0.5513, 0.27736, -0.55433, 0.38134, -0.13394, 0.82322, -0.71393]
sum:  -5674.5029296875
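
For reference, post_extract_proj is just a matrix multiply plus bias, so it maps onto two ggml calls (tensor names are placeholders for what gets loaded from GGUF):

// Linear projection 512 -> 768 with bias, sketched with stock ggml ops.
// proj_w has ne = [512, 768], proj_b has ne = [768], x has ne = [512, 928].
ggml_tensor * cur = ggml_mul_mat(ctx, proj_w, x);  // result ne = [768, 928]
cur = ggml_add(ctx, cur, proj_b);                  // bias repeated across the 928 frames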

Then, on the positional embedding section of the encoder block, the outputs are very, very different:

GGML Output:

dims: 768 928 1 1 f32
First & Last 10 elements:
0.18384 0.43774 -0.01323 -0.04510 0.16553 0.17847 0.39526 0.16687 0.13611 0.01314
0.48682 0.12854 -0.14307 0.24927 -0.16003 0.05411 0.18164 -0.07831 0.01350 -0.16882
sum:  143483.242961

Pytorch Output:

dims: torch.Size([1, 928, 768]) torch.float32
First & Last 10 elements:
[-0.1591, 0.17117, -0.16889, 0.45649, 0.66979, 0.61809, 0.08895, 0.23388, 0.49129, -0.16661]
[5.05966, 1.35264, 0.26677, -0.0, -0.0, 1.63973, 0.13964, 0.02565, 0.21715, -0.04758]
sum:  519205.15625

It could be because of my implementation of grouped Conv1D (since I was unable to find any option to use groups with conv1d in ggml), but I'm not sure. I managed to take a screenshot of a moment where it did return a value very close to the PyTorch output, so at least it should be possible:
(screenshot: devenv_GOMQQCwFul)
However, I didn't save a backup, and I moved the code to a git repo after I found out that the pos embed part was failing.

Here is the repository: https://github.com/chavinlo/rvc.cpp
It contains:

  • The C++ ggml inference code
  • A notebook that contains a simplified HuBERT, the conversion to GGUF, and layer-by-layer inference so that you can see the outputs of each block (up to the positional embedding)
  • Links to the HuBERT weights in both PyTorch and GGUF format, in the notebook or at https://huggingface.co/chavinlo/ggmltest

If more information is needed, just let me know!

@Green-Sky
Contributor

At least the precision of GeLU should increase soon: ggerganov/llama.cpp#8878

@ggerganov
Owner

The second block seems suspicious - make sure you use the correct epsilon value

@chavinlo
Author

chavinlo commented Aug 7, 2024

> The second block seems suspicious - make sure you use the correct epsilon value

I've checked and it's set to the correct value, 0.00001 or 1e-5f. During inference, at least according to the VS memory view, it is using 9.99999975e-06, which isn't exactly that value but isn't far off either.
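
(A quick check shows that this is simply what 1e-5 looks like once stored as a 32-bit float, so the epsilon itself should be fine:)

#include <cstdio>

int main() {
    // 1e-5 is not exactly representable in binary floating point; the nearest
    // float32 is ~9.99999975e-06, which is exactly what the VS memory view shows.
    float eps = 1e-5f;
    printf("%.8e\n", eps);  // prints 9.99999975e-06
    return 0;
}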

However, I managed to make the second block (layer_norm) a bit more precise by changing the order of the ggml_add arguments from (context, bias, input) to (context, input, bias):


Old:

        if (affine) {
            output = ggml_mul(
                ctx, 
                output, 
                weight_tensor
            );
            
            output = ggml_add(
                ctx,
                ggml_repeat(ctx, bias_tensor, output),
                output
            );
        }

New:

        if (affine) {
            output = ggml_mul(
                ctx,
                ggml_repeat(ctx, weight_tensor, output),
                output
            );

            output = ggml_add(
                ctx,
                output,
                ggml_repeat(ctx, bias_tensor, output)
            );
        }

Old Output:

dims: 512 928 1 1 f32
First & Last 10 elements:
-0.00148 -0.00114 -0.00298 -0.00375 -0.00304 -0.19735 -0.00158 -0.00227 0.15176 0.32282
-0.00708 -0.00236 -0.01410 0.02816 -0.00187 0.14543 -0.00118 -0.00185 -0.07328 0.10153
sum:  -11849.843736

New Output: (after applying the same change to earlier blocks)

dims: 512 928 1 1 f32
First & Last 10 elements:
-0.00148 -0.00114 -0.00299 -0.00376 -0.00304 -0.20524 -0.00158 -0.00227 0.13591 0.31613
-0.00708 -0.00236 -0.01410 0.03157 -0.00187 0.14796 -0.00118 -0.00185 -0.07193 0.11132
sum:  -11928.556381

Pytorch Output:

dims: torch.Size([1, 928, 512]) torch.float32
First & Last 10 elements:
[-0.00148, -0.00114, -0.00298, -0.00376, -0.00304, -0.2042, -0.00158, -0.00227, 0.13578, 0.31828]
[-0.00707, -0.00236, -0.0141, 0.032, -0.00187, 0.14823, -0.00118, -0.00185, -0.07328, 0.11176]
sum:  -11980.99609375
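
As a side note, if the ggml build is recent enough to broadcast the second operand of ggml_mul/ggml_add into the first, the explicit ggml_repeat calls above shouldn't be needed at all (untested sketch):

        if (affine) {
            // recent ggml broadcasts the smaller second operand into the first,
            // so the weight and bias tensors can be passed directly
            output = ggml_mul(ctx, output, weight_tensor);
            output = ggml_add(ctx, output, bias_tensor);
        }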

I applied the same change in the group_norm of the feature_extractor, which is the first block, and there was a slight improvement in accuracy as well:


Old Output:

dims: 928 512 1 1 f32
First & Last 10 elements:
0.00082 0.00127 0.00052 0.00166 0.00194 0.00228 0.00051 0.00063 0.00001 -0.00076
0.02071 -0.01569 0.00515 0.01100 -0.03571 -0.02370 -0.01015 -0.02249 -0.01920 -0.00361
sum:  -459.135949

New Output:

dims: 928 512 1 1 f32
First & Last 10 elements:
0.00082 0.00131 0.00052 0.00169 0.00193 0.00231 0.00053 0.00069 -0.00001 -0.00075
0.02075 -0.01557 0.00544 0.01132 -0.03564 -0.02357 -0.00956 -0.02246 -0.01915 -0.00317
sum:  -459.740583

Pytorch Output:

dims: torch.Size([1, 512, 928]) torch.float32
First & Last 10 elements:
[0.00081, 0.00132, 0.00054, 0.0017, 0.00194, 0.00229, 0.00054, 0.00068, -1e-05, -0.00075]
[0.02066, -0.01545, 0.0053, 0.01141, -0.03561, -0.02357, -0.00955, -0.02235, -0.01908, -0.00314]
sum:  -459.71112060546875

The third block, post_extract_proj, benefited from the accuracy boost as well. I won't post the full stats because it would make this reply even longer, but the sum is -5669.050556, which is much closer to -5674.5029296875 (PyTorch) than -5578.222083 (old ggml).

Despite all these changes, the positional embedding section of the encoder block still produces exactly the same (inaccurate) results as before, even after applying the same change to its bias. Perhaps it is the way I implemented grouped conv1d?:

// Convolutional 1D layer with groups
class grouped_conv1d : public GGMLBlock {
public:
    int stride;
    int padding;
    int dilation;
    int groups;

    grouped_conv1d(
        int stride,
        int padding,
        int dilation,
        int groups)
        : stride(stride),
        padding(padding),
        dilation(dilation),
        groups(groups) {
        has_children = true;

        for (int i = 0; i < groups; i++) {
            blocks["group" + std::to_string(i)] = std::shared_ptr<GGMLBlock>(new conv1d(stride, padding, dilation));
        }

        blocks["global_bias"] = std::shared_ptr<GGMLBlock>(new grouped_conv1d_bias());
    }

    ggml_tensor* forward(ggml_context* ctx, ggml_tensor* input) {

        auto bGroupedConvBias = std::dynamic_pointer_cast<grouped_conv1d_bias>(blocks["global_bias"]);

        ggml_tensor* output = input; // 928, 768, 1, 1 (L.in, C.in, N, -)
        ggml_tensor* groupCache = nullptr;
        ggml_tensor* currentTensor = nullptr;
        for (int i = 0; i < groups; i++) {
            auto bTmpConvBlock = std::dynamic_pointer_cast<conv1d>(blocks["group" + std::to_string(i)]);
            currentTensor = ggml_slice(ctx, output, 1, (48 * i), (48 * (i + 1))); // Chunking TODO: change 48 to ne[1] / groups as in 768 / 16 = 48

            currentTensor = bTmpConvBlock->forward(ctx, currentTensor);
            if (i == 0) {
                groupCache = currentTensor;
            }
            else {
                groupCache = ggml_concat(ctx, groupCache, currentTensor, 1);
            }
        }

        output = groupCache;
        output = bGroupedConvBias->forward(ctx, output);

        return output;
    }
};
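
Note that ggml_slice is a helper from my repo, not a stock ggml op. In case that helper is the culprit, the same per-group chunking can also be expressed with ggml_view_3d; an untested sketch, assuming each group's kernel is stored as its own tensor (group_kernel[i] is a placeholder name):

// Per-group chunking with stock ggml views instead of ggml_slice.
// `x` has ne = [L, C_in, 1]; each group convolves C_in / groups channels.
const int64_t n_per_group = x->ne[1] / groups;              // 768 / 16 = 48
ggml_tensor * out = nullptr;
for (int i = 0; i < groups; i++) {
    ggml_tensor * chunk = ggml_view_3d(
        ctx, x,
        x->ne[0], n_per_group, x->ne[2],                    // ne0, ne1, ne2
        x->nb[1], x->nb[2],                                 // strides
        (size_t) i * n_per_group * x->nb[1]);               // byte offset
    chunk = ggml_cont(ctx, chunk);                          // make contiguous before conv
    ggml_tensor * cur = ggml_conv_1d(ctx, group_kernel[i], chunk,
                                     stride, padding, dilation);
    out = (out == nullptr) ? cur : ggml_concat(ctx, out, cur, 1);  // stack along channels
}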

And this is the positional embedding block:

// Positional Conv Embedding
class positional_conv_embedding : public GGMLBlock {
public:
    positional_conv_embedding() {
        has_children = true;

        blocks["conv"] = std::shared_ptr<GGMLBlock>(new grouped_conv1d(1, 64, 1, 16));
        // padding = 128 // 2 = 64
    }

    ggml_tensor* forward(ggml_context* ctx, ggml_tensor* input) {
        auto bGroupedConv = std::dynamic_pointer_cast<grouped_conv1d>(blocks["conv"]);

        ggml_tensor* output = input; // Input has shape 768, 928, 1, 1
        output = ggml_cont(ctx, ggml_transpose(ctx, output)); // 768, 928, 1, 1 -> 928, 768, 1, 1 (L.in, C.in, N, -)
        output = bGroupedConv->forward(ctx, output);
        output = ggml_slice(ctx, output, 0, 0, output->ne[0] - 1); // x[:, :, :-1] because the conv produces one extra time step (even kernel with padding = kernel/2)
        output = ggml_gelu(ctx, output);
        output = ggml_cont(ctx, ggml_transpose(ctx, output)); // 928, 768, 1, 1 -> 768, 928, 1, 1
        return output;
    }
};

Sorry if this reply was longer than it should have been.

@chavinlo
Author

chavinlo commented Aug 8, 2024

Turns out it was something in the code that saves the parameters to GGUF. It seems to be a PyTorch issue, and the fix was to use an older version of the notebook I'm using, even though the code is exactly the same... Anyways, thanks for the help, I hope this is useful for someone at least...

chavinlo closed this as completed Aug 8, 2024