How much inaccuracy/difference from pytorch is to be expected? #915
Comments
At least the precision of GeLU should increase soon: ggerganov/llama.cpp#8878
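(For context: ggml's GeLU goes through an approximation, and as far as I know also an fp16 lookup table, so element-wise values will not match PyTorch's default exact erf-based GeLU bit-for-bit. A quick standalone sketch, independent of ggml, of how big the tanh-approximation error alone can get:)

```cpp
// Standalone sketch: exact GeLU vs. the common tanh approximation.
// This is only an illustration, not ggml's actual implementation.
#include <cmath>
#include <cstdio>

static float gelu_exact(float x) {
    // 0.5 * x * (1 + erf(x / sqrt(2)))
    return 0.5f * x * (1.0f + erff(x / sqrtf(2.0f)));
}

static float gelu_tanh(float x) {
    // 0.5 * x * (1 + tanh(sqrt(2/pi) * (x + 0.044715 * x^3)))
    const float c = 0.7978845608028654f; // sqrt(2/pi)
    return 0.5f * x * (1.0f + tanhf(c * (x + 0.044715f * x * x * x)));
}

int main() {
    float max_diff = 0.0f;
    for (float x = -6.0f; x <= 6.0f; x += 0.001f) {
        const float d = fabsf(gelu_exact(x) - gelu_tanh(x));
        if (d > max_diff) max_diff = d;
    }
    // Small but nonzero; repeated across many layers these differences add up.
    printf("max |exact - tanh approx| on [-6, 6]: %g\n", max_diff);
    return 0;
}
```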
The second block seems suspicious - make sure you use the correct epsilon value.
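(A minimal sketch of what "correct epsilon" means here, assuming ggml.h is included; the function and variable names are placeholders:)

```cpp
// Sketch: the eps passed to ggml_norm must match the PyTorch module.
// torch.nn.LayerNorm defaults to eps = 1e-5 (torch.nn.GroupNorm does too).
static ggml_tensor * layer_norm_pre_affine(ggml_context * ctx, ggml_tensor * x) {
    const float eps = 1e-5f;          // take this from the model config, don't guess
    return ggml_norm(ctx, x, eps);    // normalizes over ne[0] (the innermost dim)
}
```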
I've checked and it's set to the correct value. However, I did change the second block (Old vs. New below):

Old:

```cpp
if (affine) {
    output = ggml_mul(
        ctx,
        output,
        weight_tensor
    );
    output = ggml_add(
        ctx,
        ggml_repeat(ctx, bias_tensor, output),
        output
    );
}
```

New:

```cpp
if (affine) {
    output = ggml_mul(
        ctx,
        ggml_repeat(ctx, weight_tensor, output),
        output
    );
    output = ggml_add(
        ctx,
        output,
        ggml_repeat(ctx, bias_tensor, output)
    );
}
```

Old Output:
New Output: (after applying the same change to earlier blocks)
Pytorch Output:
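One more thing worth double-checking alongside the Old/New change above: ggml stores dimensions in reverse order compared to PyTorch (ne[0] corresponds to PyTorch's last dimension), so the per-channel weight and bias need to line up with ne[0] of the activations. A minimal sketch with explicit repeats, so both operands always have identical shapes (the names here are placeholders, not the actual code in the repo):

```cpp
// Sketch: affine step of LayerNorm with explicit broadcasting.
// PyTorch: y = normalized * weight + bias, where weight/bias have shape [C]
// ggml:    activations ne = [C, T, 1, 1], weight/bias ne = [C, 1, 1, 1]
//          (ne[0] in ggml corresponds to the *last* PyTorch dimension)
static ggml_tensor * apply_affine(
        ggml_context * ctx,
        ggml_tensor  * normalized,   // output of ggml_norm, ne = [C, T, 1, 1]
        ggml_tensor  * weight,       // gamma, ne = [C, 1, 1, 1]
        ggml_tensor  * bias) {       // beta,  ne = [C, 1, 1, 1]
    // Repeating both parameters to the activation shape sidesteps any question
    // of which operand ggml_mul / ggml_add broadcasts.
    ggml_tensor * w   = ggml_repeat(ctx, weight, normalized);
    ggml_tensor * b   = ggml_repeat(ctx, bias,   normalized);
    ggml_tensor * out = ggml_mul(ctx, normalized, w);
    out = ggml_add(ctx, out, b);
    return out;
}
```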
I applied the same change in the third block:

Old Output:
New Output:
Pytorch Output:
Despite all these changes, the positional embedding section of the encoder block still outputs exactly the same inaccuracy as before, even after applying the same change to its bias. Perhaps it is the way I implemented Conv1D groups?

```cpp
// Convolutional 1D layer with groups
class grouped_conv1d : public GGMLBlock {
public:
    int stride;
    int padding;
    int dilation;
    int groups;

    grouped_conv1d(
        int stride,
        int padding,
        int dilation,
        int groups)
        : stride(stride),
          padding(padding),
          dilation(dilation),
          groups(groups) {
        has_children = true;
        for (int i = 0; i < groups; i++) {
            blocks["group" + std::to_string(i)] = std::shared_ptr<GGMLBlock>(new conv1d(stride, padding, dilation));
        }
        blocks["global_bias"] = std::shared_ptr<GGMLBlock>(new grouped_conv1d_bias());
    }

    ggml_tensor* forward(ggml_context* ctx, ggml_tensor* input) {
        auto bGroupedConvBias = std::dynamic_pointer_cast<grouped_conv1d_bias>(blocks["global_bias"]);
        ggml_tensor* output        = input;    // 928, 768, 1, 1 (L.in, C.in, N, -)
        ggml_tensor* groupCache    = nullptr;
        ggml_tensor* currentTensor = nullptr;
        for (int i = 0; i < groups; i++) {
            auto bTmpConvBlock = std::dynamic_pointer_cast<conv1d>(blocks["group" + std::to_string(i)]);
            // Chunking. TODO: change 48 to ne[1] / groups, as in 768 / 16 = 48
            currentTensor = ggml_slice(ctx, output, 1, (48 * i), (48 * (i + 1)));
            currentTensor = bTmpConvBlock->forward(ctx, currentTensor);
            if (i == 0) {
                groupCache = currentTensor;
            } else {
                groupCache = ggml_concat(ctx, groupCache, currentTensor, 1);
            }
        }
        output = groupCache;
        output = bGroupedConvBias->forward(ctx, output);
        return output;
    }
};
```
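(Side note: ggml_slice looks like a custom helper in the repo; for reference, here is one way the channel slice could be done with a stock ggml view instead, assuming output has layout ne = [L, C, N, 1] and the slice is along dim 1. The helper name and everything in it are hypothetical:)

```cpp
// Hypothetical helper: take channels [c0, c1) of a tensor with ne = [L, C, N, 1]
// as a view instead of a copy. Follow it with ggml_cont() if the next op needs
// contiguous memory.
static ggml_tensor * slice_channels(
        ggml_context * ctx,
        ggml_tensor  * t,      // ne = [L, C, N, 1]
        int64_t        c0,
        int64_t        c1) {
    return ggml_view_3d(
        ctx, t,
        t->ne[0],              // keep the full length
        c1 - c0,               // number of channels in the slice
        t->ne[2],              // keep the batch dimension
        t->nb[1],              // stride between channels (unchanged)
        t->nb[2],              // stride between batch elements (unchanged)
        c0 * t->nb[1]);        // byte offset of the first selected channel
}

// Usage inside the group loop, instead of the hardcoded 48:
// int64_t per_group = output->ne[1] / groups;   // e.g. 768 / 16 = 48
// currentTensor = slice_channels(ctx, output, per_group * i, per_group * (i + 1));
```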
And this is the positional embedding block:

```cpp
// Positional Conv Embedding
class positional_conv_embedding : public GGMLBlock {
public:
    positional_conv_embedding() {
        has_children = true;
        blocks["conv"] = std::shared_ptr<GGMLBlock>(new grouped_conv1d(1, 64, 1, 16));
        // padding = 128 // 2 = 64
    }

    ggml_tensor* forward(ggml_context* ctx, ggml_tensor* input) {
        auto bGroupedConv = std::dynamic_pointer_cast<grouped_conv1d>(blocks["conv"]);
        ggml_tensor* output = input;                                // input has shape 768, 928, 1, 1
        output = ggml_cont(ctx, ggml_transpose(ctx, output));      // 768, 928, 1, 1 -> 928, 768, 1, 1 (L.in, C.in, N, -)
        output = bGroupedConv->forward(ctx, output);
        output = ggml_slice(ctx, output, 0, 0, output->ne[0] - 1); // x[:, :, :-1] because the output has one extra sample
        output = ggml_gelu(ctx, output);
        output = ggml_cont(ctx, ggml_transpose(ctx, output));      // 928, 768, 1, 1 -> 768, 928, 1, 1
        return output;
    }
};
```
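(To narrow down exactly where the divergence starts, the intermediate outputs can also be compared numerically against tensors dumped from PyTorch instead of eyeballing screenshots. A minimal sketch, assuming the reference was saved as a raw float32 file with tensor.detach().cpu().numpy().tofile("ref.bin"), the ggml tensor is contiguous F32, and its ne order is the reverse of the PyTorch shape so the flat memory layouts line up:)

```cpp
// Sketch: compare an evaluated F32 ggml tensor against a raw float32 dump.
// Assumes ggml.h is included and the tensor data has been computed.
#include <cstdio>
#include <cmath>
#include <vector>

static void compare_to_reference(const ggml_tensor * t, const char * path) {
    const int64_t n = ggml_nelements(t);
    std::vector<float> ref(n);
    FILE * f = fopen(path, "rb");
    if (!f || fread(ref.data(), sizeof(float), n, f) != (size_t) n) {
        printf("failed to read %s\n", path);
        if (f) fclose(f);
        return;
    }
    fclose(f);

    const float * data = (const float *) t->data;
    float max_abs = 0.0f;
    for (int64_t i = 0; i < n; i++) {
        max_abs = fmaxf(max_abs, fabsf(data[i] - ref[i]));
    }
    printf("%s: max abs diff vs pytorch = %g\n", t->name, max_abs);
}
```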
Sorry if this reply was longer than it should have been.
Turns out it was something in the code saving the parameters to GGUF. It's something with PyTorch, and the fix was to use an older version of the notebook I'm using, even though the code is exactly the same... Anyways, thanks for the help. I hope this is useful for someone at least...
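(Since the root cause was on the GGUF export side, a sketch of how the saved weights can be sanity-checked straight from the GGUF file and compared against the PyTorch state dict, before building any graph; the file and tensor names are placeholders:)

```cpp
// Sketch: load a GGUF file and print the first few values of one tensor,
// to compare against model.state_dict()["..."] on the PyTorch side.
// Assumes ggml.h and gguf support are available in the build.
#include <cstdio>

static void dump_gguf_tensor(const char * fname, const char * tensor_name) {
    ggml_context * meta = nullptr;
    gguf_init_params params;
    params.no_alloc = false;   // actually load the tensor data
    params.ctx      = &meta;

    gguf_context * gctx = gguf_init_from_file(fname, params);
    if (!gctx) {
        printf("failed to open %s\n", fname);
        return;
    }

    ggml_tensor * t = ggml_get_tensor(meta, tensor_name);
    if (t && t->type == GGML_TYPE_F32) {
        const float * data = (const float *) t->data;
        printf("%s: ne = [%lld, %lld, %lld, %lld]\n", tensor_name,
               (long long) t->ne[0], (long long) t->ne[1],
               (long long) t->ne[2], (long long) t->ne[3]);
        for (int i = 0; i < 8; i++) {
            printf("  [%d] = %f\n", i, data[i]);
        }
    } else {
        printf("tensor %s not found or not F32\n", tensor_name);
    }

    gguf_free(gctx);
    ggml_free(meta);
}
```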
Hello, a few weeks ago on #883 I was told that some slight difference from PyTorch is to be expected with ggml. However, I have been porting the model (HuBERT) further into ggml, and the difference has kept growing and growing; now, after 30-ish "blocks", it is getting a bit concerning...
On the first block, feature_extractor, which is just 7 Conv1D layers, 1 GroupNorm, and a few in-place GeLUs, the difference is very small but still acceptable:

GGML Output:
Pytorch Output:
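For reference, a minimal sketch of what one Conv1D + GeLU step of the feature extractor looks like with stock ggml ops. The layouts in the comments (kernel ne = [K, C_in, C_out], input ne = [L, C_in, N]) are assumptions about what ggml_conv_1d expects, so they are worth verifying against the actual ggml version in use:

```cpp
// Sketch: one Conv1D (no groups) followed by GeLU, as in the feature extractor.
// Assumed layouts: kernel ne = [K, C_in, C_out], input ne = [L, C_in, N].
// Note: depending on the ggml version, the kernel may need to be stored as F16
// for ggml_conv_1d to work; worth checking in the build being used.
static ggml_tensor * conv1d_gelu(
        ggml_context * ctx,
        ggml_tensor  * kernel,   // ne = [K, C_in, C_out]
        ggml_tensor  * x,        // ne = [L, C_in, N]
        int            stride,
        int            padding,
        int            dilation) {
    ggml_tensor * cur = ggml_conv_1d(ctx, kernel, x, stride, padding, dilation);
    cur = ggml_gelu(ctx, cur);   // approximate GeLU; a small per-element difference vs. PyTorch is expected
    return cur;
}
```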
On the second block, layer_norm, which is a norm layer with bias, the difference is a bit bigger than on the first block:

GGML Output:
Pytorch Output:
The third block, post_extract_proj, is nothing more than a linear layer with bias, but the difference once again gets bigger:

GGML Output:
Pytorch Output:
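For what it's worth, here is a minimal sketch of how a linear layer with bias is usually expressed in ggml, as a reference point when checking this block (all names are placeholders):

```cpp
// Sketch: equivalent of PyTorch nn.Linear(in_features, out_features) + bias.
// PyTorch stores the weight as [out, in]; since ggml's ne order is reversed,
// the same memory layout maps to weight ne = [in_features, out_features].
// x ne = [in_features, T, 1, 1], bias ne = [out_features, 1, 1, 1].
static ggml_tensor * linear_with_bias(
        ggml_context * ctx,
        ggml_tensor  * weight,
        ggml_tensor  * bias,
        ggml_tensor  * x) {
    ggml_tensor * cur = ggml_mul_mat(ctx, weight, x);        // ne = [out_features, T, 1, 1]
    cur = ggml_add(ctx, cur, ggml_repeat(ctx, bias, cur));   // add bias per output feature
    return cur;
}
```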
Then, on the positional embedding section of the encoder block, the outputs are very, very different:
GGML Output:
Pytorch Output:
It could be because of my implementation of 1D conv groups (since I was unable to find any option to use groups with conv1d in ggml), but I'm not sure. I managed to take a screenshot at a moment where it did return a value very close to the PyTorch output, so at least it should be possible:
However, I didn't save a backup, and I moved the code to a git repo after I found out that the pos embed part was failing.
Here is the repository: https://github.com/chavinlo/rvc.cpp
It contains:
If more information is needed, just let me know!