Deduplicate q4 quantization functions #383

Merged: 4 commits, Mar 22, 2023


@sw (Contributor) commented Mar 22, 2023

As suggested in #356, this de-duplicates the code in ggml_quantize_q4_0 and ggml_quantize_q4_1, which were recently moved to ggml.c.

To ensure deterministic creation of model files, I introduced a new "reference" implementation for the q4_0 quantization. For q4_1 this wasn't necessary, as that has no SIMD optimizations.
This quashes @ggerganov's hope of making the quantize program faster, but I believe deterministic model files are more important.

Note that the checksum for models/7B/ggml-model-q4_0.bin in SHA256SUMS is wrong; see #374.

@gjmulder added the bug, documentation, and enhancement labels Mar 22, 2023
@sw removed the bug and documentation labels Mar 22, 2023
@sw force-pushed the quant-dedup branch 2 times, most recently from f59fe8e to e0fe526, March 22, 2023 13:04
@sw (Contributor, Author) commented Mar 22, 2023

I've added a basic test for the quantization functions, but it is failing on macOS due to an illegal instruction. I suspect that the CMakeLists.txt selects some optimization flags that the CI machine does not support.

I've disabled the test on macOS for now, as I don't have access to a macOS machine and don't want to hammer this PR with further pushes.

@sw requested a review from @ggerganov March 22, 2023 13:28
@ggerganov (Owner) commented

> I've added a basic test for the quantization functions, but it is failing on macOS due to an illegal instruction. I suspect that the CMakeLists.txt selects some optimization flags that the CI machine does not support.

Weird, the test passes on my M1.
The test does not go through any SIMD code, so I don't see why it would cause an illegal instruction.

@ggerganov merged commit 69c9229 into ggerganov:master Mar 22, 2023
@sw deleted the quant-dedup branch March 23, 2023 10:12
Labels: enhancement