…anced to distribute between compute devices
Collaborator
@avtc Thanks again for another gem! Can you whip up some unit tests so there is good test coverage of the diffs? Then I can run them on our GPUs and check for regressions.
…ss" - filter only by compute_device_filter
Contributor
Author
@Qubitium I have added tests with the help of GLM-5. Please review whether they are OK.
Collaborator
@avtc I will be checking and merging this in the next 48 hours.
@Qubitium Hi, this feature lets you specify in the config the device where calibration data inputs/outputs are stored. This allows using more calibration samples for quantization, because the calibration data can be placed on a device other than `cuda:0`, which already stores all layer modules.
Before this feature, the initial calibration data was stored on `CPU`, and after the first pass it was stored on `DEVICE_0` (usually `cuda:0`).
With this feature, if `calibration_data_device` is not set, the previous behavior is preserved. `calibration_data_device` can be set to `"cpu"`, `"cuda:1"` (or any other torch device), or `"balanced"`. In `"balanced"` mode, calibration data is distributed across the available compute devices `DEVICE_0..DEVICE_N`.
P.S. I have used this feature several times before, but on an older branch. This PR is based on the latest master.
Also, I have fixed the config file examples for using the `moe` parameter, and fixed a `sys.abiflags` typo that broke the build.
Note: handling of a layer with all modules excluded from quantization was also fixed, since it seems the current main code did not do the forward replay for such a layer.
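A toy loop shows why a layer whose modules are all excluded still needs a forward replay. This is a hypothetical illustration, not GPTQModel code: layers are plain functions on floats and `quantize_sequentially` is an invented name. The point is that each layer's outputs become the next layer's calibration inputs, so skipping the forward pass for an excluded layer would hand stale activations to every layer after it.

```python
from typing import Callable, List, Sequence, Set, Tuple

Layer = Callable[[float], float]


def quantize_sequentially(
    layers: Sequence[Layer],
    calibration_inputs: List[float],
    excluded_layers: Set[int],
) -> Tuple[List[int], List[float]]:
    """Toy layer-by-layer loop (hypothetical); returns quantized layer indices and final activations."""
    quantized: List[int] = []
    hidden = list(calibration_inputs)
    for idx, layer in enumerate(layers):
        if idx not in excluded_layers:
            # Stand-in for collecting statistics and quantizing this layer's modules.
            quantized.append(idx)
        # Forward replay runs for every layer, including fully excluded ones,
        # so the next layer is calibrated on this layer's real outputs.
        hidden = [layer(x) for x in hidden]
    return quantized, hidden


toy_layers = [lambda x: x + 1, lambda x: x * 2, lambda x: x - 3]
print(quantize_sequentially(toy_layers, [1.0, 2.0], {1}))
# -> ([0, 2], [1.0, 3.0])
```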
I have run several small tests (first few layers only) to make sure nothing fails with `auto_forward_data_parallel` enabled and disabled, on qwen3-30b-a3b with `calibration_data_device` set to `cpu`, `cuda:1`, and `balanced`, and with it removed from the config.
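The manual runs above cover eight combinations, which map naturally onto a parametrized regression test. The sketch below only enumerates the config overrides. Treating `calibration_data_device` and `auto_forward_data_parallel` as flat override keys and the helper name `build_test_matrix` are assumptions, and the quantization call itself is left out. The `cuda:1` and `balanced` cases need at least two GPUs.

```python
from itertools import product
from typing import Dict, List, Optional, Union

# Values from the manual runs; None means the key is removed from the config.
CALIBRATION_DATA_DEVICES: List[Optional[str]] = ["cpu", "cuda:1", "balanced", None]
AUTO_FORWARD_DATA_PARALLEL: List[bool] = [True, False]


def build_test_matrix() -> List[Dict[str, Union[str, bool]]]:
    """Return one config-override dict per combination (hypothetical helper)."""
    matrix: List[Dict[str, Union[str, bool]]] = []
    for device, data_parallel in product(CALIBRATION_DATA_DEVICES, AUTO_FORWARD_DATA_PARALLEL):
        overrides: Dict[str, Union[str, bool]] = {"auto_forward_data_parallel": data_parallel}
        if device is not None:
            # Only set the key when testing an explicit value; omitting it exercises the default path.
            overrides["calibration_data_device"] = device
        matrix.append(overrides)
    return matrix


for case in build_test_matrix():
    print(case)
```

Each dict could be passed through `pytest.mark.parametrize` so the same short quantization smoke test (for example, the first few layers) runs once per combination.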