implementation of lqlora #8820
base: develop
Conversation
Thanks for your contribution!
Codecov Report
Attention: Patch coverage is
Additional details and impacted files

@@            Coverage Diff            @@
##           develop    #8820    +/-   ##
==========================================
+ Coverage    52.99%   53.03%   +0.03%
==========================================
  Files          671      658      -13
  Lines       109835   106606    -3229
==========================================
- Hits         58212    56543    -1669
+ Misses       51623    50063    -1560

☔ View full report in Codecov by Sentry.
llm/tools/get_lqlora_state_dict.py
Outdated
def get_lqlora_state_dict():
    args = parse_arguments()
    model = AutoModelForCausalLM.from_pretrained(args.model_name_or_path)
The dtype should not be hard-coded; see run_finetune.py. Instead of calling to(dtype), pass it via from_pretrained(dtype=dtype).
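A minimal sketch of the loading pattern the reviewer suggests; the dtype is taken from the script's own arguments rather than fixed, and the args.dtype field name is an assumption:

from paddlenlp.transformers import AutoModelForCausalLM

# Load in the requested precision instead of casting with .to(dtype) afterwards;
# args.dtype (e.g. "bfloat16") is assumed to come from parse_arguments().
model = AutoModelForCausalLM.from_pretrained(args.model_name_or_path, dtype=args.dtype)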
llm/tools/get_lqlora_state_dict.py
Outdated
target_modules = get_lora_target_modules(model)
lora_config = LoRAConfig(
    target_modules=target_modules,
    r=8,
Why is the config hard-coded? See run_finetune.py.
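A minimal sketch of building the config from parsed arguments instead of fixed literals, following the run_finetune.py pattern the reviewer points to; the argument names args.lora_rank and args.dtype are assumptions:

from paddlenlp.peft import LoRAConfig

# Rank, alpha, and dtype come from the script's arguments rather than hard-coded values.
lora_config = LoRAConfig(
    target_modules=target_modules,
    r=args.lora_rank,
    lora_alpha=2 * args.lora_rank,
    dtype=args.dtype,
)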
state_dict = model.state_dict()
paddle.save(state_dict, args.output_path)
I don't understand what this script is meant to do. Why does LQ-LoRA initialization need a standalone script that saves these weights?
llm/tools/get_lqlora_quantize_cfg.py
Outdated
model = AutoModelForCausalLM.from_pretrained(args.model_name_or_path)
for name, submodule in model.named_sublayers():
    if "_proj" in name:
Filtering by "_proj" does not work for every model. Better to use lora_target_modules and check whether the sublayer is an nn.Linear.
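A minimal sketch of the selection logic the reviewer suggests, reusing the get_lora_target_modules helper already used elsewhere in this PR plus an nn.Linear check; the exact pattern format returned by the helper is an assumption:

import re
import paddle.nn as nn

# Per-architecture name patterns for LoRA target layers.
target_modules = get_lora_target_modules(model)
for name, submodule in model.named_sublayers():
    # Only pick sublayers that are plain nn.Linear and match one of the target patterns.
    if isinstance(submodule, nn.Linear) and any(re.search(pattern, name) for pattern in target_modules):
        ...  # collect this layer for the per-layer quantization-config search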
qconfigs = ilp_data["qconfigs"]
normalized_costs = costs / paddle.linalg.norm(costs) * 1000.0
normalized_budget = args.budget / GIGABYTES * num_params
What exactly does budget mean here?
normalized_budget = args.budget / GIGABYTES * num_params
normalized_weights = weights / GIGABYTES
assignments_cost, assignments = compute_qconfig_assignments(
    budget=normalized_budget, costs=normalized_costs, weights=normalized_weights, num_chunks=1
With num_chunks set to 1, each parameter is only searched once; what is the point of that?
    ]
):
    raise ValueError
I would actually suggest not implementing the LQ-LoRA lora_A/lora_B initialization and the quant_algo search as standalone scripts that save a state_dict. Consider putting this into the LoRAModel initialization instead, similar to how PEFT implements LoftQ: https://github.com/huggingface/peft/blob/8f3970865079ca1ca1a406cc9f3b3870d677dfb4/src/peft/utils/loftq_utils.py#L333
Concretely: when the model is loaded it should still be a 16-bit model, and an lqconfig field is added to LoRAConfig to pass the parameters in. For plain LQ-LoRA, just replace the original nn.Linear layers with QuantizationLoRALinear layers using the corresponding quant_algo; if search is added, run the search first to obtain a quant_algo for each layer, and then do the QuantizationLoRALinear replacement. The design needs to cover initialization, saving, warm restart, and parameter merging. See the sketch below.
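A rough sketch of the flow described above, assuming it runs inside LoRAModel initialization; search_quant_algos, is_lora_target, replace_sublayer, and the QuantizationLoRALinear constructor arguments are all hypothetical names used only for illustration:

import paddle.nn as nn

def init_lqlora_layers(model, lora_config):
    # The model arrives here as a plain 16-bit model.
    if lora_config.lqconfig.search_quant_algo:
        # Search first: one quant_algo per target layer (hypothetical helper).
        quant_algos = search_quant_algos(model, lora_config.lqconfig)
    else:
        # Plain LQ-LoRA: every target layer uses the default algorithm.
        quant_algos = {}
    for name, sublayer in model.named_sublayers():
        if isinstance(sublayer, nn.Linear) and is_lora_target(name, lora_config):  # hypothetical check
            algo = quant_algos.get(name, lora_config.lqconfig.default_quant_algo)
            # Swap the 16-bit nn.Linear for a QuantizationLoRALinear built with its quant_algo.
            replace_sublayer(model, name, QuantizationLoRALinear(sublayer, quant_algo=algo))  # hypothetical ctor
    return model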
@@ -564,6 +564,8 @@ def __init__(self, **kwargs):
        if "quantization_config" in kwargs and isinstance(kwargs["quantization_config"], Dict):
            kwargs["quantization_config"] = QuantizationConfig.from_dict(kwargs["quantization_config"])
        self.quantization_config = kwargs.pop("quantization_config", QuantizationConfig())
        self.lqlora_quantize_cfg = kwargs.pop("lqlora_quantize_cfg", None)
The from_pretrained logic of specifying a different quantization strategy per layer can be kept, but don't tie it to LoftQ. Instead, add model_config.quantization_config.quantize_cfg to specify the per-layer quantization strategies.
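A minimal sketch of what such a per-layer mapping might look like if it lived under quantization_config; the key patterns and the particular algorithm names are illustrative assumptions, not part of this PR:

# Hypothetical per-layer mapping: layer-name pattern -> weight quantization algorithm.
model_config.quantization_config.quantize_cfg = {
    "llama.layers.*.self_attn.q_proj": "nf4",
    "llama.layers.*.self_attn.v_proj": "fp4",
    "llama.layers.*.mlp.gate_proj": "weight_only_int8",
}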
In transformers this logic is kept separate from the LoftQ part above; I suggest splitting this part into its own PR.
This Pull Request is stale because it has been open for 60 days with no activity.
Mixed quantization using the quantization algorithms already supported by LoRA in Paddle.
To use LQ-LoRA, first gather the preparation data for integer linear programming, then solve the ILP to determine the quantization algorithm for each matrix, and finally run iterative initialization to obtain the parameters of the modified base model and the LoRA modules.
When fine-tuning with LQ-LoRA, set "weight_quantize_algo" to "lqlora" and provide "lqlora_quantize_cfg" and "lqlora_state_dict" (respectively, the quantization algorithm assigned to each matrix, and the base-model and LoRA-module parameters after iterative initialization).
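An illustrative set of fine-tuning arguments for the workflow described above, written as a Python dict standing in for the JSON config consumed by llm/run_finetune.py; only the three quoted keys come from the description, the remaining fields and all values are assumptions:

# Hypothetical example configuration for LQ-LoRA fine-tuning.
lqlora_finetune_args = {
    "model_name_or_path": "meta-llama/Llama-2-7b",   # assumed example model
    "lora": True,                                     # assumed flag enabling LoRA fine-tuning
    "weight_quantize_algo": "lqlora",                 # from the description
    "lqlora_quantize_cfg": "./lqlora_quantize_cfg",   # per-matrix quantization algorithms
    "lqlora_state_dict": "./lqlora_state_dict",       # base-model + LoRA params after iterative init
}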