Hello,
What is the best method to quantize a BERT model in int4 using ipex?
For example, the int8 dynamic quantization flow from the IPEX docs is:

import torch
import intel_extension_for_pytorch as ipex
from intel_extension_for_pytorch.quantization import prepare, convert

qconfig = ipex.quantization.default_dynamic_qconfig
prepared_model = prepare(model, qconfig, example_inputs=data)
converted_model = convert(prepared_model)
with torch.no_grad():
    traced_model = torch.jit.trace(converted_model, data, check_trace=False, strict=False)
    traced_model = torch.jit.freeze(traced_model)
traced_model.save("int8_quantized_model.pt")
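For context on what I mean by int4: weight-only 4-bit quantization, i.e. each weight stored in a [-8, 7] range with a per-row scale, two values packed per byte. A minimal plain-PyTorch sketch of that idea (illustrative helper names, not an IPEX API):

```python
import torch

def quantize_int4(weight):
    # Symmetric per-row quantization to the 4-bit range [-8, 7].
    scale = weight.abs().amax(dim=1, keepdim=True) / 7.0
    q = torch.clamp(torch.round(weight / scale), -8, 7).to(torch.int8)
    return q, scale

def pack_int4(q):
    # Pack two 4-bit values into each uint8 byte (offset-binary encoding).
    u = (q.to(torch.int16) + 8).to(torch.uint8)  # shift [-8, 7] -> [0, 15]
    return u[:, ::2] | (u[:, 1::2] << 4)

def unpack_int4(packed, shape):
    # Recover the signed 4-bit values and restore the original layout.
    lo = (packed & 0xF).to(torch.int16) - 8
    hi = (packed >> 4).to(torch.int16) - 8
    return torch.stack((lo, hi), dim=-1).reshape(shape).to(torch.int8)

w = torch.randn(4, 8)
q, scale = quantize_int4(w)
packed = pack_int4(q)  # half the bytes of the int8 representation
w_hat = unpack_int4(packed, q.shape).to(torch.float32) * scale
max_err = (w - w_hat).abs().max().item()
```

The packing halves storage relative to int8; the open question is which IPEX entry point produces this for a BERT model.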
How should this be done for int4 (4-bit)?
Thank you,
Hank