Tags: basetenlabs/TensorRT-LLM
Mistral kv quant calibration fixes (#2)
`examples/llama/hf_llama_convert.py -i cache/baseten_v0.6.1_20231206/repos/mistralai/Mistral-7B-v0.1/1a2e76/dst -o /tmp/tmp9frqc6v6/c5740d/dst --calibrate-kv-cache -t fp16` fails with errors similar to the smooth quant errors (#1), because NumPy does not support bfloat16.
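
As a minimal illustration of the root cause (not taken from the convert script itself): calling `.numpy()` on a bfloat16 torch tensor raises a `TypeError`, which is what surfaces during KV-cache calibration when the checkpoint weights are bfloat16.

```python
import torch

# Mistral-7B weights are stored as bfloat16; NumPy has no bfloat16 dtype,
# so converting such a tensor to a NumPy array fails.
t = torch.zeros(4, dtype=torch.bfloat16)
t.numpy()  # raises TypeError (unsupported ScalarType BFloat16)
```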
Mistral smooth quant fixes (#1)
* `hf_llama_convert.py` converts torch tensors to NumPy in many places, and NumPy does not support bfloat16. Add explicit conversions to get around that (see the sketch below).
* Fix an issue with loading quantized weights where the calculation looked off.
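
A sketch of the kind of explicit conversion the fix refers to; `cast_to_numpy` and the float16 target dtype are illustrative choices, not the actual code in `hf_llama_convert.py`.

```python
import numpy as np
import torch

def cast_to_numpy(t: torch.Tensor, dtype: torch.dtype = torch.float16) -> np.ndarray:
    """Convert a (possibly bfloat16) torch tensor to a NumPy array.

    NumPy cannot represent bfloat16, so cast to a supported dtype first.
    """
    if t.dtype == torch.bfloat16:
        t = t.to(dtype)
    return t.detach().cpu().numpy()

# Example: a bfloat16 weight tensor round-trips through NumPy as float16.
w = torch.randn(8, 8, dtype=torch.bfloat16)
print(cast_to_numpy(w).dtype)  # float16
```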