Tags: basetenlabs/TensorRT-LLM

phi-2

Update issue templates

baseten/v0.6.1_20231208

Mistral kv quant calibration fixes (#2)

`examples/llama/hf_llama_convert.py -i
cache/baseten_v0.6.1_20231206/repos/mistralai/Mistral-7B-v0.1/1a2e76/dst
-o /tmp/tmp9frqc6v6/c5740d/dst --calibrate-kv-cache -t fp16` fails with
errors similar to the smooth quant errors, because NumPy doesn't support
bfloat16.

baseten/v0.6.1_20231206

Mistral smooth quant fixes (#1)

* hf_llama_convert.py converts torch tensors to NumPy in many places, and
NumPy doesn't support bfloat16. Add an explicit conversion to get around
that.
* Fix an issue with loading quantized weights where the calculation looks
off.
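
The bfloat16 workaround in the first bullet can be illustrated without any dependencies. The commit message does not say which dtype the tensors are converted to; a common choice is float32, since a bfloat16 value is the upper 16 bits of an IEEE 754 float32 and widening it is therefore exact. In PyTorch terms that would look something like `tensor.to(torch.float32).numpy()`. The sketch below only demonstrates the bit-level relationship; the helper names are hypothetical and are not taken from `hf_llama_convert.py`.

```python
import struct


def float32_to_bfloat16_bits(x: float) -> int:
    """Return the bfloat16 bit pattern of x by truncating its float32 encoding.

    Hypothetical helper: keeps the upper 16 bits (sign, 8 exponent bits,
    7 mantissa bits) and drops the rest, i.e. rounds toward zero.
    """
    (bits32,) = struct.unpack("<I", struct.pack("<f", x))
    return bits32 >> 16


def bfloat16_bits_to_float32(bits16: int) -> float:
    """Widen a bfloat16 bit pattern to float32 by zero-filling the low 16 bits.

    Hypothetical helper: the widening is exact, so no information is lost
    when bfloat16 data is upcast before being handed to NumPy.
    """
    (value,) = struct.unpack("<f", struct.pack("<I", (bits16 & 0xFFFF) << 16))
    return value


if __name__ == "__main__":
    for original in (1.0, -2.5, 0.1):
        bits = float32_to_bfloat16_bits(original)
        widened = bfloat16_bits_to_float32(bits)
        print(f"{original!r}: bf16 bits=0x{bits:04X}, widened={widened!r}")
```

Values such as 1.0 and -2.5 survive the round trip unchanged, while 0.1 loses precision when truncated to bfloat16 but loses nothing further when widened back to float32.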

v0.6.1

Update TensorRT-LLM (NVIDIA#546)

v0.6.0

update aarch64 libraries (NVIDIA#525)

v0.5.0

revise the homepage (NVIDIA#14)

Co-authored-by: Shi Xiaowei <xiaoweis@nvidia.com>