<h1>Release notes from llm-analysis (cli99/llm-analysis)</h1>

<h2>v0.2.2 (2023-11-13): Bug fixes</h2>
<p>No release notes.</p>

<h2>v0.2.1 (2023-11-02)</h2>
<p>No release notes.</p>

<h2>v0.2.0 (2023-10-31): Bug fixes and MoE training analysis support</h2>
<p>This release fixes several bugs in memory usage calculations (e.g. activation memory, optimizer states) and adds support for analyzing MoE training.</p>

<h2>v0.1.1 (2023-08-18): Bug fixes and Llama 2 inference support</h2>
<p>This release:</p>
<ul>
<li>adds grouped-query attention (GQA) support</li>
<li>changes the activation memory calculation in inference to assume the maximum tensor buffer size</li>
<li>fixes the KV cache size calculation</li>
<li>adds a GPU cost analysis for inference</li>
<li>adds a Llama 2 inference case study</li>
</ul>

<h2>v0.1.0 (2023-05-02)</h2>
<p>No release notes.</p>
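<p>The KV cache size fix and the GQA support mentioned for v0.1.1 both come down to the standard sizing formula: the cache holds one K and one V tensor per layer, scaled by sequence length and the number of KV heads, which GQA reduces below the number of query heads. The sketch below is a minimal illustration of that formula, not llm-analysis's actual implementation; the model configurations are Llama 2's published values (7B: 32 layers, 32 KV heads, head_dim 128; 70B: 80 layers, 8 KV heads via GQA, head_dim 128), assuming fp16 (2 bytes per element).</p>

```python
def kv_cache_bytes(batch_size, seq_len, n_layers, n_kv_heads, head_dim,
                   bytes_per_elem=2):
    """Estimate KV cache size in bytes.

    The leading factor of 2 accounts for the separate K and V tensors
    cached for every transformer layer.
    """
    return 2 * batch_size * seq_len * n_layers * n_kv_heads * head_dim * bytes_per_elem

# Llama 2 7B (MHA: 32 KV heads), batch 1, 4096-token context, fp16:
print(kv_cache_bytes(1, 4096, 32, 32, 128) / 2**30)  # 2.0 GiB

# Llama 2 70B uses GQA with only 8 KV heads, so despite having 80 layers
# its KV cache is smaller than 7B's at the same context length:
print(kv_cache_bytes(1, 4096, 80, 8, 128) / 2**30)   # 1.25 GiB
```

<p>This is why the GQA support added in the same release matters for the inference analysis: shrinking the KV head count shrinks the cache proportionally, which in turn changes the memory and cost estimates.</p>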