<h1>Release notes from llm-analysis (cli99/llm-analysis)</h1>

<h2>v0.2.2 (2023-11-13): Bug fixes</h2>
<p>No release notes.</p>

<h2>v0.2.1 (2023-11-02)</h2>
<p>No release notes.</p>

<h2>v0.2.0 (2023-10-31): Bug fixes and MoE training analysis support</h2>
<p>This release fixes several bugs in memory usage calculations (e.g. activation memory, optimizer states) and adds support for analyzing MoE training.</p>

<h2>v0.1.1 (2023-08-18): Bug fixes and Llama 2 inference support</h2>
<p>This release:</p>
<ul>
<li>adds grouped-query attention (GQA) support</li>
<li>changes the activation memory calculation in inference to assume the maximum tensor buffer size</li>
<li>fixes the KV cache size calculation</li>
<li>adds a GPU cost analysis for inference</li>
<li>adds a Llama 2 inference case study</li>
</ul>

<h2>v0.1.0 (2023-05-02)</h2>
<p>No release notes.</p>
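<p>The KV cache size fix and the GQA support mentioned for v0.1.1 both come down to the standard sizing formula: the cache holds one K and one V tensor per layer, scaled by sequence length and the number of KV heads, which GQA reduces below the number of query heads. The sketch below is a minimal illustration of that formula, not llm-analysis's actual implementation; the model configurations are Llama 2's published values (7B: 32 layers, 32 KV heads, head_dim 128; 70B: 80 layers, 8 KV heads via GQA, head_dim 128), assuming fp16 (2 bytes per element).</p>

```python
def kv_cache_bytes(batch_size, seq_len, n_layers, n_kv_heads, head_dim,
                   bytes_per_elem=2):
    """Estimate KV cache size in bytes.

    The leading factor of 2 accounts for the separate K and V tensors
    cached for every transformer layer.
    """
    return 2 * batch_size * seq_len * n_layers * n_kv_heads * head_dim * bytes_per_elem

# Llama 2 7B (MHA: 32 KV heads), batch 1, 4096-token context, fp16:
print(kv_cache_bytes(1, 4096, 32, 32, 128) / 2**30)  # 2.0 GiB

# Llama 2 70B uses GQA with only 8 KV heads, so despite having 80 layers
# its KV cache is smaller than 7B's at the same context length:
print(kv_cache_bytes(1, 4096, 80, 8, 128) / 2**30)   # 1.25 GiB
```

<p>This is why the GQA support added in the same release matters for the inference analysis: shrinking the KV head count shrinks the cache proportionally, which in turn changes the memory and cost estimates.</p>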