docs: update use llama-server instead #34
Merged
I created a llama-server image over the weekend; it's very small, simple, and entirely static.
The CPU image is 68MB compressed and 250MB uncompressed.
The GPU (CUDA) image is 1.5GB compressed and 4GB uncompressed (depending on platform).
There is also a test Intel image, which is very large but supports additional optimizations for both Intel GPUs and CPUs.
No Python or interpreted languages here. It also uses the `.cache` directory, so as long as that directory is volume mounted you can easily cache downloads across runs.