Model quantization tool with a NiceGUI web interface for AWQ, NVFP4, and GGUF quantization methods.
## Quantization Methods
- AWQ (Activation-aware Weight Quantization): 4-bit integer quantization
- NVFP4 (NVIDIA FP4): 4-bit floating-point quantization
- GGUF (GGML Universal File): Multiple quantization levels (Q4_K_M, Q5_K_M, Q6_K, Q8_0, etc.) using llama.cpp
## Web Interface
- Real-time GPU monitoring with visual charts (Highcharts)
- Streaming logs during quantization
- Robust job cancellation (terminates subprocess and all children)
- Easy configuration forms
- Output model management
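One way the "terminates subprocess and all children" cancellation above can work is to launch each job in its own process group and signal the whole group. This is a minimal POSIX-only sketch, not msquant's actual implementation; the function names are illustrative.

```python
import os
import signal
import subprocess


def start_job(cmd: list[str]) -> subprocess.Popen:
    """Launch a quantization job in its own session/process group so
    the whole process tree can be cancelled at once."""
    return subprocess.Popen(cmd, start_new_session=True)


def cancel_job(proc: subprocess.Popen, timeout: float = 5.0) -> None:
    """Send SIGTERM to the job's process group (the subprocess and all
    of its children), escalating to SIGKILL if it does not exit."""
    pgid = os.getpgid(proc.pid)
    os.killpg(pgid, signal.SIGTERM)
    try:
        proc.wait(timeout=timeout)
    except subprocess.TimeoutExpired:
        os.killpg(pgid, signal.SIGKILL)
        proc.wait()
```

Killing the process group rather than just the parent matters here: quantization jobs often spawn worker processes (e.g. llama.cpp subprocesses) that would otherwise be orphaned.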
## Infrastructure
- Docker support with NVIDIA GPU runtime
- CI/CD with GitHub Actions
- Local development with Pixi
## Requirements
- Python 3.10+
- NVIDIA GPU with CUDA support (for quantization)
- Docker with NVIDIA runtime (for containerized deployment)
```bash
# Install Pixi (if not already installed)
curl -fsSL https://pixi.sh/install.sh | bash

# Run the application
pixi run dev
```

Then visit http://localhost:8080.
To run with Docker instead:

```bash
cd docker
docker compose up --build
```

Then visit http://localhost:8080.
Environment variables:

- `HF_HOME`: HuggingFace cache directory (default: `/workspace/hf`)
- `HF_DATASETS_CACHE`: Datasets cache directory (default: `/workspace/hf/datasets`)
- `OUT_DIR`: Output directory for quantized models (default: `/workspace/out`)
- `PORT`: Application port (default: `8080`)
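The environment-variable defaults above could be resolved in code along these lines; `AppConfig` is a hypothetical name for illustration, not msquant's actual settings class.

```python
import os
from dataclasses import dataclass, field


@dataclass
class AppConfig:
    """Resolve runtime settings from the environment, falling back to
    the documented defaults when a variable is unset."""
    hf_home: str = field(
        default_factory=lambda: os.environ.get("HF_HOME", "/workspace/hf"))
    datasets_cache: str = field(
        default_factory=lambda: os.environ.get("HF_DATASETS_CACHE", "/workspace/hf/datasets"))
    out_dir: str = field(
        default_factory=lambda: os.environ.get("OUT_DIR", "/workspace/out"))
    port: int = field(
        default_factory=lambda: int(os.environ.get("PORT", "8080")))
```

Using `default_factory` means the environment is read at instantiation time, so each `AppConfig()` reflects the current process environment.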
```
msquant/
├── src/
│   └── msquant/
│       ├── app/              # NiceGUI application
│       │   ├── main.py       # Application entry point
│       │   ├── pages/        # UI pages
│       │   └── components/   # Reusable UI components
│       ├── core/             # Core functionality
│       │   ├── quantizer/    # Quantization engine
│       │   └── monitoring/   # GPU monitoring
│       └── services/         # Background services
├── docker/                   # Docker configuration
│   ├── Dockerfile.gpu
│   └── docker-compose.yml
├── .github/
│   └── workflows/            # CI/CD workflows
├── pixi.toml                 # Pixi configuration
└── README.md
```
```bash
# Run development server
pixi run dev

# Lint code
pixi run lint

# Format code
pixi run fmt

# Type checking
pixi run typecheck

# Run tests
pixi run test
```

To add dependencies, edit `pixi.toml`:

- Add to `[dependencies]` for conda packages
- Add to `[pypi-dependencies]` for PyPI packages
Then run:

```bash
pixi install
```

Navigate to the Configure page and set:
- Model ID (e.g., `meta-llama/Llama-3.1-8B`)
- Quantization method (AWQ, NVFP4, or GGUF)
- Calibration dataset settings (required for AWQ and NVFP4)
- Method-specific parameters:
- AWQ: Weight bits, group size, zero point
- NVFP4: Activation/weight schemes
- GGUF: Quantization type (Q4_K_M recommended, Q5_K_M for best quality), intermediate format (f16 default)
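As a sketch of how the form inputs above might be validated into a job configuration (hypothetical names, not msquant's actual schema):

```python
# Quantization types listed in this README; illustrative subset only.
VALID_GGUF_TYPES = {"Q2_K", "Q3_K", "Q4_K_M", "Q5_K_M", "Q6_K", "Q8_0"}


def build_job_config(method: str, **params) -> dict:
    """Return a normalized job config, rejecting unknown methods and
    unsupported GGUF quantization types."""
    method = method.upper()
    if method not in {"AWQ", "NVFP4", "GGUF"}:
        raise ValueError(f"unknown quantization method: {method}")
    if method == "GGUF":
        # Default to Q4_K_M, the recommended balanced choice.
        qtype = params.get("quant_type", "Q4_K_M")
        if qtype not in VALID_GGUF_TYPES:
            raise ValueError(f"unsupported GGUF type: {qtype}")
        params["quant_type"] = qtype
    return {"method": method, **params}
```

Validating early, before any model download or calibration run starts, keeps failures cheap.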
Note:
- AWQ and NVFP4 output formats follow llmcompressor conventions (binary or safetensors)
- GGUF produces `.gguf` files compatible with llama.cpp, Ollama, and other GGUF-compatible inference engines
- GGUF quantization types:
- Q4_K_M: Recommended for balanced quality and size
- Q5_K_M: Best quality while maintaining reasonable size
- Q6_K, Q8_0: Higher precision options
- Q2_K, Q3_K: Smaller sizes with reduced quality
The Monitor page shows:
- Job status and logs with streaming updates
- Real-time GPU metrics with visual charts (utilization, memory, temperature, power)
- GPU selector for multi-GPU systems
- Cancel button to terminate running jobs
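The GPU metrics shown on the Monitor page can be collected by polling `nvidia-smi` in CSV mode. Below is a sketch of parsing one line of `nvidia-smi --query-gpu=utilization.gpu,memory.used,temperature.gpu,power.draw --format=csv,noheader,nounits` output; the function name and dict keys are illustrative, not msquant's actual code.

```python
def parse_gpu_metrics(csv_line: str) -> dict:
    """Parse one CSV line of nvidia-smi query output (no units, no
    header) into a metrics dict suitable for charting."""
    util, mem, temp, power = (field.strip() for field in csv_line.split(","))
    return {
        "utilization_pct": float(util),   # GPU utilization, percent
        "memory_used_mib": float(mem),    # memory.used is reported in MiB
        "temperature_c": float(temp),     # core temperature, Celsius
        "power_w": float(power),          # instantaneous power draw, watts
    }
```

A monitor loop would run the query once per interval (one output line per GPU on multi-GPU systems) and push the parsed values to the charts.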
The Results page lists:
- Quantized model outputs
- Cache information
- Model sizes and paths
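Gathering the sizes and paths listed above might look like the following sketch (hypothetical helper, assuming one output directory or file per quantized model under the output root):

```python
from pathlib import Path


def list_outputs(out_dir: str) -> list[dict]:
    """Collect path and total size for each entry under the output
    root, summing file sizes recursively for model directories."""
    results = []
    for entry in sorted(Path(out_dir).iterdir()):
        if entry.is_dir():
            size = sum(f.stat().st_size for f in entry.rglob("*") if f.is_file())
        elif entry.is_file():
            size = entry.stat().st_size
        else:
            continue  # skip sockets, broken symlinks, etc.
        results.append({"path": str(entry), "size_bytes": size})
    return results
```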
On PR open/update:
- Linting and type checking
- Unit tests
- Docker build (no push)
On merge to main:
- Docker image build and push to GHCR
- Tagged with `latest` and `sha-<commit>`
Images are published to GitHub Container Registry:
```bash
# Pull latest
docker pull ghcr.io/OWNER/msquant:latest

# Pull specific commit
docker pull ghcr.io/OWNER/msquant:sha-abc1234
```

Replace OWNER with your GitHub username/organization.
[Add your license here]
Built with:
- NiceGUI - Web interface
- llmcompressor - Quantization engine for AWQ/NVFP4
- llama.cpp - GGUF quantization and inference
- vLLM - LLM inference
- Pixi - Package management