curious-qmoe is a curiosity-driven quantized Mixture-of-Experts framework for efficient audio classification on resource-constrained edge devices. curious-qmoe achieves 99.9% of full-precision accuracy with 4× compression and an 82% reduction in latency variance through Bayesian epistemic uncertainty-based routing.
Key Features:
- Heterogeneous Quantization: BitNet ternary, BitLinear (1-16 bit), post-training quantization (PTQ) with bitwise operations
- Curiosity-Driven Routing: Bayesian router with Monte Carlo dropout for epistemic uncertainty estimation (see the sketch below)
- Mixture-of-Experts: Dynamic expert selection across quantized models for adaptive precision
- Hardware-Efficient: Optimized for edge deployment with predictable latency (29 ms std)
- Comprehensive Evaluation: Energy consumption, carbon emissions, and statistical significance testing
- Reproducible: Hydra configuration management, cross-validation, experiment tracking
Datasets: ESC-50, Quinn, UrbanSound8K
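The curiosity signal used for routing comes from Monte Carlo dropout. The following is a minimal, illustrative sketch of that idea, not the project's actual `moe.py` implementation; the class and parameter names here are placeholders:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DropoutRouter(nn.Module):
    """Toy gating network with dropout, used only to illustrate MC-dropout uncertainty."""
    def __init__(self, in_dim: int, num_experts: int, p: float = 0.2):
        super().__init__()
        self.fc = nn.Linear(in_dim, num_experts)
        self.dropout = nn.Dropout(p)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return F.softmax(self.fc(self.dropout(x)), dim=-1)

def epistemic_uncertainty(router: DropoutRouter, x: torch.Tensor, n_samples: int = 20) -> torch.Tensor:
    """Monte Carlo dropout: keep dropout active, sample several routing distributions,
    and use the per-expert variance as an epistemic-uncertainty (curiosity) score."""
    router.train()  # keeps dropout active during sampling
    with torch.no_grad():
        probs = torch.stack([router(x) for _ in range(n_samples)])  # (n_samples, batch, experts)
    return probs.var(dim=0).mean(dim=-1)  # one score per input

# Example: score a batch of 16 embeddings routed across 4 experts
router = DropoutRouter(in_dim=640, num_experts=4)
curiosity = epistemic_uncertainty(router, torch.randn(16, 640))
```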
conda create -n curious-qmoe python=3.11 -y
conda activate curious-qmoe
git clone https://github.com/sebasmos/QWave.git
cd QWave
pip install -e .

cd scripts
python benchmark.py \
--config-path /path/to/curious-qmoe/config \
--config-name esc50 \
experiment.datasets.esc.csv=/path/to/esc-50.csv \
experiment.device=cpu \
experiment.models_to_run=[esc]

python benchmark.py \
--config-path /path/to/curious-qmoe/config \
--config-name esc50 \
experiment.device=cpu \
experiment.datasets.esc.csv=/path/to/esc-50.csv \
experiment.models_to_run=[moe] \
experiment.router.expert_quantizations="[bitnet,'1','2','4','8','16',qesc]" \
experiment.router.num_experts=3 \
experiment.router.top_k=1 \
experiment.router.use_curiosity=true \
experiment.metadata.tag=esc_moe_curiosity

Curiosity outputs (saved per fold):
- curiosity_values.json - Raw uncertainty values
- curiosity_histogram.png - Distribution of epistemic uncertainty
- curiosity_per_class.png - Average uncertainty per class
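To inspect the raw values outside the generated plots, here is a minimal sketch; the output path and JSON schema are assumptions, so adjust them to your run:

```python
import json
import statistics

# Hypothetical per-fold output path; point this at your actual run directory.
with open("outputs/esc_moe_curiosity/fold_0/curiosity_values.json") as f:
    values = json.load(f)

# Assumes the file stores either a flat list of scores or a {name: score} mapping.
scores = list(values.values()) if isinstance(values, dict) else list(values)
print(f"n={len(scores)}  mean={statistics.mean(scores):.4f}  std={statistics.pstdev(scores):.4f}")
```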
curious-qmoe/
├── config/                          # Hydra configs
│   └── esc50.yaml                   # ESC-50 configuration
├── curious_qmoe/                    # Core source code
│   ├── datasets.py                  # EmbeddingDataset and normalization
│   ├── models.py                    # Neural architectures (MLP, ESCModel)
│   ├── bitnnet.py                   # BitNet quantized layers
│   ├── qmoe_layers.py               # Quantized MoE layers
│   ├── moe.py                       # MoE training and Bayesian Router
│   ├── train_utils.py               # Training/validation utilities
│   ├── memory.py                    # Model size calculation
│   ├── graphics.py                  # Plotting (ROC, losses, curiosity)
│   └── utils.py                     # Helpers (seeding, device, metrics)
├── scripts/
│   ├── benchmark.py                 # Main benchmarking pipeline
│   └── tables/                      # Results analysis scripts
│       ├── organize-results.py      # Combine CSV results
│       ├── analyze-std.py           # Generate tables with mean±std
│       ├── analyze-significance.py  # Statistical testing (t-tests, Levene)
│       └── README-significance.md   # Model nomenclature reference
├── outputs/                         # Auto-generated results
└── pyproject.toml
After running experiments, analyze results with the scripts in scripts/tables/:
Combine CSV files from multiple experiments:
cd scripts/tables
python organize-results.py  # Edit dataset path in script

Create 5 tables with mean±std from 5-fold cross-validation:

python analyze-std.py

Output: tables-std/ folder with 4 main tables + 1 supplementary
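For reference, the kind of fold-level aggregation these scripts perform can be sketched with pandas; the column names and numbers below are placeholders, not the actual CSV schema or results:

```python
import pandas as pd

# Hypothetical fold-level results; the real CSVs come from organize-results.py.
df = pd.DataFrame({
    "model": ["FP32-Base"] * 5 + ["Q8-Base-PTQ"] * 5,
    "fold":  list(range(5)) * 2,
    "f1":    [0.91, 0.92, 0.90, 0.93, 0.91, 0.90, 0.91, 0.89, 0.92, 0.90],
})

# Collapse the 5 folds into one mean±std string per model.
summary = (
    df.groupby("model")["f1"]
      .agg(["mean", "std"])
      .apply(lambda r: f"{r['mean']:.3f}±{r['std']:.3f}", axis=1)
)
print(summary)
```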
Run paired t-tests and variance tests:
python analyze-significance.py

Output: significance-tests/ folder with 6 CSV files:
- F1-score comparisons (Tables 1-3)
- Latency speedup tests (Table 4)
- Energy efficiency tests (Table 3)
- Variance reduction analysis (Levene's test)
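A minimal sketch of the underlying tests, a paired t-test on per-fold F1 scores and Levene's test on latencies; the arrays below are made-up placeholders, not project results:

```python
from scipy import stats

# Per-fold F1 scores for two models (placeholder numbers).
f1_fp32 = [0.91, 0.92, 0.90, 0.93, 0.91]
f1_q8   = [0.90, 0.91, 0.89, 0.92, 0.90]
t_stat, p_val = stats.ttest_rel(f1_fp32, f1_q8)   # paired t-test across folds

# Per-run latencies in ms (placeholder numbers); Levene tests equality of variances.
lat_baseline = [120, 180, 95, 210, 150]
lat_moe      = [130, 140, 135, 128, 142]
w_stat, p_var = stats.levene(lat_baseline, lat_moe)

print(f"paired t-test: t={t_stat:.3f}, p={p_val:.3f}")
print(f"Levene's test: W={w_stat:.3f}, p={p_var:.3f}")
```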
Model nomenclature: See scripts/tables/README-significance.md for standardized names (FP32-Base, Q8-Base-PTQ, etc.)
Key parameters in config/esc50.yaml:
experiment:
  models_to_run: [esc]        # Options: esc, bitnet, moe, qmoe, 1, 2, 4, 8, 16, qesc
  device: "cpu"               # or "cuda", "mps"
  datasets:
    esc:
      csv: "/path/to/esc-50.csv"
      normalization_type: "standard"
  model:
    batch_size: 64
    hidden_sizes: [640, 320]
    learning_rate: 0.0005793146438537801
    epochs: 10
  router:                     # For MoE models
    expert_quantizations: [1, 2, 4, 16]
    num_experts: 4
    top_k: 1
    use_curiosity: false      # Enable Bayesian Router
    load_balancing: true
  cross_validation:
    n_splits: 5
    shuffle: true
    random_seed: 42

Supported schemes:
- 1-bit to 16-bit: Symmetric quantization with scale factors
- BitNet: Ternary weights {-1, 0, 1} with per-channel scaling
- qesc: Bitwise popcount with 2-bit ternary encoding
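To make the first two schemes concrete, here is an illustrative sketch of symmetric k-bit quantization and BitNet-style ternary quantization; it is a simplified restatement of the idea, not the code in curious_qmoe/bitnnet.py or qmoe_layers.py:

```python
import torch

def quantize_symmetric(w: torch.Tensor, bits: int):
    """Symmetric k-bit quantization: round weights onto an integer grid
    [-(2^(bits-1)-1), 2^(bits-1)-1] with a single scale factor."""
    qmax = 2 ** (bits - 1) - 1
    scale = w.abs().max() / qmax
    q = torch.clamp(torch.round(w / scale), -qmax, qmax)
    return q, scale  # dequantized weights are q * scale

def quantize_ternary(w: torch.Tensor):
    """BitNet-style ternary quantization: weights in {-1, 0, 1} with per-channel scaling."""
    scale = w.abs().mean(dim=1, keepdim=True)        # one scale per output channel
    q = torch.clamp(torch.round(w / scale), -1, 1)   # ternary codes
    return q, scale

w = torch.randn(4, 8)
q4, s4 = quantize_symmetric(w, bits=4)
qt, st = quantize_ternary(w)
print("4-bit max abs error:", (w - q4 * s4).abs().max().item())
print("ternary values:", qt.unique().tolist())
```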
This project is licensed under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License (CC BY-NC-SA 4.0).
@software{Cajas2025_curious_qmoe,
author = {Cajas OrdΓ³Γ±ez, SebastiΓ‘n AndrΓ©s and Torres, Luis and Meno, Mackenzie and Lai, Yuan and DurΓ‘n, Carlos and Celi, Leo Anthony},
title = {curious-qmoe: Learning to Route Curiously in Low-Bit Mixture-of-Experts},
year = {2025},
url = {https://github.com/sebasmos/QWave},
license = {CC-BY-NC-SA-4.0}
}