No magic scores. No bullshit.
Compare M1 Air, M4 Max, RTX 3060, RTX 4060, A100 and other hardware on different AI tasks
NoBS is an open-source benchmark suite for evaluating real AI hardware performance, not synthetic FLOPS or polished demos.
It's a collection of reproducible tests and community-submitted results for:
- Embeddings – Ready (sentence-transformers, IMDB dataset)
- LLM inference – In Progress (LM Studio available, Awesome Prompts dataset)
- VLM inference – Planned
- Diffusion image generation – Planned
- Classic ML – Planned (scikit-learn, XGBoost, LightGBM, CatBoost)
"We donβt measure synthetic FLOPS. We measure how your GPU cries in real life."
NoBS was built to understand how different devices, from everyday laptops and PCs to large inference giants, actually perform on real AI tasks.
Last Updated: 2025-10-20
| Rank | Device | Platform | CPU | RAM | GPU | VRAM | Embeddings | LLM | Total Score |
|---|---|---|---|---|---|---|---|---|---|
| 1 | Mac16,6 | macOS | Apple M4 Max (14) | 36 GB | Apple M4 Max (32 cores) | shared with system RAM | 637.17 | 157.84 | 795.01 |
| 2 | ASUSTeK COMPUTER INC. ASUS Vivobook Pro 15 N6506MV_N6506MV 1.0 | Linux | Intel(R) Core(TM) Ultra 9 185H (16) | 23 GB | NVIDIA GeForce RTX 4060 Laptop GPU | 8 GB | 539.73 | 26.42 | 566.15 |
Apple (1 device)
| Rank | Device | Platform | CPU | RAM | GPU | VRAM | Embeddings | LLM | Total Score |
|---|---|---|---|---|---|---|---|---|---|
| 1 | Mac16,6 | macOS | Apple M4 Max (14) | 36 GB | Apple M4 Max (32 cores) | shared with system RAM | 637.17 | 157.84 | 795.01 |
NVIDIA (1 device)
| Rank | Device | Platform | CPU | RAM | GPU | VRAM | Embeddings | LLM | Total Score |
|---|---|---|---|---|---|---|---|---|---|
| 1 | ASUSTeK COMPUTER INC. ASUS Vivobook Pro 15 N6506MV_N6506MV 1.0 | Linux | Intel(R) Core(TM) Ultra 9 185H (16) | 23 GB | NVIDIA GeForce RTX 4060 Laptop GPU | 8 GB | 539.73 | 26.42 | 566.15 |
| Device | Model | Rows/sec | Time (s) | Embedding Dim | Batch Size |
|---|---|---|---|---|---|
| ASUSTeK COMPUTER INC. ASUS Vivobook Pro 15 N6506MV_N6506MV 1.0 | nomic-ai/modernbert-embed-base | 36.24 | 2.76 | 768 | 16 |
| ASUSTeK COMPUTER INC. ASUS Vivobook Pro 15 N6506MV_N6506MV 1.0 | thenlper/gte-large | 25.57 | 3.91 | 1024 | 16 |
| Mac16,6 | nomic-ai/modernbert-embed-base | 36.30 | 2.76 | 768 | 16 |
| Mac16,6 | thenlper/gte-large | 34.55 | 2.89 | 1024 | 16 |
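For context on how a Rows/sec figure like those above can be produced, here is a minimal sketch using sentence-transformers. The function name and structure are illustrative; the actual logic lives in src.tasks.text_embeddings.runner and may differ (dataset slicing, warm-up, medians across runs, etc.).

```python
# Illustrative sketch only – not the repository's actual embeddings runner.
import time

from sentence_transformers import SentenceTransformer


def measure_rows_per_sec(model_name: str, texts: list[str], batch_size: int = 16) -> float:
    """Encode the texts once and return throughput in rows per second."""
    model = SentenceTransformer(model_name)  # device (CUDA/MPS/CPU) is picked automatically
    start = time.perf_counter()
    model.encode(texts, batch_size=batch_size, show_progress_bar=False)
    return len(texts) / (time.perf_counter() - start)


# Example call with a hypothetical list of IMDB reviews:
# measure_rows_per_sec("nomic-ai/modernbert-embed-base", reviews)
```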
| Device | Model | Tokens/sec | TTFT (s) | Latency (s) | Input Tokens | Output Tokens |
|---|---|---|---|---|---|---|
| ASUSTeK COMPUTER INC. ASUS Vivobook Pro 15 N6506MV_N6506MV 1.0 | gpt-oss-20b | 16.70 | 27.87 | 136.28 | 561 | 3443 |
| Mac16,6 | gpt-oss-20b | 168.14 | 6.50 | 22.81 | 561 | 4280 |
All metrics are median values across 3 runs.
Scores calculated as: num_tasks * 3600 / total_time_seconds.
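As a worked example of that formula (illustrative only; how num_tasks is counted is up to each runner):

```python
# Sketch of the scoring formula quoted above.
def score(num_tasks: int, total_time_seconds: float) -> float:
    """Roughly 'tasks completed per hour' – higher is better."""
    return num_tasks * 3600 / total_time_seconds


print(round(score(1, 5.65), 2))  # one task finishing in 5.65 s -> 637.17 points
```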
- Python 3.12+
- uv package manager
# Clone the repository
git clone https://github.com/bogdanminko/nobs.git
cd nobs
# Install dependencies
uv sync

# Run the full benchmark suite
uv run python main.py

This will:
- Auto-detect your hardware (CUDA/MPS/CPU)
- Run all available benchmarks (currently: embeddings)
- Save results to results/report_{your_device}.json
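If you are curious what the hardware auto-detection step amounts to, a minimal sketch looks like the following (assuming PyTorch is installed via uv sync; the repository's actual detection logic may differ):

```python
# Illustrative sketch of CUDA/MPS/CPU auto-detection.
import torch


def detect_device() -> str:
    if torch.cuda.is_available():           # NVIDIA GPUs
        return "cuda"
    if torch.backends.mps.is_available():   # Apple Silicon
        return "mps"
    return "cpu"


print(detect_device())
```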
# Embeddings only
uv run python -m src.tasks.text_embeddings.runner
# LLM inference (requires LM Studio running on localhost:1234)
uv run python -m src.tasks.llms.runner

Note: LLM benchmarks currently require LM Studio running locally.
- Download and install LM Studio
- Load a model in LM Studio
- Start the local server (default: http://localhost:1234)
- Run the LLM benchmark:
uv run python -m src.tasks.llms.runner
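As a rough illustration of what such a run measures, here is a sketch that talks to LM Studio's OpenAI-compatible endpoint using the openai Python client. The client choice, model name, and prompt are assumptions; the actual runner is src.tasks.llms.runner and may measure things differently.

```python
# Hypothetical sketch of measuring TTFT and latency against LM Studio on localhost:1234.
import time

from openai import OpenAI

client = OpenAI(base_url="http://localhost:1234/v1", api_key="lm-studio")  # any placeholder key works

start = time.perf_counter()
first_token_at = None
pieces: list[str] = []

stream = client.chat.completions.create(
    model="gpt-oss-20b",  # example: whichever model is loaded in LM Studio
    messages=[{"role": "user", "content": "Explain transformers in one paragraph."}],
    stream=True,
)
for chunk in stream:
    if chunk.choices and chunk.choices[0].delta.content:
        if first_token_at is None:
            first_token_at = time.perf_counter()  # time to first token (TTFT)
        pieces.append(chunk.choices[0].delta.content)

latency = time.perf_counter() - start
print(f"TTFT: {first_token_at - start:.2f}s, total latency: {latency:.2f}s")
# Tokens/sec would additionally require a token count, e.g. from a tokenizer.
```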
We welcome contributions, whether that's adding new benchmarks, supporting new models, or submitting your hardware results.
1. Fork and clone the repository

git clone https://github.com/YOUR_USERNAME/nobs.git
cd nobs

2. Install dependencies including dev tools
uv sync --group quality --group dev
3. Install pre-commit hooks
pre-commit install
This sets up automatic code quality checks that run before each commit:
- ruff – Fast Python linter and formatter
- mypy – Static type checking
- bandit – Security vulnerability scanner
- Standard checks (trailing whitespace, YAML syntax, etc.)
4. Create a new branch
git checkout -b feature/your-feature-name
5. Make your changes
- Write code following the existing patterns
- Add type hints where applicable
- Update documentation if needed
6. Test your changes

# Run benchmarks to ensure they work
uv run python main.py

# Update benchmark results tables (if you modified results)
make

# Run code quality checks manually (optional - pre-commit will run them automatically)
make format
Available Makefile commands:
- make – Generate benchmark results tables (default)
- make generate – Generate benchmark results tables
- make format – Run pre-commit hooks on all files
- make lint – Run ruff linter only
- make clean – Clean Python cache files
- make help – Show all available commands
7. Commit your changes

git add .
git commit -m "feat: your descriptive commit message"
Pre-commit hooks will automatically:
- Format your code
- Check for type errors
- Scan for security issues
- Fix common issues (trailing whitespace, etc.)
If any check fails, fix the issues and commit again.
8. Push and create a Pull Request
git push origin feature/your-feature-name
All contributions must pass:
- Ruff linting and formatting
- Mypy type checking
- Bandit security checks
These are enforced automatically via pre-commit hooks.
See CLAUDE.md for detailed instructions on:
- Adding new models to existing benchmarks
- Creating new benchmark categories
- Data loading patterns
- Memory management best practices