A simple command-line tool for running CC AI models locally with performance optimizations.
- What is CC CLI?
- Features
- Quick Installation
- Usage
- Performance Optimizations
- Optimization Details
- Hardware Analysis
- LLM Optimizations
- Performance Mode
- Testing Model Performance
- Available CC Models
- Performance Tips
- Cloud Integration
- Technical Details
- Troubleshooting
- Advanced Configurations
- License
- Contributing
- Credits
CC CLI is a user-friendly command-line interface that makes it easy to install, manage, and run CC AI models on your local machine. It's built on top of Ollama and provides simplified commands for common tasks.
- Simple Installation: One command to install everything
- Easy Model Management: Download and switch between different CC models
- Intuitive Commands: Simple, memorable commands for running AI models
- Cross-Platform: Works on macOS and Linux
- Configurable: Set default models and preferences
- Performance Optimizations: Hardware-specific optimizations for faster inference
- Cloud Integration:
- Authenticate with GCP, AWS, and Azure
- Find the most cost-effective cloud instances for running models
- Provision and manage cloud instances for resource-intensive models
- Intelligent hardware requirements calculation based on model parameters
# Clone this repository
git clone https://github.com/Noahcasarotto/cc-cli.git
cd cc-cli
# Run the installer
bash install.sh
Or install with a single command:
curl -fsSL https://raw.githubusercontent.com/Noahcasarotto/cc-cli/main/install.sh | bash
cc install
# Run with interactive prompt
cc run
# Run with a specific prompt
cc run "What is quantum computing?"
# Run a specific model
cc run cc-r1:14b "Explain the theory of relativity"
# List available models
cc list
# Download a specific model
cc pull cc-r1:8b
# Set default model
cc set-default cc-r1:14b
# Enable verbose mode
cc verbose on
# Disable verbose mode
cc verbose off
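For example, you can chain the documented commands to set up a model and run a one-off prompt (the model and prompt here are illustrative):

```bash
# Download a model, make it the default, then ask a one-off question.
cc pull cc-r1:8b
cc set-default cc-r1:8b
cc run "Summarize the difference between TCP and UDP in two sentences."
```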
CC CLI includes several tools to optimize the performance of AI models on your local hardware:
- `analyze_hardware.sh`: Analyzes your system and recommends suitable models
- `optimize_llm.sh`: Applies comprehensive optimizations for LLM inference
- `test_models.sh`: Tests and compares the performance of different models
These tools work together to provide a seamless and efficient experience when running CC models.
# Analyze your hardware
./analyze_hardware.sh
# Apply optimizations
./optimize_llm.sh
# Test model performance
./test_models.sh
# Use optimized CLI
cc-optimized run "Your prompt here"
# Enable high-performance mode
llm-performance-mode
The `optimize_llm.sh` script applies the following optimizations:
- Metal GPU Optimizations:
  - Disables debug layers for production performance
  - Configures optimal buffer sizes
  - Sets Apple Silicon specific parameters
- Memory Management:
  - Reduces memory fragmentation
  - Optimizes memory allocation
  - Configures memory mapping thresholds
- Thread Optimizations:
  - Sets optimal thread counts for your CPU (see the sketch after this list)
  - Balances workloads across cores
  - Configures numerical libraries for parallelism
- System-Level Improvements:
  - Performance-oriented environment variables
  - I/O optimizations
  - Library-specific settings
- Automatic Loading:
  - Creates LaunchAgent for automatic loading at login
  - Adds environment settings to shell profile
  - Provides convenient wrapper scripts
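As a concrete illustration of the thread settings, the core-count detection behind them could look like this minimal sketch (assumed logic; the actual script may differ in detail):

```bash
#!/usr/bin/env bash
# Detect the logical core count, then cap the thread pools of common
# numerical libraries to that number to avoid oversubscription.
if [[ "$(uname)" == "Darwin" ]]; then
  CPU_CORES=$(sysctl -n hw.ncpu)   # macOS
else
  CPU_CORES=$(nproc)               # Linux
fi
export OMP_NUM_THREADS=$CPU_CORES
export MKL_NUM_THREADS=$CPU_CORES
export VECLIB_MAXIMUM_THREADS=$CPU_CORES
export NUMEXPR_NUM_THREADS=$CPU_CORES
export OPENBLAS_NUM_THREADS=$CPU_CORES
```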
The `analyze_hardware.sh` script analyzes your system hardware and provides model recommendations based on:
- CPU type and core count
- Available RAM
- Disk space
- GPU capabilities
./analyze_hardware.sh
Analyzing system hardware...
----------------------------------------
CPU Information:
Model: Apple M3
Cores: 8
Memory Information:
Total Memory: 8.00GB
Disk Information:
Available Space: 13Gi
GPU Information:
GPU: Chipset Model: Apple M3
Model Recommendation:
Recommended: Use medium-sized models (cc-r1:8b)
Performance Optimization Suggestions:
- GPU detected: Consider enabling GPU acceleration
----------------------------------------
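The recommendation is driven primarily by total RAM. Below is a minimal sketch of that decision, with hypothetical cutoffs chosen to match the tiers in the model table further down, not the script's exact thresholds:

```bash
# Total RAM in GB (hw.memsize is bytes on macOS; MemTotal is kB on Linux).
if [[ "$(uname)" == "Darwin" ]]; then
  TOTAL_GB=$(( $(sysctl -n hw.memsize) / 1024 / 1024 / 1024 ))
else
  TOTAL_GB=$(( $(awk '/MemTotal/ {print $2}' /proc/meminfo) / 1024 / 1024 ))
fi
# Hypothetical cutoffs for illustration only.
if   (( TOTAL_GB >= 32 )); then echo "Recommended: Use large models (cc-r1:14b+)"
elif (( TOTAL_GB >= 8 ));  then echo "Recommended: Use medium-sized models (cc-r1:8b)"
else                            echo "Recommended: Use small models (cc-r1:1.5b)"
fi
```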
The `optimize_llm.sh` script applies comprehensive optimizations for AI model inference:
- Metal GPU Optimizations for Apple Silicon
  - Disables debug features for production performance
  - Configures optimal buffer sizes
  - Sets Apple Silicon specific parameters
- Memory Management Optimizations
  - Reduces memory fragmentation
  - Optimizes memory allocation
  - Configures memory mapping thresholds
- Thread Optimizations
  - Sets optimal thread counts for your CPU
  - Configures numerical libraries for efficient parallelism
- System-Level Improvements
  - Performance-oriented environment variables
  - Library-specific settings
- Automatic Loading
  - Creates LaunchAgent for automatic loading at login (see the sketch below)
  - Adds environment settings to shell profile
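To make the Automatic Loading step concrete, here is a hypothetical sketch of a LaunchAgent that publishes one optimization variable into the login session. The plist label matches the one referenced in Troubleshooting; the rest is illustrative:

```bash
mkdir -p ~/Library/LaunchAgents
cat > ~/Library/LaunchAgents/com.llm.optimizations.plist <<'EOF'
<?xml version="1.0" encoding="UTF-8"?>
<plist version="1.0">
<dict>
  <key>Label</key><string>com.llm.optimizations</string>
  <key>ProgramArguments</key>
  <array>
    <string>/bin/bash</string>
    <string>-c</string>
    <string>launchctl setenv OMP_NUM_THREADS "$(sysctl -n hw.ncpu)"</string>
  </array>
  <key>RunAtLoad</key><true/>
</dict>
</plist>
EOF
# Load it immediately instead of waiting for the next login.
launchctl load ~/Library/LaunchAgents/com.llm.optimizations.plist
```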
./optimize_llm.sh
- Environment Files: Created in `~/.llm-optimizations/`
- Wrapper Scripts:
  - `cc-optimized`: Wrapper for running CC with optimizations (see the sketch after this list)
  - `llm-performance-mode`: Script for enabling high-performance mode
- LaunchAgent: For automatic loading at login
- Shell Configuration: Updates to `~/.zshrc`
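The wrapper itself can be very small. A minimal sketch of what `cc-optimized` might contain (assumed structure; the generated script may differ):

```bash
#!/usr/bin/env bash
# cc-optimized: load the saved optimization environment, then hand
# every argument through to the regular cc command.
set -euo pipefail
source "$HOME/.llm-optimizations/environment"
exec cc "$@"
```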
After running the optimization script, you can use:
# Run with optimized settings
cc-optimized run "Your prompt here"
# Run with a specific model
cc-optimized run phi "Calculate the square root of 144"
The `llm-performance-mode` script optimizes system resources for LLM operations:
- Flushes disk cache for better I/O performance
- Provides performance recommendations
- Optimizes system resources
llm-performance-mode
This script requires sudo access to flush the disk cache.
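Flushing the disk cache is platform-specific. Both commands below are standard, though the script's exact implementation is assumed:

```bash
# Write pending data to disk, then drop filesystem caches.
if [[ "$(uname)" == "Darwin" ]]; then
  sync && sudo purge                                   # macOS
else
  sync && echo 3 | sudo tee /proc/sys/vm/drop_caches   # Linux
fi
```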
The `test_models.sh` script allows you to test and compare the performance of different models:
- Tests multiple models with the same prompt
- Measures response time
- Captures model outputs
- Provides performance comparisons
./test_models.sh
Starting model phi...
Paris.
total duration: 5.46s
load duration: 68.53ms
prompt eval count: 44 token(s)
prompt eval duration: 2.44s
prompt eval rate: 17.99 tokens/s
eval count: 4 token(s)
eval duration: 2.94s
eval rate: 1.36 tokens/s
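If you want a comparable benchmark of your own, a minimal loop over the documented `cc run` command could look like this (the model list and prompt are illustrative):

```bash
#!/usr/bin/env bash
# Time the same prompt across several models in wall-clock seconds.
PROMPT="What is the capital of France?"
for MODEL in phi cc-r1:1.5b cc-r1:8b; do
  echo "Starting model $MODEL..."
  START=$(date +%s)
  cc run "$MODEL" "$PROMPT"
  echo "Wall-clock time: $(( $(date +%s) - START ))s"
done
```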
| Model | Size | Description | Recommended RAM |
|-------|------|-------------|-----------------|
| cc-r1:1.5b | ~1.1GB | Lightweight model for basic tasks | 8GB+ |
| cc-r1:8b | ~4.9GB | Good balance of capability and resource usage | 16GB+ |
| cc-r1:14b | ~8.5GB | Better performance but requires more resources | 32GB+ |
| cc-r1:32b | ~19GB | Advanced capabilities but requires significant resources | 32-64GB+ |
| cc-r1:70b | ~40GB | Most capable but requires powerful hardware | 64GB+ |
- Smaller models (1.5B, 8B) run faster but have more limited capabilities
- Larger models (14B, 32B, 70B) offer better reasoning but require more RAM and processing power
- First-time model loading is slower as the model gets optimized for your hardware
- Subsequent runs are faster as the model remains cached
- Use `llm-performance-mode` before running intensive tasks
- For Apple Silicon, Metal optimizations provide significant speed improvements
- Match thread count to your available CPU cores for optimal performance
CC CLI provides built-in functionality to authenticate and work with major cloud providers, allowing you to leverage cloud resources for running larger models or managing cloud infrastructure.
Before using the cloud integration features, you need to install the necessary cloud provider CLIs. A helper script is provided to streamline this process:
./install_cloud_deps.sh
This script will guide you through installing:
- Google Cloud SDK (gcloud)
- AWS CLI
- Azure CLI
- jq (required for JSON processing)
You can also install individual CLIs:
./install_cloud_deps.sh --gcp # Install only Google Cloud SDK
./install_cloud_deps.sh --aws # Install only AWS CLI
./install_cloud_deps.sh --azure # Install only Azure CLI
./install_cloud_deps.sh --all # Install all without prompting
The `cc-login.sh` script allows you to authenticate with all supported cloud providers through a simple interface:
./cc-login.sh # Interactive login menu
./cc-login.sh --all # Login to all configured cloud providers
./cc-login.sh --gcp # Login to GCP only
./cc-login.sh --aws # Login to AWS only (supports SSO and access keys)
./cc-login.sh --azure # Login to Azure only
./cc-login.sh --status # Check authentication status
./cc-login.sh --logout # Logout from all providers
The cloud authentication system provides several features:
- Multiple Authentication Methods:
  - AWS: Support for both SSO and traditional access keys
  - Azure: Web-based authentication with subscription selection
  - GCP: Interactive project selection and management
- Status Tracking: The system keeps track of which providers you're authenticated with (see the status sketch below)
- Configuration Management: Cloud provider configurations are stored in `~/.cc-cli/config`
- Intelligent Detection: Automatically detects if you're already logged in
- Guided Setup: Walks you through project/subscription selection where applicable
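Under the hood, a status check like `./cc-login.sh --status` can lean on each provider's own CLI. The commands below are real, but the script's exact logic is assumed:

```bash
# Print the active identity for each provider; blank means logged out.
echo "GCP:   $(gcloud auth list --filter=status:ACTIVE --format='value(account)' 2>/dev/null)"
echo "AWS:   $(aws sts get-caller-identity --query Arn --output text 2>/dev/null)"
echo "Azure: $(az account show --query user.name --output tsv 2>/dev/null)"
```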
The `cloud_compute.sh` script helps you find the most cost-effective cloud instances for running your models. It uses an advanced model requirements calculation system that determines optimal hardware configurations based on model parameters, quantization, and intended use case.
./cloud_compute.sh find-cheapest <model> [performance-level]
Where `<model>` is one of the supported models (cc-r1:1.5b, cc-r1:8b, llama3:8b, etc.) and the optional `[performance-level]` can be:
- `basic`: Lowest cost configuration, CPU-only for smaller models
- `standard`: Balanced cost/performance (default)
- `optimal`: Best performance, higher-tier GPUs with more resources
# Find cheapest instance for running cc-r1:8b with standard performance
./cloud_compute.sh find-cheapest cc-r1:8b
# Find the optimal (highest performance) instance for llama3:8b
./cloud_compute.sh find-cheapest llama3:8b optimal
# Find a basic (lowest cost) instance for phi
./cloud_compute.sh find-cheapest phi basic
The `cloud_compute.sh` script uses a sophisticated approach to determine hardware requirements:
- Model Parameter Analysis: Calculates resource needs based on model size, context length, and architecture
- Quantization Awareness: Adjusts memory requirements based on quantization method (none, int8, int4)
- Performance Tier Scaling: Scales requirements based on desired performance level
- GPU Selection: Intelligently selects appropriate GPU types based on memory needs
- Provider-Specific Optimization: Considers differences between GCP and Azure instance types
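As a worked example of the memory side of this calculation: weights take roughly parameters × bytes-per-parameter, where quantization sets the bytes (fp16 ≈ 2, int8 ≈ 1, int4 ≈ 0.5), plus headroom for the KV cache and runtime. The sketch below uses an assumed 1.2x margin, not the script's actual constants:

```bash
# estimate_gb <params-in-billions> <quant>: rough memory footprint in GB.
estimate_gb() {
  local params_b=$1 quant=$2 bpp
  case "$quant" in
    none) bpp=2 ;;    # fp16 weights
    int8) bpp=1 ;;
    int4) bpp=0.5 ;;
    *) echo "unknown quantization: $quant" >&2; return 1 ;;
  esac
  # 1.2x headroom for KV cache and runtime overhead (assumed margin).
  awk -v p="$params_b" -v b="$bpp" 'BEGIN { printf "%.1f\n", p * b * 1.2 }'
}
estimate_gb 8 int4   # ~4.8 GB, consistent with cc-r1:8b's ~4.9GB size
```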
The system supports a wide range of models with automatically calculated requirements:
- CC models: cc-r1:1.5b, cc-r1:8b, cc-r1:14b, cc-r1:32b, cc-r1:70b
- Third-party models: phi, mistral, gemma:2b, llama3:8b, qwen:4b
The `cloud_compute.sh` script also provides functionality to provision cloud instances based on model requirements:
./cloud_compute.sh provision <model> [performance-level]
This command:
- Finds the most cost-effective instance across providers
- Provisions the instance with the necessary configuration
- Sets up the environment with Docker and CC CLI
- Provides connection instructions
After provisioning, you can:
- Connect to your instance:
  # For GCP (example)
  gcloud compute ssh <instance-name> --zone=<zone>
- Run models on the provisioned instance:
  cc run --cloud=gcp --machine=<machine-type> <model> "Your prompt"
- Terminate the instance when done:
  # For GCP (example)
  gcloud compute instances delete <instance-name> --zone=<zone>
- Install cloud dependencies and authenticate:
  ./install_cloud_deps.sh --gcp
  ./cc-login.sh --gcp
- Find the optimal instance for your model:
  ./cloud_compute.sh find-cheapest cc-r1:70b
- Provision an instance:
  ./cloud_compute.sh provision cc-r1:70b
- Connect and use the model
- Terminate when done
CC CLI is a bash script wrapper around Ollama, making it easier to use CC models specifically. It handles:
- Installation of Ollama
- Model download and management
- Configuration preferences
- Simple command-line interface
- Hardware-specific optimizations
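Because it delegates to Ollama, the heart of the wrapper can be a thin dispatch onto `ollama` subcommands. A minimal sketch of that pattern (assumed structure; the real script adds default-model handling, verbose mode, and configuration in `~/.cc-cli/config`):

```bash
#!/usr/bin/env bash
# Thin dispatch from cc subcommands onto the underlying ollama CLI.
case "${1:-}" in
  run)  ollama run "$2" "${@:3}" ;;   # cc run <model> "prompt"
  pull) ollama pull "$2" ;;           # cc pull <model>
  list) ollama list ;;                # cc list
  *)    echo "Usage: cc {run|pull|list} ..." >&2; exit 1 ;;
esac
```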
The following Metal environment variables are configured:
export MTL_CAPTURE_ENABLED=0
export MTL_DEBUG_LAYER=0
export MTL_MAX_BUFFER_LENGTH=1073741824
export MTL_GPU_FAMILY=apple7
export MTL_GPU_VERSION=1
Memory allocation is optimized with:
export MALLOC_ARENA_MAX=2
export MALLOC_MMAP_THRESHOLD_=131072
export MALLOC_TRIM_THRESHOLD_=131072
export MALLOC_MMAP_MAX_=65536
Thread counts are optimized for your CPU:
export OMP_NUM_THREADS=$CPU_CORES
export MKL_NUM_THREADS=$CPU_CORES
export VECLIB_MAXIMUM_THREADS=$CPU_CORES
export NUMEXPR_NUM_THREADS=$CPU_CORES
export OPENBLAS_NUM_THREADS=$CPU_CORES
Additional performance settings include:
export PYTHONUNBUFFERED=1
export TF_CPP_MIN_LOG_LEVEL=2
export TF_ENABLE_ONEDNN_OPTS=1
export HDF5_USE_FILE_LOCKING=FALSE
- LaunchAgent Loading Error
  If you see `Load failed: 5: Input/output error` when running `optimize_llm.sh`, you can try:
  sudo launchctl bootstrap system ~/Library/LaunchAgents/com.llm.optimizations.plist
- Permission Issues
  If you encounter permission issues with `llm-performance-mode`, ensure the script is executable:
  chmod +x $HOME/bin/llm-performance-mode
- Command Not Found
  If the `cc-optimized` or `llm-performance-mode` commands are not found, ensure `$HOME/bin` is in your PATH:
  echo 'export PATH="$HOME/bin:$PATH"' >> ~/.zshrc
  source ~/.zshrc
You can customize the optimization settings by editing the configuration files in `~/.llm-optimizations/`:
# Edit Metal configuration
nano ~/.llm-optimizations/metal.conf
# Edit memory configuration
nano ~/.llm-optimizations/memory.conf
# Edit main environment file
nano ~/.llm-optimizations/environment
For optimal performance with specific models:
- Small Models (1.5B, 2B)
  - Ideal for quick responses and basic tasks
  - Works well with minimal optimizations
- Medium Models (8B)
  - Benefits from thread optimizations
  - Good balance of performance and capability
- Large Models (14B+)
  - Requires full set of optimizations
  - Benefits significantly from GPU acceleration
  - Use `llm-performance-mode` for best results
Contributions are welcome! Please feel free to submit a Pull Request.