Easy Edge

A simple Ollama-like tool for running Large Language Models (LLMs) locally using llama.cpp under the hood.

Features

  • 🚀 Local LLM Inference: Run models locally using llama.cpp
  • 📥 Automatic Downloads: Download models from URLs or Hugging Face
  • 💬 Interactive Chat: Chat with models in an interactive terminal
  • 📋 Model Management: List, download, and remove models
  • ⚙️ Configurable: Customize model parameters and settings

Installation

Install Easy Edge from PyPI:

pip install easy-edge

Or, to install the latest version from source:

git clone https://github.com/criminact/easy-edge.git
cd easy-edge
pip install .

Usage

After installation, use the easy-edge command from your terminal:

Download a Model

easy-edge pull --repo-id TheBloke/Llama-2-7B-Chat-GGUF --filename llama-2-7b-chat.Q4_K_M.gguf

Or download from a Hugging Face URL:

easy-edge pull --url https://huggingface.co/google/gemma-3-1b-it-qat-q4_0-gguf/resolve/main/gemma-3-1b-it-q4_0.gguf
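
Under the hood, pulling from Hugging Face is roughly equivalent to a huggingface_hub download. The sketch below is an assumption about the internals (the "models" destination directory in particular); it only illustrates what the pull command automates.

# Minimal sketch of the equivalent manual download with huggingface_hub.
# The "models" destination directory is an assumption, not Easy Edge's
# documented layout.
from huggingface_hub import hf_hub_download

path = hf_hub_download(
    repo_id="TheBloke/Llama-2-7B-Chat-GGUF",
    filename="llama-2-7b-chat.Q4_K_M.gguf",
    local_dir="models",
)
print(f"GGUF file saved to {path}")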

Run the Model

Single prompt:

easy-edge run gemma-3-1b-it-qat-q4_0-gguf --prompt "Hello, how are you?"

Interactive chat:

easy-edge run gemma-3-1b-it-qat-q4_0-gguf --interactive
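
For reference, a single-prompt run maps onto llama-cpp-python roughly as sketched below. The model path and the generation defaults (taken from the Configuration section) are assumptions; Easy Edge's internals may differ.

# Rough sketch of a single-prompt run via llama-cpp-python.
# model_path and the generation defaults are assumptions.
from llama_cpp import Llama

llm = Llama(model_path="models/gemma-3-1b-it-q4_0.gguf", n_ctx=2048)
response = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Hello, how are you?"}],
    max_tokens=2048,
    temperature=0.7,
    top_p=0.9,
)
print(response["choices"][0]["message"]["content"])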

List Installed Models

easy-edge list

Remove a Model

easy-edge remove gemma-3-1b-it-qat-q4_0-gguf

Configuration

The tool stores configuration in models/config.json. You can modify settings like:

  • max_tokens: Maximum tokens to generate (default: 2048)
  • temperature: Sampling temperature (default: 0.7)
  • top_p: Top-p sampling parameter (default: 0.9)
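
If you prefer to change these programmatically, a short sketch follows. Only the three keys above are documented; the flat JSON layout is an assumption.

# Sketch: adjust the documented settings in models/config.json.
# The flat key layout is an assumption; only these three keys are documented.
import json
from pathlib import Path

config_path = Path("models/config.json")
config = json.loads(config_path.read_text())

config["max_tokens"] = 1024   # shorter completions
config["temperature"] = 0.2   # more deterministic sampling
config["top_p"] = 0.9

config_path.write_text(json.dumps(config, indent=2))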

Requirements

  • Python 3.11+
  • 8GB+ RAM (for 7B models)
  • 16GB+ RAM (for 13B models)
  • 4GB+ free disk space per model

Troubleshooting

Common Issues

  1. "llama-cpp-python not installed"

    pip install llama-cpp-python
  2. Out of memory errors

    • Try smaller models (7B instead of 13B)
    • Use more aggressively quantized models (e.g. Q4_K_M instead of Q8_0)
    • Close other applications to free up RAM
  3. Slow inference

    • Inference runs on all CPU cores by default
    • For a larger speedup, enable GPU acceleration (requires CUDA); see below

GPU Acceleration (Optional)

For faster inference with NVIDIA GPUs:

pip uninstall llama-cpp-python
pip install llama-cpp-python --force-reinstall --index-url=https://jllllll.github.io/llama-cpp-python-cuBLAS-wheels/AVX2/cu118
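
Once the CUDA build is installed, llama-cpp-python offloads transformer layers to the GPU via its n_gpu_layers option. Whether Easy Edge exposes this setting is not documented; the sketch below only shows the underlying knob.

# Sketch of GPU offloading with the CUDA build of llama-cpp-python.
# The model path is an assumption; -1 offloads every layer that fits.
from llama_cpp import Llama

llm = Llama(
    model_path="models/llama-2-7b-chat.Q4_K_M.gguf",
    n_gpu_layers=-1,
)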

Finetuning Your Own Model

Easy Edge supports finetuning LLMs using an Ollama-style Modelfile and the Hugging Face Trainer. This lets you create custom models from your own data and run them locally.

1. Create a Modelfile

A Modelfile describes the base model, training parameters, and example messages for finetuning. Example:

HF_TOKEN <your_huggingface_token>
FROM meta-llama/Llama-3.2-1B-Instruct

PARAMETER device cpu
PARAMETER max_length 64
PARAMETER learning_rate 3e-5
PARAMETER epochs 4
PARAMETER batch_size 1
PARAMETER lora true
PARAMETER lora_r 8
PARAMETER lora_alpha 32
PARAMETER lora_dropout 0.05
PARAMETER lora_target_modules q_proj,v_proj

SYSTEM You are a helpful assistant.
MESSAGE user How can I reset my password?
MESSAGE assistant To reset your password, click on 'Forgot Password' at the login screen and follow the instructions.

  • HF_TOKEN is your Hugging Face access token (required for gated or private models).
  • FROM specifies the base model to finetune.
  • PARAMETER lines set training options (see above for examples).
  • SYSTEM and MESSAGE blocks provide training data.
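
To make the format concrete, the sketch below shows one hypothetical way these directives could be parsed into training parameters and chat messages. It is illustrative only; Easy Edge's actual parser may behave differently.

# Hypothetical Modelfile parser; Easy Edge's real parser may differ.
def parse_modelfile(path):
    params, messages = {}, []
    for raw in open(path, encoding="utf-8"):
        line = raw.strip()
        if not line:
            continue
        keyword, _, rest = line.partition(" ")
        if keyword == "FROM":
            params["base_model"] = rest
        elif keyword == "HF_TOKEN":
            params["hf_token"] = rest
        elif keyword == "PARAMETER":
            name, _, value = rest.partition(" ")
            params[name] = value
        elif keyword == "SYSTEM":
            messages.append({"role": "system", "content": rest})
        elif keyword == "MESSAGE":
            role, _, content = rest.partition(" ")
            messages.append({"role": role, "content": content})
    return params, messages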

2. Run Finetuning

Use the finetune command to start training:

easy-edge finetune --modelfile Modelfile --output my-finetuned-model --epochs 4 --batch-size 1 --learning-rate 3e-5

  • --modelfile is the path to your Modelfile.
  • --output is where the trained model will be saved.
  • You can override epochs, batch size, and learning rate on the command line.
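
Per the Notes section, training uses the Hugging Face Trainer with optional LoRA/PEFT. The sketch below shows what that step can look like with the values from the example Modelfile; it is an assumption about the workflow, not Easy Edge's actual training code, and the dataset handling is heavily simplified.

# Hedged sketch of LoRA finetuning with Hugging Face Trainer + PEFT,
# using the values from the example Modelfile. Not Easy Edge's actual code.
from datasets import Dataset
from peft import LoraConfig, get_peft_model
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer,
                          TrainingArguments)

# The HF_TOKEN from the Modelfile must be available (the base model is gated).
base = "meta-llama/Llama-3.2-1B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(base)
model = AutoModelForCausalLM.from_pretrained(base)

# Wrap the base model with LoRA adapters (PARAMETER lora true).
model = get_peft_model(model, LoraConfig(
    r=8, lora_alpha=32, lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"], task_type="CAUSAL_LM",
))

# Build a one-example dataset from the Modelfile's SYSTEM/MESSAGE blocks.
text = tokenizer.apply_chat_template([
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "How can I reset my password?"},
    {"role": "assistant", "content": "To reset your password, click on "
     "'Forgot Password' at the login screen and follow the instructions."},
], tokenize=False)
dataset = Dataset.from_dict({"text": [text]}).map(
    lambda ex: tokenizer(ex["text"], truncation=True, max_length=64),
    remove_columns=["text"],
)

trainer = Trainer(
    model=model,
    args=TrainingArguments(
        output_dir="my-finetuned-model",
        num_train_epochs=4,
        per_device_train_batch_size=1,
        learning_rate=3e-5,
    ),
    train_dataset=dataset,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
trainer.save_model("my-finetuned-model")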

3. Convert to GGUF (for llama.cpp)

After training, you will see instructions to convert your model to GGUF format for use with llama.cpp:

python3 convert_hf_to_gguf.py --in my-finetuned-model --out my-finetuned-model.gguf

Upload your GGUF file to Hugging Face or use it locally with Easy Edge.

Notes

  • Finetuning is resource-intensive. For best results, use a machine with a GPU.
  • LoRA/PEFT is supported for efficient finetuning.
  • See the example Modelfile in the repository for more options.

Contributing

  1. Fork the repository
  2. Create a feature branch
  3. Make your changes
  4. Add tests if applicable
  5. Submit a pull request

License

MIT License - see LICENSE file for details.
