TaguchiBench is a suite of tools designed for robust parameter optimization using the Taguchi method. It currently consists of two main components:
- TaguchiBench Engine: A generic, target-agnostic C# framework for designing and executing Taguchi experiments, analyzing results, and predicting optimal parameter configurations for any command-line executable or script.
- TaguchiBench LiveBench Runner: A specific C# command-line utility that acts as a target executable for the Engine. It facilitates running coding benchmarks using the LiveBench framework against OpenAI-compatible APIs, including locally managed `llama-server` instances.
This suite enables systematic, efficient parameter optimization that accounts for multiple performance metrics and for interactions between parameters.
View an example markdown report here
The TaguchiBench Engine is designed to be generic and target-agnostic. It makes no assumptions about the program being optimized, other than that it accepts parameters via command-line arguments or environment variables and outputs results in a specific JSON format. This makes it a versatile tool for a wide range of optimization tasks.
The LiveBench Runner serves as a prime example of such a target executable, tailored for LLM performance evaluation in coding tasks.
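To give a feel for the input/output contract, here is a purely illustrative sketch of the kind of JSON a target might emit. The field names and shape below are assumptions for illustration only; the authoritative schema is documented in the Engine README.

```json
{
  "metrics": {
    "accuracy": 0.873,
    "latencySeconds": 42.5
  }
}
```

The Engine collects one such result per run and feeds the numeric values into its S/N-ratio and ANOVA analysis.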
- .NET SDK (.NET 8.0).
- Python 3.6+: Required if you intend to use `TaguchiBench.LiveBenchRunner` or run the `install-livebench.sh` script.
- Git: For cloning LiveBench via the install script.
If your primary goal is to optimize parameters for `llama-server` using the LiveBench coding benchmarks (the original use case of this project), follow these steps:
1. Clone this Repository:

   ```bash
   git clone https://github.com/kooshi/TaguchiBench.git
   cd TaguchiBench
   ```
2. Set up LiveBench: Run the provided script to download and prepare the LiveBench framework. This will clone the LiveBench repository into a `livebench` subdirectory.

   ```bash
   chmod +x install-livebench.sh
   ./install-livebench.sh
   ```
3. Build the TaguchiBench Suite: This command will build all projects (`Common`, `Engine`, `LiveBenchRunner`).

   ```bash
   dotnet build src/TaguchiBench.sln -c Release
   ```
4. Configure a Simple Experiment:
   - A `simple-livebench-config.yaml` file is provided in the repository root. It is pre-configured to use `TaguchiBench.LiveBenchRunner` as the target for optimizing common `llama-server` sampler parameters.
   - You MUST edit `simple-livebench-config.yaml` to set the correct paths for:
     - The first argument under `fixedCommandLineArguments`: If `targetExecutablePath` is `"dotnet"`, this argument must be the path to your compiled `TaguchiBench.LiveBenchRunner.dll` (e.g., `src/TaguchiBench.LiveBenchRunner/bin/Release/netX.X/TaguchiBench.LiveBenchRunner.dll`). Adjust `netX.X` (e.g., `net6.0`, `net8.0`) to your .NET version. If `targetExecutablePath` points directly to an `.exe`, this DLL argument is not needed.
     - `--livebench-scripts-path` (under `fixedCommandLineArguments`): Point this to the `livebench` directory created by `install-livebench.sh` (e.g., `./livebench/livebench`).
     - `--llama-server-exe` (under `fixedCommandLineArguments`): Path to your compiled `llama-server` executable.
     - `--llama-model` (under `fixedCommandLineArguments`): Path to your GGUF model file.
   - Review other settings in `simple-livebench-config.yaml`, such as `--lb-num-questions`, if you want a faster initial test.
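As a purely illustrative sketch of the paths described above (the exact structure of `simple-livebench-config.yaml` may differ; the file in the repository root is authoritative, and all paths below are placeholders), the edited entries might look like:

```yaml
# Illustrative only - consult simple-livebench-config.yaml for the real layout.
targetExecutablePath: "dotnet"
fixedCommandLineArguments:
  # First argument: the compiled runner DLL, required when the target is "dotnet"
  - "src/TaguchiBench.LiveBenchRunner/bin/Release/net8.0/TaguchiBench.LiveBenchRunner.dll"
  - "--livebench-scripts-path"
  - "./livebench/livebench"
  - "--llama-server-exe"
  - "/path/to/llama-server"
  - "--llama-model"
  - "/path/to/model.gguf"
```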
5. Run Your First Experiment: Once `simple-livebench-config.yaml` is updated, run the Engine from the repository root:

   ```bash
   chmod +x run-livebench-experiment.sh
   ./run-livebench-experiment.sh --config simple-livebench-config.yaml
   ```

   This script will use `dotnet run` (or execute a compiled version) to start the Engine with your specified configuration. Results will appear in the `outputDirectory` defined in the config file (e.g., `./livebench_optimization_results/`).
The TaguchiBench Engine is the heart of the suite. It handles:
- Experimental design using Taguchi Orthogonal Arrays.
- Execution of a user-defined target program with varying parameters.
- Collection of multiple numerical metrics from the target.
- Advanced statistical analysis: S/N ratios, ANOVA (with pooling), main effects, interactions.
- Prediction of optimal parameter settings and performance with confidence intervals for each analyzed metric.
- Comprehensive HTML and Markdown reporting.
- Experiment state persistence and recovery for long-running tasks.
➡️ Go to TaguchiBench Engine README for detailed usage and configuration.
The TaguchiBench LiveBench Runner is a specialized utility that:
- Runs LiveBench coding benchmarks.
- Can target any OpenAI-compatible API endpoint.
- Optionally manages a local `llama-server` instance for evaluations.
- Outputs results in the JSON format expected by the TaguchiBench Engine.
- Can also be used as a standalone tool for single LiveBench runs.
➡️ Go to TaguchiBench LiveBench Runner README for detailed usage and CLI options.
1. Set up your Environment:
   - Install the .NET SDK.
2. Prepare your Target Executable:
   - Ensure it adheres to the Engine's input/output contract.
3. Configure the TaguchiBench Engine:
   - Create a `config.yaml` file for the Engine (see `src/TaguchiBench.Engine/sample-config.yaml`).
   - Specify your `targetExecutablePath`.
   - Define the `metricsToAnalyze`, `controlFactors`, `fixedCommandLineArguments`, etc., relevant to your target.
4. Run the Engine:

   ```bash
   # Example using dotnet run for the Engine
   dotnet run --project TaguchiBench.Engine -- --config your_experiment_config.yaml
   ```

5. Review Results:
   - Check the `outputDirectory` for detailed HTML and Markdown reports, and the experiment state YAML file.
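The key names `targetExecutablePath`, `outputDirectory`, `metricsToAnalyze`, `controlFactors`, and `fixedCommandLineArguments` come from the steps above; everything else in this sketch (nesting, sub-fields, values) is an assumption for illustration, and `src/TaguchiBench.Engine/sample-config.yaml` remains the authoritative reference:

```yaml
# Illustrative sketch only - see src/TaguchiBench.Engine/sample-config.yaml
# for the real schema. Sub-fields and values here are hypothetical.
targetExecutablePath: "./my-benchmark-target"
outputDirectory: "./optimization_results"
metricsToAnalyze:
  - "throughput"            # hypothetical metric name
controlFactors:
  - name: "batch-size"      # hypothetical factor with three levels
    levels: [8, 16, 32]
fixedCommandLineArguments:
  - "--dataset"
  - "./data/eval.json"
```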
- More sophisticated state management for recovery (e.g., handling Ctrl-C interruption).
- Advanced ANOVA pooling strategies (e.g., based on F-distribution rather than just percentage threshold if error DF is low).
- GUI for configuration and result visualization.
This project is licensed under the MIT License - see the `LICENSE` file for details.
- NightHawkInLight (YouTube) for the inspiration.
- llama.cpp for the LLM inference engine.
- LiveBench for the coding benchmark framework.
- YamlDotNet for YAML serialization in C#.
- Serilog for flexible logging.
- Chart.js for chart rendering in HTML reports.
- MathNet.Numerics for statistical distributions.