A tool for evaluating LLM responses against a knowledge base of expert solutions.
KB-AnswerScorer is a utility designed to evaluate how well large language models (LLMs) answer customer support questions compared to established expert solutions. The tool processes questions, obtains model responses via OpenWebUI, compares these responses to reference solutions using semantic and lexical similarity metrics, and generates comprehensive reports.
The comparison is based on three metrics, which are combined into an overall quality score (a scoring sketch follows the list):
- BERTScore (semantic similarity)
- F1 Score (lexical similarity via token overlap)
- BLEU Score (n-gram overlap, originally a machine-translation metric)
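As a rough illustration of how the three scores could be computed and combined, here is a minimal sketch assuming the `bert-score` and `nltk` packages (the equal weighting at the end is illustrative, not the tool's actual formula):

```python
# Minimal scoring sketch; assumes `pip install bert-score nltk`.
# The equal weighting in "combined" is illustrative only.
from bert_score import score as bert_score
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction


def lexical_f1(candidate: str, reference: str) -> float:
    """Token-overlap F1 between a model response and a reference solution."""
    cand, ref = set(candidate.lower().split()), set(reference.lower().split())
    overlap = len(cand & ref)
    if overlap == 0:
        return 0.0
    precision, recall = overlap / len(cand), overlap / len(ref)
    return 2 * precision * recall / (precision + recall)


def evaluate(candidate: str, reference: str) -> dict:
    _, _, f1_tensor = bert_score([candidate], [reference], lang="en")
    scores = {
        "bert": f1_tensor.item(),
        "f1": lexical_f1(candidate, reference),
        "bleu": sentence_bleu(
            [reference.split()], candidate.split(),
            smoothing_function=SmoothingFunction().method1,
        ),
    }
    scores["combined"] = sum(scores.values()) / 3
    return scores
```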
KB-AnswerScorer/
├── main.py # Main script
├── Dockerfile # Dockerfile
├── docker-compose.yml # Docker compose
├── .env # Environment variables
├── data/ # Input data directory
│ ├── samples/ # Sample input files
│ ├── questions.xlsx # Customer questions
│ └── solutions.xlsx # Expert solutions
├── metrics/ # Scoring modules
│ └── metric_evaluator.py # Combined scoring metrics and matcher
├── opwebui/ # OpenWebUI integration
│ └── api_client.py # Client for OpenWebUI API
└── utils/ # Utility modules
├── data_extractor.py # Parses Excel input files
├── query_enhancer.py # Enhances prompts for pre- and post-processing
└── evaluation_utils.py # Quality assessment and reporting
Before building and running the container for the first time, make sure you've already configured the `.env` file and the input Excel files (see the Configuration section for details):
docker-compose build
To run the container:
docker-compose run --rm kb-scorer
If you make changes to the code, simply run the `update-docker.ps1` PowerShell script to update your container:
.\update-docker.ps1
Clone this repository:
git clone https://github.com/farithadnan/KB-AnswerScorer.git
cd KB-AnswerScorer
Create virtual environment:
python -m venv venv
Activate the virtual environment:
# cmd
path\to\venv\Scripts\activate
# powershell
.\path\to\venv\Scripts\Activate.ps1
# bash
source path/to/venv/bin/activate
Create a `.env` file in the project root with the following variables:
# Directory and file locations
DATA_DIR_PATH=./data
QUESTION_EXCEL=questions.xlsx
SOLUTION_EXCEL=solutions.xlsx
QUESTION_SHEET_NAME=Questions
# OpenWebUI Configuration
OPENWEBUI_API_URL=http://your-openwebui-instance:5000/api/chat
OPENWEBUI_JWT_TOKEN=your_jwt_token_here
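These variables are presumably loaded at startup; a minimal sketch using `python-dotenv` (an assumption about the implementation, shown for orientation):

```python
import os
from dotenv import load_dotenv  # pip install python-dotenv

load_dotenv()  # picks up .env from the project root

DATA_DIR = os.getenv("DATA_DIR_PATH", "./data")
QUESTIONS = os.path.join(DATA_DIR, os.getenv("QUESTION_EXCEL", "questions.xlsx"))
SOLUTIONS = os.path.join(DATA_DIR, os.getenv("SOLUTION_EXCEL", "solutions.xlsx"))
API_URL = os.environ["OPENWEBUI_API_URL"]      # required
JWT_TOKEN = os.environ["OPENWEBUI_JWT_TOKEN"]  # required
```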
To obtain your JWT token (see the official OpenWebUI guide for more details):
- Login to your OpenWebUI instance
- Open the browser developer tools (F12)
- Go to the Application tab
- Look for "Local Storage" and find the entry for your OpenWebUI domain
- Copy the JWT token value (it starts with "ey...")
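With the token in hand, a request to the endpoint might look like the sketch below (using `requests`; the payload and response shapes are assumptions based on OpenWebUI's OpenAI-style chat API and may not match what `api_client.py` actually sends):

```python
import os
import requests


def ask_model(question: str, model: str = "your-model-name") -> str:
    """Send one question to OpenWebUI and return the model's reply text."""
    resp = requests.post(
        os.environ["OPENWEBUI_API_URL"],
        headers={"Authorization": f"Bearer {os.environ['OPENWEBUI_JWT_TOKEN']}"},
        json={
            "model": model,  # placeholder model name
            "messages": [{"role": "user", "content": question}],
        },
        timeout=60,
    )
    resp.raise_for_status()
    # Response shape assumed OpenAI-compatible:
    return resp.json()["choices"][0]["message"]["content"]
```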
The tool expects two Excel files:
- Question file: Contains customer queries/issues.
  - Required structure:
    - Excel file with customer questions (header starts at row 3)
    - Main question text in column B
    - Solutions used in column C (optional)
    - AI solutions used in column D (optional)
    - Each row represents a unique customer issue
  - Parsed into:
    - `id`: Automatically assigned based on row number (starting from 1)
    - `issue`: The customer question text from column B
    - `solutions_used`: List of solution indices that can be manually set later
    - `ai_solutions_used`: List of AI solution indices from column D
- Solutions file: Contains expert solutions.
  - Required structure:
    - Excel file with solutions (header in row 1)
    - Column A: Solution text with title and steps
    - Column B: Optional error messages
  - Parsed into:
    - `id`: Automatically assigned based on row number (starting from 1)
    - `title`: Extracted from the first line of the solution text (e.g., "Solutions 1")
    - `steps`: Each line after the title becomes a step in the solution
    - `error_message`: Any text in column B becomes the error message
You can check `data/samples` for sample versions of these two files.
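To make the layouts above concrete, here is a minimal parsing sketch using `openpyxl` (illustrative only; the actual `DataExtractor` in `utils/data_extractor.py` may handle more cases):

```python
from openpyxl import load_workbook  # pip install openpyxl


def load_questions(path: str, sheet: str = "Questions") -> list[dict]:
    ws = load_workbook(path, read_only=True)[sheet]
    questions = []
    # Header is in row 3, so data starts at row 4; columns A-D.
    for i, (_, issue, sols, ai_sols) in enumerate(
        ws.iter_rows(min_row=4, max_col=4, values_only=True), start=1
    ):
        if not issue:
            continue
        questions.append({
            "id": i,
            "issue": str(issue),
            "solutions_used": str(sols).split(",") if sols else [],
            "ai_solutions_used": str(ai_sols).split(",") if ai_sols else [],
        })
    return questions


def load_solutions(path: str) -> list[dict]:
    ws = load_workbook(path, read_only=True).active
    solutions = []
    # Header is in row 1, so data starts at row 2; columns A-B.
    for i, (text, error) in enumerate(
        ws.iter_rows(min_row=2, max_col=2, values_only=True), start=1
    ):
        if not text:
            continue
        lines = str(text).splitlines()
        solutions.append({
            "id": i,
            "title": lines[0],   # e.g. "Solutions 1"
            "steps": lines[1:],  # remaining lines become steps
            "error_message": str(error) if error else "",
        })
    return solutions
```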
Run the script with various command-line options to control its behavior:
python main.py [options]
Command line options:
| Option | Description | Default |
|---|---|---|
| `--limit N`, `-l` | Process only the first N questions | 0 (all questions) |
| `--question-id ID`, `--id` | Process only a specific question by ID | None |
| `--verbose`, `-v` | Display detailed logs and results | False |
| `--report-dir DIR`, `--rd` | Directory to save reports | "reports" |
| `--wait-time SEC`, `--wt` | Wait time between API calls in seconds | 1.0 |
| `--skip-report`, `--sr` | Skip report generation | False |
| `--export-excel`, `--ee` | Export evaluation report to Excel | False |
| `--pre-process`, `--pre` | Enhance queries before sending to the model | False |
| `--post-process`, `--post` | Generate improved prompts for low-quality responses | False |
| `--retry`, `-r` | Retry with improved prompts when quality is low | False |
| `--bert-threshold T`, `--bt` | BERTScore threshold for quality assessment | 0.5 |
| `--f1-threshold T`, `--f1` | F1 score threshold for quality assessment | 0.3 |
| `--bleu-threshold T`, `--bl` | BLEU score threshold for quality assessment | 0.1 |
| `--combined-threshold T`, `--ct` | Combined score threshold for quality assessment | 0.4 |
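For orientation, the interface above could be declared with `argparse` roughly as follows (an illustration of the option table, not necessarily how `main.py` actually defines it):

```python
import argparse

parser = argparse.ArgumentParser(
    description="Evaluate LLM responses against expert solutions."
)
parser.add_argument("--limit", "-l", type=int, default=0,
                    help="Process only the first N questions (0 = all)")
parser.add_argument("--question-id", "--id", type=int,
                    help="Process only a specific question by ID")
parser.add_argument("--verbose", "-v", action="store_true")
parser.add_argument("--report-dir", "--rd", default="reports")
parser.add_argument("--wait-time", "--wt", type=float, default=1.0)
parser.add_argument("--skip-report", "--sr", action="store_true")
parser.add_argument("--export-excel", "--ee", action="store_true")
parser.add_argument("--pre-process", "--pre", action="store_true")
parser.add_argument("--post-process", "--post", action="store_true")
parser.add_argument("--retry", "-r", action="store_true")
parser.add_argument("--bert-threshold", "--bt", type=float, default=0.5)
parser.add_argument("--f1-threshold", "--f1", type=float, default=0.3)
parser.add_argument("--bleu-threshold", "--bl", type=float, default=0.1)
parser.add_argument("--combined-threshold", "--ct", type=float, default=0.4)
args = parser.parse_args()
```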
Process all questions:
python main.py
Process only the first 5 questions:
python main.py --limit 5
Process only specific question by ID:
python main.py --question-id 2
Generate detailed output for each question:
python main.py --verbose
Save reports to a custom directory:
python main.py --report-dir my_reports
Adjust wait time between API calls:
python main.py --wait-time 2.0
For Docker, the commands are the same as the examples above; just prefix each with `docker-compose run --rm kb-scorer`:
docker-compose run --rm kb-scorer python main.py --limit 1
If you can't connect to the OpenWebUI API:
- Verify your JWT token is valid and not expired
- Check that the OpenWebUI API URL is correct
- Ensure OpenWebUI is running and accessible

If input files can't be found:
- Verify that the data directory and Excel files exist
- Check the paths in your .env file

If quality scores are consistently low:
- The model may not be suitable for your domain
- Consider adjusting the quality thresholds
- Review your reference solutions for clarity
If you update your `.env` file and changes aren't detected:
- Make sure to use `#` for comments
- Restart your terminal/command prompt
If you're getting parsing errors:
- Check that the header row is set correctly (defaults to row 3)
- Verify column mappings in the DataExtractor configuration
- For solutions, ensure they follow the "Solution X" format with numbered steps
This project is licensed under the MIT License - see the LICENSE file for details.