regllm

regllm is a library for running offline regression tests on Large Language Model (LLM) responses using Ollama and Zod.

Installing regllm

npm i regllm
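
Note that regllm evaluates responses against a locally running Ollama instance, so make sure Ollama is installed and the judge model has been pulled, e.g. for the model used in the examples below:

ollama pull llama3.2:latest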

Overview

The core of this framework is the LlmEvaluator class (found in src/llm.eval.ts). It takes an input prompt, the actual output from your LLM engine/framework/tool, and a reference output, then uses ollama to judge the actual output against the reference output, with zod enforcing a structured verdict from the judge model.
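
For illustration, the structured-verdict mechanism works roughly like the sketch below: a zod schema describes the shape the judge must return, and ollama's structured outputs constrain the reply to it. The schema name and fields here are assumptions for this example; the actual schema in src/llm.eval.ts may differ.

import ollama from "ollama";
import { z } from "zod";
import { zodToJsonSchema } from "zod-to-json-schema";

// Hypothetical verdict shape: a boolean pass/fail plus a short justification.
// regllm's real schema may use different names/fields.
const EvalVerdict = z.object({
  passed: z.boolean(),
  reasoning: z.string()
});

async function judge(prompt: string) {
  // Passing a JSON schema as `format` constrains the model's reply to that shape.
  const response = await ollama.chat({
    model: "llama3.2:latest",
    messages: [{ role: "user", content: prompt }],
    format: zodToJsonSchema(EvalVerdict)
  });
  // Parse and validate the structured reply with the same zod schema.
  return EvalVerdict.parse(JSON.parse(response.message.content));
}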

Getting Started (Examples)

regllm is a single-file library (src/llm.eval.ts). Below are a few example usages, with additional examples in the included test suite (tests/llm.eval.test.ts).

import { LlmEvaluator, LlmEvalInput } from "regllm";

async function runEvaluation() {
  const evaluator = new LlmEvaluator('llama3.2:latest');

  const evalInput: LlmEvalInput = {
    input: "What is the capital of France?",
    actual_output: "Paris is the capital of France.",
    reference_output: "The capital of France is Paris."
  };

  const result = await evaluator.eval(evalInput);
  console.log(`llm response passed: ${result.passed}`);
}

runEvaluation();

A more advanced use case is validating the responses generated by your own LLM usage within your own applications, for example from a test suite:

import { LlmEvaluator, LlmEvalInput } from "regllm";

describe('LlmEngine', () => {
  let evaluator: LlmEvaluator;
  let yourFancyLlmEngine: FancyLlmEngine;
  beforeAll(() => {
    evaluator = new LlmEvaluator('llama3.2:latest');
    yourFancyLlmEngine = new FancyLlmEngine(); // construct your own engine here
  });

  it('uses resume and job description artifacts to find the best candidate', async () => {
    // note the backticks: the env vars are interpolated into the prompt
    const input = `Using the following set of candidate resume artifacts ${process.env.RESUME_ARTIFACTS_CONTENT}, help me find the single best candidate for the following job description: ${process.env.JOB_DESCRIPTION_CONTENT}`;

    // call your llm engine with the input
    const actual_output = await yourFancyLlmEngine.call(input);

    // construct the llm evaluation criteria (original input, actual output provided by your llm engine, and the expected response)
    const evalInput: LlmEvalInput = {
      input,
      actual_output,
      reference_output: "The best candidate for the provided job description is Abraham Lincoln"
    };

    // evaluate the response from your llm engine, making sure the actual response is in line with the expected response
    const result = await evaluator.eval(evalInput);
    expect(result.passed).toBeTruthy();
  });
});

Note on context size and ollama

By default, ollama uses a context length of 2048 tokens. Depending on the type of evaluations being done, this may not be enough. The LlmEvaluator class exposes the Ollama.Options interface as a constructor property, so you can override any of ollama's defaults, such as the context length:

const evaluator = new LlmEvaluator({
  model: 'llama3.2:latest',
  options: {
    num_ctx: 4096 // override options here, like num_ctx
  }
});
