This example shows how you can easily create (integration) test cases for your Burr project.
Burr makes it easy to foster a test-driven development mindset when developing actions, which are the building blocks of your application.
Note: writing test cases for GenAI projects can be tricky. The same LLM API call can produce different outputs, so 'exact' equality tests may not work and you'll need to resort to fuzzier tests: checking for the presence of certain words or phrases, using an LLM to grade the output, etc. We aren't opinionated on how you do this, but in any case you'll need to write a test case to exercise things, and that's what we show you how to do here.
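For instance, a 'fuzzy' test might assert that the output mentions a few expected phrases rather than matching a fixed string. Below is a minimal sketch of that idea with pytest; `generate_reply` and the expected phrases are hypothetical stand-ins for your own action logic and data:

```python
# A minimal sketch of a "fuzzy" keyword-presence test; `generate_reply` is a
# hypothetical stand-in for the logic your action calls.
from application import generate_reply  # hypothetical import


def test_reply_mentions_expected_topics():
    reply = generate_reply("How do I reset my password?")
    # Exact-match assertions are brittle with LLM output; instead, check that
    # the response mentions the phrases we care about, case-insensitively.
    for phrase in ["reset", "password"]:
        assert phrase in reply.lower(), f"expected {phrase!r} in: {reply!r}"
```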
With LLMs, non-deterministic outputs can change your application's behavior. It is important to create test cases to ensure that your application is robust and behaves as expected; it is easy to tweak a prompt and break a particular use case. Part of the struggle is creating test cases that are representative of your application's real-world behavior, so this example shows you how to create them from real traces -- Burr can make this process easier. The high-level process:
1. Build your project with Burr.
2. Run your Burr application with tracking/persistence.
3. Find a trace that you want to create a test case for.
4. Note the partition_key, app_id, and sequence_id of the trace. You can find these in the UI.
5. Create a test case using the `burr-test-case create` command, passing the project name, partition_key, app_id, and sequence_id as arguments.
6. Choose what you're going to test and how, e.g. exact match, similarity, LLM grader, etc.
7. Iterate on the test case until it is robust and representative of the real-world behavior of your application.
8. Profit.
For step (5) above, this is the corresponding command:
```sh
burr-test-case create \
  --project-name "SOME_NAME" \
  --partition-key "SOME_KEY" \
  --app-id "SOME_ID" \
  --sequence-id 0 \
  --target-file-name /tmp/test-case.json
```
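Once the file exists, you can load it in a test, for example as a pytest fixture, and feed the recorded state to the action under test. This is only a sketch: the layout of the generated JSON is an assumption here (inspect the file to see what `burr-test-case create` actually wrote), and `my_action` and the assertions are hypothetical placeholders.

```python
# Sketch of consuming the generated test-case file. The JSON layout, the
# `my_action` import, and the (result, state) return shape are assumptions --
# adjust them to your application and to the file you generated.
import json

import pytest
from burr.core import State

from application import my_action  # hypothetical action under test


@pytest.fixture
def recorded_state() -> State:
    with open("/tmp/test-case.json") as f:
        payload = json.load(f)
    # Assumes the file holds a flat dict of state values; adjust if the
    # generated file nests the state under another key.
    return State(payload)


def test_my_action_against_recorded_state(recorded_state):
    result, new_state = my_action(recorded_state)
    # Replace with whatever check fits your use case: exact match, keyword
    # presence, similarity, an LLM grader, etc.
    assert new_state.get("response") is not None
```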
See the notebook for example usage.
In `test_application.py` you'll find example tests for a simple action defined in `application.py`.
The notebook also shows how things work.
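If exact or keyword checks aren't enough, the 'LLM grader' approach mentioned above asks a model whether the output meets some criterion and asserts on its verdict. A rough sketch with the OpenAI client follows; the model name, grading prompt, and the reply being graded are all placeholders:

```python
# A rough sketch of an LLM-grader style assertion; the model, prompt, and the
# reply being graded are placeholders -- adapt them to your application.
from openai import OpenAI

client = OpenAI()


def grade(reply: str, criterion: str) -> bool:
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {
                "role": "user",
                "content": f"Answer YES or NO. Does the following reply satisfy "
                f"this criterion: {criterion}\n\nReply:\n{reply}",
            }
        ],
    )
    return "YES" in response.choices[0].message.content.upper()


def test_reply_is_polite():
    reply = "Sure, happy to help! You can reset your password from the settings page."
    assert grade(reply, "the reply is polite and offers concrete next steps")
```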
We see many more improvements to make here:
- Annotating data in the UI to make it easier to pull out.
- Automatically suggesting test cases for you to add.
- Data export / integration with evaluation tools.
- etc. Please let us know what you need!