Replies: 1 comment
The POC is PR'd here: #1665
Describe the feature or potential improvement
What I appreciate about LangFuse is its core abstraction: the Trace. It's versatile enough to be applied to a broad range of use-cases, while also allowing a rich set of observation and evaluation tools to be built for various model implementations. In my past experience, feedback has normally been dispersed across multiple spreadsheets, databases, and Jupyter notebooks. At Rocket Money, we aim to use Traces for many different use-cases, including:
These are not theoretical use-cases; they are real, in-production use-cases. Thanks to the adaptability of a trace, we can visualize and collect manual feedback for all these use-cases using one tool, despite their differing implementations.
Task
Although a trace can effectively observe anything, Langfuse's tools are limited without an understanding of the original task. Essentially, a Task is constrained to a well-defined input and output. If we attach a Task to Datasets and Traces, we could transition from editing raw JSON in textboxes to using validated form fields.
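To make this concrete, here is a rough sketch of what a Task record could look like; the `TaskDefinition` type and its field names are hypothetical, not part of any existing Langfuse API:

```typescript
// Hypothetical shape of a Task record; none of these names exist in Langfuse today.
// Input and output are described as JSON Schemas so the UI can render validated
// form fields instead of raw JSON textboxes.
interface TaskDefinition {
  name: string;                          // unique task identifier
  inputSchema: Record<string, unknown>;  // JSON Schema for the task input
  outputSchema: Record<string, unknown>; // JSON Schema for the task output
}

// Illustrative example: a named-entity-recognition task over transaction descriptions.
const transactionNerTask: TaskDefinition = {
  name: "transaction-ner",
  inputSchema: {
    type: "object",
    properties: { description: { type: "string" } },
    required: ["description"],
  },
  outputSchema: {
    type: "object",
    properties: {
      merchant: { type: "string" },
      category: { type: "string" },
    },
    required: ["merchant"],
  },
};
```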
Given the existing tooling and OpenAI's use of JSON schemas, defining task inputs and outputs as JSON schemas offers great flexibility (see the playground example of using a JSON schema to drive a form). In practice, these tasks are likely to be defined in the main software application, where the contract is already established. If zod is employed in the main application, we can seamlessly relay that contract and register the task within LangFuse through a new API request. I envision this as an idempotent process that runs during continuous integration when the main branch is merged; that way, all of these tasks populate LangFuse exactly as they are defined in the main application.
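As a sketch of what that CI step could look like, assuming the contract lives in zod and is converted with the zod-to-json-schema package: the endpoint, payload, and helper below are the proposed API, invented here for illustration, not something Langfuse exposes today.

```typescript
import { z } from "zod";
import { zodToJsonSchema } from "zod-to-json-schema";

// The zod contract already lives in the main application.
const NerInput = z.object({ description: z.string() });
const NerOutput = z.object({
  merchant: z.string(),
  category: z.string().optional(),
});

// Hypothetical idempotent registration step, run in CI when main is merged.
async function registerTask(baseUrl: string, authHeader: string): Promise<void> {
  const response = await fetch(`${baseUrl}/api/public/tasks`, {
    method: "PUT", // PUT so re-running the pipeline upserts instead of duplicating
    headers: { "Content-Type": "application/json", Authorization: authHeader },
    body: JSON.stringify({
      name: "transaction-ner",
      inputSchema: zodToJsonSchema(NerInput),
      outputSchema: zodToJsonSchema(NerOutput),
    }),
  });
  if (!response.ok) {
    throw new Error(`Task registration failed: ${response.status}`);
  }
}
```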
Bot - more flexible than Prompts
There are many tools and services available for iterating and evaluating prompts on foundational Large Language Models (LLMs). We've used HumanLoop, Vellum, Helicone, Braintrust, and various open-source tools that provide similar support. However, these solutions often fall short when tasks require more than a single string-interpolated LLM completion.
For example, we usually start with a single string-interpolated template for an LLM completion, but we often move away from this because:
These are not hypothetical situations but reflect real scenarios. In the future, I foresee multi-agent solutions like AutoGen or CrewAI and tools like DSPy reducing the relevance of a "prompting iterator". However, evaluations and traces will become even more important.
Iterating on these tasks outside of a software-engineering context can significantly increase velocity, allowing domain/ML experts and software engineers to work concurrently. Like Traces, a Bot is an abstraction that can follow these use-cases even after they move beyond the basic string-template LLM solution. In all of the solutions above, there are aspects that can be tweaked, and in those situations traces, datasets, and feedback remain vital for iteration and improvement. As long as the software contract, i.e. the Task, stays the same, software engineers are not required.
For instance, in the NER example above, we can adjust the model and a confidence threshold without breaking the software contract. We can establish this bot-configuration schema when defining Tasks, alongside the clearly defined input and output schemas. LangFuse would then no longer cater solely to a single implementation; it could display an appropriate interface for any type of implementation of a Task. JSON schemas support unions, enabling us to create Bots with various implementations. Beyond that, a Bot table record can function exactly like the Prompt table record already defined in the LangFuse codebase.
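Here is a rough sketch of such a bot-configuration schema, using a JSON Schema oneOf union over two illustrative implementations; all field names are assumptions, not an existing Langfuse schema:

```typescript
// Hypothetical bot-configuration schema for the NER task: a JSON Schema union
// (oneOf) over two illustrative implementations of the same Task contract.
const nerBotConfigSchema = {
  oneOf: [
    {
      title: "Single LLM completion",
      type: "object",
      properties: {
        implementation: { const: "llm-completion" },
        model: { type: "string", enum: ["gpt-3.5-turbo", "gpt-4"] },
        promptTemplate: { type: "string" },
      },
      required: ["implementation", "model", "promptTemplate"],
    },
    {
      title: "Classifier with LLM fallback",
      type: "object",
      properties: {
        implementation: { const: "classifier-with-fallback" },
        model: { type: "string" },
        confidenceThreshold: { type: "number", minimum: 0, maximum: 1 },
      },
      required: ["implementation", "model", "confidenceThreshold"],
    },
  ],
};
```

The UI could then use the implementation discriminator to pick the right branch of the union and render validated fields for whichever Bot variant is configured, without the software contract ever changing.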
Additional information
No response