
Commit ab458fc

Merge pull request #53 from finitearth/feature/RewardTask
Feature/reward task
2 parents: 1014ccf + 48dca49


61 files changed (+3313, −1068 lines)

.coverage (16 KB)

Binary file not shown.

.github/workflows/ci.yml

Lines changed: 0 additions & 1 deletion

```diff
@@ -34,7 +34,6 @@ jobs:
       - name: Run tests with coverage
         run: |
           poetry run python -m pytest --junitxml=pytest.xml --cov-report=term-missing:skip-covered --cov=. tests/ > pytest-coverage.txt
-          cat pytest-coverage.txt

       - name: Generate coverage report & comment on PR
         id: coverageComment
```

.github/workflows/docs.yml

Lines changed: 1 addition & 1 deletion

```diff
@@ -30,7 +30,7 @@ jobs:
       - name: Generate notebook examples
         run: |
-          poetry run jupyter nbconvert --to markdown --allow-errors --output-dir docs/examples notebooks/*.ipynb
+          poetry run jupyter nbconvert --to markdown --allow-errors --output-dir docs/examples tutorials/*.ipynb

       - name: Deploy docs
         run: |
```

.gitignore

Lines changed: 1 addition & 0 deletions

```diff
@@ -11,3 +11,4 @@ results/
 poetry.lock
 CLAUDE.md
 **/CLAUDE.local.md
+.mypy_cache/
```

.pre-commit-config.yaml

Lines changed: 10 additions & 0 deletions

```diff
@@ -18,6 +18,16 @@ repos:
     rev: 5.12.0
     hooks:
       - id: isort
+  - repo: https://github.com/pre-commit/mirrors-mypy
+    rev: v1.8.0
+    hooks:
+      - id: mypy
+        files: ^promptolution/
+        additional_dependencies:
+          - types-requests
+          - pandas-stubs
+          - numpy
+        args: [--explicit-package-bases, --config-file=pyproject.toml]
   - repo: https://github.com/pycqa/pydocstyle
     rev: 6.3.0
     hooks:
```

README.md

Lines changed: 3 additions & 4 deletions

```diff
@@ -1,12 +1,11 @@
 ![promptolution](https://github.com/user-attachments/assets/84c050bd-61a1-4f2e-bc4e-874d9b4a69af)

-![Coverage](https://img.shields.io/badge/Coverage-89%25-green)
+![Coverage](https://img.shields.io/badge/Coverage-91%25-brightgreen)
 [![CI](https://github.com/finitearth/promptolution/actions/workflows/ci.yml/badge.svg?branch=main)](https://github.com/finitearth/promptolution/actions/workflows/ci.yml)
 [![Docs](https://github.com/finitearth/promptolution/actions/workflows/docs.yml/badge.svg?branch=main)](https://github.com/finitearth/promptolution/actions/workflows/docs.yml)
 ![Code Style](https://img.shields.io/badge/Code%20Style-black-black)
 ![Python Versions](https://img.shields.io/badge/Python%20Versions-≥3.10-blue)
-
-
+[![Getting Started](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/finitearth/promptolution/blob/main/tutorials/getting_started.ipynb)

 Promptolution is a library that provides a modular and extensible framework for implementing prompt tuning for single tasks and larger experiments. It offers a user-friendly interface to assemble the core components for various prompt optimization tasks.

@@ -36,7 +35,7 @@ to install the necessary dependencies. You might need to install [pipx](https://
 ## Usage

-To get started right away, take a look at our [getting started notebook](https://github.com/finitearth/promptolution/blob/main/notebooks/getting_started.ipynb).
+To get started right away, take a look at our [getting started notebook](https://github.com/finitearth/promptolution/blob/main/tutorials/getting_started.ipynb) and our [other demos and tutorials](https://github.com/finitearth/promptolution/blob/main/tutorials).
 For more details, a comprehensive **documentation** with API reference is available at https://finitearth.github.io/promptolution/.

 ### Featured Optimizers
```

docs/examples/getting_started.md

Lines changed: 156 additions & 309 deletions
Large diffs are not rendered by default.

(new file) Lines changed: 246 additions & 0 deletions

@@ -0,0 +1,246 @@
# Getting Started: LLM as a Judge with Promptolution

## Welcome to Promptolution!

Discover a powerful tool for evolving and optimizing your LLM prompts. This notebook provides a friendly introduction to one of Promptolution's most advanced features: LLM as a Judge.

While the standard getting_started notebook shows how to optimize for classification tasks, this guide focuses on something different. We'll optimize prompts for a creative task where there's no single "correct" answer: *finding an optimal argument for a statement*!

## Intro

In traditional machine learning and prompt optimization, we often rely on labeled data. For a classification task, you need an input (x) and a corresponding ground-truth label (y). The goal is to find a prompt that helps the model predict y correctly.

But what if your task is more subjective? How do you "label" things like:

- The quality of a generated argument?
- The creativity of a story?
- The helpfulness of a summary?
- The persuasiveness of an essay?

This is where LLM as a Judge comes in. Instead of relying on a pre-defined dataset of labels, we use another powerful language model (the "judge") to score the output of our prompts. The process looks like this:

1. A candidate prompt is used to generate a response (e.g., an argument).
2. A "judge" LLM then evaluates this response based on the task provided and assigns a score.
3. Promptolution's optimizer uses these scores to identify which prompts are best and evolves them to generate even better responses.

The beauty of this approach is its flexibility. You can provide ground truths (in case there is a correct answer) and let the judge decide whether the prediction and the correct answer are equivalent - but you don't need to.

*New to Promptolution? If you haven't seen our classification tutorial yet, check out `getting_started.ipynb` first! It covers the basics of prompt optimization with simpler tasks like text classification. This notebook builds on those concepts but tackles more complex, subjective tasks.*
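The scoring loop above can be made concrete with a small sketch. `generate` and `judge` below are hypothetical stand-ins for the two LLM calls (they are not Promptolution APIs); the point is only the shape of the loop: generate with a candidate prompt, score with a judge, average over a batch.

```python
def generate(prompt: str, topic: str) -> str:
    # Stand-in for the downstream LLM call that writes the argument.
    return f"{prompt} {topic} matters because of its documented effects."

def judge(task: str, response: str) -> float:
    # Stand-in for the judge-LLM call. A real judge is prompted with the
    # task description and asked to rate the response on a numeric scale.
    return min(1.0, len(response.split()) / 20)

def score_prompt(prompt: str, topics: list[str], task: str) -> float:
    """Average judge score of one candidate prompt over a batch of inputs."""
    scores = [judge(task, generate(prompt, t)) for t in topics]
    return sum(scores) / len(scores)

task = "Given a statement, find the best argument supporting it."
topics = ["Payday loans should be banned"]
score = score_prompt("Write a persuasive argument supporting this statement:", topics, task)
```

Note that no label ever appears: the judge scores the response on its own, which is exactly what makes subjective tasks optimizable.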
## Installation

Install Promptolution with a single command:

```python
! pip install promptolution[api]
```

## Imports

```python
import pandas as pd
from promptolution.utils import ExperimentConfig
from promptolution.helpers import run_experiment
import nest_asyncio

nest_asyncio.apply()  # Required for notebook environments
```
## Setting Up Your Experiment

### Prepare the data

For this tutorial, we're using IBM's Argument Quality Ranking dataset - a collection of crowd-sourced arguments on controversial topics like capital punishment, abortion rights, and climate change.

Unlike classification tasks where you have clear input-output pairs, here we're working with debate topics that we want to generate compelling arguments for.

```python
df = pd.read_csv("hf://datasets/ibm-research/argument_quality_ranking_30k/dev.csv").sample(300)
```

Let's look at what we're working with:

```python
print("\nSample topics:")
for topic in df["topic"].unique()[:3]:
    print(f"- {topic}")
```

    Sample topics:
    - We should adopt a zero-tolerance policy in schools
    - Payday loans should be banned
    - Intelligence tests bring more harm than good

Our task: **Given a controversial statement, generate the strongest possible argument supporting that position.**
### Creating Initial Prompts

Here are some starter prompts for generating compelling arguments. Feel free to experiment with your own!

```python
init_prompts = [
    "Create a strong argument for this position with clear reasoning and examples:",
    "Write a persuasive argument supporting this statement. Include evidence and address counterarguments:",
    "Make a compelling case for this viewpoint using logical reasoning and real examples:",
    "Argue convincingly for this position. Provide supporting points and evidence:",
    "Build a strong argument for this statement with clear structure and solid reasoning:",
    "Generate a persuasive argument supporting this position. Use facts and logical flow:",
    "Create a well-reasoned argument for this viewpoint with supporting evidence:",
    "Write a convincing argument for this position. Include examples and counter opposing views:",
    "Develop a strong case supporting this statement using clear logic and evidence:",
    "Construct a persuasive argument for this position with solid reasoning and examples:",
]
```
### Configure Your LLM

For this demonstration, we will again use the DeepInfra API, but you can easily switch to other providers like Anthropic or OpenAI by simply changing the `api_url` and `model_id`.

```python
api_key = "YOUR_API_KEY"  # Replace with your DeepInfra API key
```

Here are the key parameters for LLM-as-a-Judge tasks:

```python
config = ExperimentConfig(
    optimizer="evopromptga",
    task_description="Given a statement, find the best argument supporting it.",
    x_column="topic",
    prompts=init_prompts,
    n_steps=3,
    n_subsamples=10,
    subsample_strategy="random_subsample",
    api_url="https://api.deepinfra.com/v1/openai",
    model_id="meta-llama/Meta-Llama-3-8B-Instruct",
    api_key=api_key,
    task_type="judge",
)
```

- `task_type="judge"` - This tells Promptolution to use LLM evaluation instead of accuracy metrics.
- `x_column="topic"` - We specify which column contains our input (debate topics).
- `optimizer="evopromptga"` - In the classification tutorial we showcased CAPO; here we use EvoPrompt, a strong evolutionary prompt optimizer.
- No `y` column needed - the judge will evaluate quality without ground-truth labels!
## Run Your Experiment

With everything configured, you're ready to optimize your prompts! The `run_experiment` function will:

1. Evaluate your initial prompts by generating arguments and having the judge LLM score them
2. Use evolutionary operators (mutation, crossover) to create new prompt variations from the best-performing ones
3. Test these new prompt candidates and select the fittest ones for the next generation
4. Repeat this evolutionary process for the specified number of steps, gradually improving prompt quality
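The four steps above amount to a classic evolutionary loop. Here is a toy, self-contained sketch of that loop; the `mutate` and `fitness` functions are illustrative stand-ins (a real optimizer like EvoPrompt asks an LLM to paraphrase and cross over prompts, and uses judge scores as fitness), not Promptolution's implementation.

```python
import random

def mutate(prompt: str, rng: random.Random) -> str:
    # Toy mutation: swap two words. A real optimizer would ask an LLM
    # to paraphrase or to cross over two parent prompts instead.
    words = prompt.split()
    i, j = rng.randrange(len(words)), rng.randrange(len(words))
    words[i], words[j] = words[j], words[i]
    return " ".join(words)

def evolve(prompts, fitness, n_steps=3, seed=0):
    rng = random.Random(seed)
    population = list(prompts)
    for _ in range(n_steps):
        ranked = sorted(population, key=fitness, reverse=True)  # 1. evaluate
        parents = ranked[: max(1, len(ranked) // 2)]            # 3. select
        children = [mutate(p, rng) for p in parents]            # 2. vary
        population = parents + children                         # 4. next generation
    return max(population, key=fitness)

# Toy fitness: reward prompts that ask for evidence.
best = evolve(
    ["Argue for this position.", "Argue for this position with evidence."],
    fitness=lambda p: p.count("evidence"),
)
```

Selection pressure is the key design choice: only the top half survives each step, so traits that the judge rewards accumulate across generations.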
```python
prompts = run_experiment(df, config)
```

    🔥 Starting optimization...

You can expect this to take several minutes as the optimizer generates arguments, evaluates them with the judge, and evolves the prompts.

```python
prompts
```
|   | prompt | score |
|---|---|---|
| 0 | Construct a persuasive argument supporting the given statement, relying on logical coherence and evidence-based reasoning. | 0.931500 |
| 1 | Develop a strong case supporting this statement using clear logic and evidence: | 0.924167 |
| 2 | Construct a convincing case supporting the stated argument, providing evidence and responding to potential objections. | 0.915833 |
| 3 | Develop a well-reasoned argument in favor of the given statement, incorporating reliable examples and addressing potential counterpoints. | 0.913333 |
| 4 | Write a persuasive argument supporting this statement. Include evidence and address counterarguments: | 0.907500 |
| 5 | Present a convincing case for this assertion, incorporating logical premises and applicable examples. | 0.903333 |
| 6 | Fortify the provided statement with a robust and well-reasoned argument, underscoring logical relationships and leveraging empirical support to build a compelling case, while also anticipating and addressing potential counterpoints. | 0.902500 |
| 7 | Construct a strong claim in support of this statement, employing a logical framework and relevant examples to make a convincing case. | 0.891667 |
| 8 | Create a well-reasoned argument for this viewpoint with supporting evidence: | 0.888333 |
| 9 | Extract the most compelling supporting argument for this statement, grounding it in logical reasoning and bolstered by relevant evidence and examples. | 0.697500 |
The best prompts aren't always the most obvious ones - let the optimizer surprise you with what works!

Happy prompt optimizing! 🚀✨ We can't wait to see what you build with Promptolution! 🤖💡
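The optimized prompts print as a prompt/score table like the one above. If you hold such results as a pandas DataFrame (the `prompt` and `score` column names are assumed from that output), pulling out the top-scoring prompt to use downstream is a one-liner; the values here are abridged and illustrative only.

```python
import pandas as pd

# Scores shaped like the table above (prompts abridged; illustrative only).
results = pd.DataFrame(
    {
        "prompt": [
            "Develop a strong case supporting this statement using clear logic and evidence:",
            "Construct a persuasive argument supporting the given statement, relying on logical coherence and evidence-based reasoning.",
        ],
        "score": [0.924167, 0.931500],
    }
)

# Sort descending by score and take the first row's prompt.
best = results.sort_values("score", ascending=False).iloc[0]["prompt"]
print(best)
```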
