Post release update (microsoft#985)
* news update

* doc update

* avoid KeyError

* bump version to 1.2.1

* handle empty responses

* typo

* eval function
sonichi authored Apr 10, 2023
1 parent a701cd8 commit c780d79
Showing 7 changed files with 17 additions and 8 deletions.
2 changes: 1 addition & 1 deletion README.md
@@ -14,7 +14,7 @@
 <br>
 </p>
 
-:fire: OpenAI GPT-3 models support in v1.1.3. ChatGPT and GPT-4 support will be added in v1.2.0.
+:fire: v1.2.0 is released with support for ChatGPT and GPT-4.
 
 :fire: A [lab forum](https://github.com/microsoft/FLAML/tree/tutorial-aaai23/tutorial) on FLAML at AAAI 2023.
 
10 changes: 9 additions & 1 deletion flaml/autogen/math_utils.py
@@ -290,8 +290,16 @@ def eval_math_responses(responses, solution=None, **args):
     Returns:
         dict: The success metrics.
     """
-    success_list = []
     n = len(responses)
+    if not n:
+        return {
+            "expected_success": 0,
+            "success": False,
+            "success_vote": 0,
+            "voted_answer": None,
+            "votes": 0,
+        }
+    success_list = []
     if solution is not None:
         for i in range(n):
             response = responses[i]
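For a quick check of the new guard, a minimal usage sketch (mirroring the test added in test/openai/test_completion.py below):

```python
from flaml.autogen.math_utils import eval_math_responses

# With the guard above, an empty response list yields zeroed metrics
# instead of failing on the empty input.
metrics = eval_math_responses([], None)
print(metrics)
# -> {'expected_success': 0, 'success': False, 'success_vote': 0,
#     'voted_answer': None, 'votes': 0}
```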
2 changes: 1 addition & 1 deletion flaml/autogen/oai/completion.py
@@ -843,7 +843,7 @@ def extract_text(cls, response: dict) -> List[str]:
         choices = response["choices"]
         if "text" in choices[0]:
             return [choice["text"] for choice in choices]
-        return [choice["message"]["content"] for choice in choices]
+        return [choice["message"].get("content", "") for choice in choices]
 
 
 class ChatCompletion(Completion):
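For illustration, a minimal sketch of the behavior change (the response dict below is a hypothetical stand-in for an OpenAI API payload):

```python
# Hypothetical chat-completion payload; the second choice's message has no
# "content" key, as can happen with empty responses.
response = {
    "choices": [
        {"message": {"role": "assistant", "content": "42"}},
        {"message": {"role": "assistant"}},
    ]
}
# The old indexing choice["message"]["content"] raised KeyError on the second
# choice; .get("content", "") now returns an empty string instead.
texts = [choice["message"].get("content", "") for choice in response["choices"]]
assert texts == ["42", ""]
```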
2 changes: 1 addition & 1 deletion flaml/version.py
@@ -1 +1 @@
-__version__ = "1.2.0"
+__version__ = "1.2.1"
1 change: 1 addition & 0 deletions test/openai/test_completion.py
@@ -216,6 +216,7 @@ def my_average(results):
     print("tuned config", config)
     result = oai.ChatCompletion.test(test_data_sample, config)
     print("result from tuned config:", result)
+    print("empty responses", eval_math_responses([], None))
 
 
 if __name__ == "__main__":
4 changes: 2 additions & 2 deletions website/docs/Examples/AutoGen-OpenAI.md
@@ -56,7 +56,7 @@ test_data = [
 ]
 ```
 
-### Defining the metric
+### Define the metric
 
 Before starting tuning, you need to define the metric for the optimization. For each code generation task, we can use the model to generate multiple candidate responses, and then select one from them. If the final selected response can pass a unit test, we consider the task as successfully solved. Then we can define the average success rate on a collection of tasks as the optimization metric.
 
@@ -69,7 +69,7 @@ eval_with_generated_assertions = partial(eval_function_completions, assertions=g

 This function will first generate assertion statements for each problem. Then, it uses the assertions to select the generated responses.
 
-### Tuning Hyperparameters for OpenAI
+### Tune the hyperparameters
 
 The tuning will be performed under the specified optimization budgets.
 
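As a sketch of how this metric plugs into tuning (assuming a `tune_data` list prepared as in the data section; the budget values are illustrative placeholders, not recommendations):

```python
from flaml import oai

# Sketch: pass the evaluation function to the tuner along with the budgets.
config, analysis = oai.Completion.tune(
    data=tune_data,                            # tuning instances (assumed prepared earlier)
    metric="success",                          # metric key returned by the eval function
    mode="max",                                # maximize the success rate
    eval_func=eval_with_generated_assertions,  # the metric defined above
    inference_budget=0.05,                     # max average $ per instance
    optimization_budget=3,                     # max total $ for the tuning run
    num_samples=-1,                            # let the budget cap the number of trials
)
```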
4 changes: 2 additions & 2 deletions website/docs/Use-Cases/Auto-Generation.md
@@ -44,13 +44,13 @@ Collect a diverse set of instances. They can be stored in an iterable of dicts.
 The evaluation function should take a list of responses, and other keyword arguments corresponding to the keys in each validation data instance as input, and output a dict of metrics. For example,
 
 ```python
-def success_metrics(responses: List[str], problem: str, solution: str) -> Dict:
+def eval_math_responses(responses: List[str], solution: str, **args) -> Dict:
     # select a response from the list of responses
     # check whether the answer is correct
     return {"success": True or False}
 ```
 
-`flaml.autogen` offers some example evaluation functions for common tasks such as code generation and math problem solving.
+[`flaml.autogen.code_utils`](../reference/autogen/code_utils) and [`flaml.autogen.math_utils`](../reference/autogen/math_utils) offer some example evaluation functions for code generation and math problem solving.
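For instance, a hedged sketch of a complete evaluation function following this signature (the majority-vote logic is illustrative only, not FLAML's implementation):

```python
from collections import Counter
from typing import Dict, List

def success_metrics(responses: List[str], solution: str, **args) -> Dict:
    # Illustrative: majority-vote across the candidate responses, then
    # compare the voted answer with the reference solution.
    votes = Counter(r.strip() for r in responses if r.strip())
    voted_answer = votes.most_common(1)[0][0] if votes else None
    return {"success": voted_answer == solution.strip()}
```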

### Metric to optimize

