Add Dataclasses and RepoEnv Info refac #50
Conversation
Fix test_human
example_agent/utils.py
Outdated
]
prp = self.prompt_response_pairs[game_step]
if prp and include_prompt_response_pairs:
    json_out["prompt_response_pairs"] = self._format_prompt_response_pairs(
I'm changing the format of `prompt_response_pairs` when the prompt is a list of messages; see `_format_prompt_response_pairs` and tests/test_agents.py:193. I'm not sure I understand the previous logic when there are multiple message turns. Now the messages are concatenated into one prompt.
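For illustration only, a minimal sketch of concatenating a message list into one prompt string; the helper name and message shape are assumptions, not the actual `_format_prompt_response_pairs`:

```python
# Illustrative only: concatenate chat-style messages into a single prompt string.
# The message shape ({"role": ..., "content": ...}) is an assumption.
def concat_messages(messages: list[dict]) -> str:
    return "\n".join(f"{m['role']}: {m['content']}" for m in messages)


# concat_messages([{"role": "system", "content": "You are a debugger."},
#                  {"role": "user", "content": "Fix the failing test."}])
# -> "system: You are a debugger.\nuser: Fix the failing test."
```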
This was for the cases where there are multiple LLM calls in a single env.step. For instance, in some CoT settings, we first call the LLM to generate the reasoning string, then, conditioned on this string, we ask the LLM again to generate an action. I believe it's better to keep the list instead of concatenating, because a list is presented more cleanly by json.dumps when creating prompts.
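A rough sketch of that two-call CoT pattern and why a list of pairs serializes cleanly; `llm`, the message shape, and the pair fields here are hypothetical, not this repo's API:

```python
import json

# Hypothetical two-call CoT step: one LLM call for the reasoning, a second for the action.
def cot_step(llm, observation: str):
    pairs = []

    reasoning_prompt = [{"role": "user", "content": f"Think step by step:\n{observation}"}]
    reasoning = llm(reasoning_prompt)
    pairs.append({"prompt": reasoning_prompt, "response": reasoning})

    action_prompt = [{"role": "user", "content": f"{reasoning}\nNow give the action."}]
    action = llm(action_prompt)
    pairs.append({"prompt": action_prompt, "response": action})

    # A list keeps each LLM call separate, so json.dumps(pairs, indent=2)
    # shows one entry per call instead of one concatenated string.
    return action, pairs
```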
I removed the string format, but moved `token_usage` inside `prompt_response_pairs`, since there is usage for each of the LLM calls.
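The per-call usage could then end up looking roughly like this; the field names are assumptions based on the discussion, not the exact schema:

```python
# Assumed shape: each prompt/response pair carries the token usage of its own LLM call.
json_out = {
    "prompt_response_pairs": [
        {
            "prompt": [{"role": "user", "content": "..."}],
            "response": "...",
            "token_usage": {"prompt": 512, "response": 48},
        },
        # one entry per LLM call within the same env.step
    ]
}
```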
… inside `prompt_response_pairs`
let's go!
Summary of changes:
- New dataclasses for environment info (`EnvInfo`) and LLM responses (`LLMResponse` and `TokenUsage`).
- `step` and `reset` always return an `EnvInfo` object instead of tuples.
- `LLMResponse` and `HistoryTracker` updates, including merging `HistoryTracker.save_prompt_response_pairs` into `HistoryTracker.step`.
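A rough sketch of what such dataclasses could look like; all field names here are illustrative assumptions, not the definitions introduced by this PR:

```python
from dataclasses import dataclass


@dataclass
class TokenUsage:
    prompt: int = 0      # tokens consumed by the prompt (illustrative fields)
    response: int = 0    # tokens in the generated response


@dataclass
class LLMResponse:
    prompt: list | str                     # messages (or string) sent to the LLM
    response: str                          # raw text returned by the LLM
    token_usage: TokenUsage | None = None  # usage for this single LLM call


@dataclass
class EnvInfo:
    observation: str = ""
    score: float = 0.0
    done: bool = False


# step/reset returning a single object instead of a tuple:
# info = env.reset()
# info = env.step(action)
# print(info.observation, info.done)
```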