feat: Rejection sampling data generation pipeline with SelfImprovingCoT pipeline #1646
Thanks @JoyceXu02. It seems this PR also includes changes to the cookbook. Could you clean it up so that it only includes the change to the self_improving_cot.py file?
thanks @JoyceXu02 , left some comments below
camel/datagen/self_improving_cot.py
Outdated
    rejection_sampling: Optional[bool] = False,
    rejection_sampling_n: int = 5,
We can simplify the interface: if rejection_sampling_n is not None, the pipeline generates multiple candidate traces.
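A minimal sketch of how this suggestion could look, assuming the boolean flag is dropped and rejection_sampling_n alone decides whether multiple candidates are generated. The class name PipelineConfig, the property rejection_sampling_enabled, and the helper num_candidates are hypothetical and not taken from the PR.

```python
from typing import Optional


class PipelineConfig:
    """Hypothetical stand-in for the pipeline's constructor arguments."""

    def __init__(self, rejection_sampling_n: Optional[int] = None) -> None:
        # None means "rejection sampling disabled"; no separate bool flag.
        if rejection_sampling_n is not None and rejection_sampling_n < 1:
            raise ValueError("rejection_sampling_n must be a positive integer")
        self.rejection_sampling_n = rejection_sampling_n

    @property
    def rejection_sampling_enabled(self) -> bool:
        # Derived from the single parameter instead of stored separately.
        return self.rejection_sampling_n is not None

    def num_candidates(self) -> int:
        # Without rejection sampling, exactly one trace is generated.
        if self.rejection_sampling_n is None:
            return 1
        return self.rejection_sampling_n
```

With a single optional parameter there is no way to pass a contradictory combination such as a disabled flag together with a candidate count.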
camel/datagen/self_improving_cot.py
Outdated
    for _i in range(self.rejection_sampling_n):
        trace = self.generate_reasoning_trace(problem)
        candidate_traces.append(trace)
Some models support an n parameter that generates multiple outputs in a single request. Could we leverage this for better efficiency? See: https://platform.openai.com/docs/api-reference/chat/create#chat-create-n
camel/datagen/self_improving_cot.py
Outdated
    Args:
        problem (str): The problem text for generating a reasoning trace.
        max_attempts (int): The number of candidate traces to generate.
There is no max_attempts argument passed to this method.
camel/datagen/self_improving_cot.py
Outdated
        first candidate if none qualify.
    """
    candidate_traces = []
    for _i in range(self.rejection_sampling_n):
self.rejection_sampling_n should not be hardcoded to 5
camel/datagen/self_improving_cot.py
Outdated
        candidate_traces.append(trace)

    best_trace = None
    best_avg_score = -1.0
I think the initial value of best_avg_score needs to be > 0.
Thanks @Wendong-Fan , will fix it.
Thanks @JoyceXu02 ! Left some comments below
camel/datagen/self_improving_cot.py
Outdated
    r"""
    Generate multiple candidate reasoning traces for a problem and
docstring format
Suggested change:
    - r"""
    - Generate multiple candidate reasoning traces for a problem and
    + r"""Generate multiple candidate reasoning traces for a problem and
camel/datagen/self_improving_cot.py
Outdated
    str: The best candidate trace that meets quality criteria, or the
    first candidate if none qualify.
Suggested change (whitespace only):
    - str: The best candidate trace that meets quality criteria, or the
    - first candidate if none qualify.
    + str: The best candidate trace that meets quality criteria, or the
    + first candidate if none qualify.
camel/datagen/self_improving_cot.py
Outdated
    self.reason_agent.model_backend.model_config_dict['n'] = (
        self.rejection_sampling_n
    )
Not all models support the n parameter. For those that don't support n, we still need to use a loop to generate multiple outputs.
Makes sense. I will update this soon.
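A rough sketch of the fallback discussed in this thread, under stated assumptions: generate(problem, k) is a hypothetical callable that returns k outputs for one request, and supports_n is a hypothetical flag the caller sets per model. Neither name comes from the PR or from the CAMEL API.

```python
from typing import Callable, List


def generate_candidates(
    generate: Callable[[str, int], List[str]],
    problem: str,
    n: int,
    supports_n: bool,
) -> List[str]:
    """Return n candidate traces for a problem.

    `generate(problem, k)` is a hypothetical backend call returning k outputs.
    """
    if supports_n:
        # One request that asks the backend for all n candidates at once.
        return generate(problem, n)
    # Fallback: the backend only returns one output per request, so loop.
    candidates: List[str] = []
    for _ in range(n):
        candidates.extend(generate(problem, 1))
    return candidates
```

Both branches return a list of the same length, so the scoring code that follows does not need to know which path produced the candidates.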
camel/datagen/self_improving_cot.py
Outdated
            best_trace = trace
            best_avg_score = avg_score
    if best_trace is None:
        best_trace = candidate_traces[0]
Should we return the candidate with the highest score even if it didn't meet the threshold, instead of hardcoding the first candidate?
yes. I will make a change on this one too.
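The agreed behavior could be sketched roughly as follows. This assumes each candidate has already been reduced to a single average score and that quality is judged against one numeric threshold; select_best_trace and its return shape are hypothetical, not the PR's implementation.

```python
from typing import List, Optional, Tuple


def select_best_trace(
    scored: List[Tuple[str, float]],
    threshold: float,
) -> Tuple[Optional[str], bool]:
    """Return (best_trace, met_threshold) for (trace, avg_score) pairs.

    The highest-scoring trace is returned even when it is below the
    threshold; the boolean tells the caller whether it qualified.
    """
    if not scored:
        return None, False
    # max() keeps the first pair on ties, so ordering stays deterministic.
    best_trace, best_score = max(scored, key=lambda pair: pair[1])
    return best_trace, best_score >= threshold
```

Taking the maximum over all candidates also removes the need for an initial sentinel value for best_avg_score.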
thanks @JoyceXu02 !
…oT pipeline (camel-ai#1646) Co-authored-by: Wendong-Fan <133094783+Wendong-Fan@users.noreply.github.com> Co-authored-by: Xiaotian Jin <jinxiaotian_sal@outlook.com> Co-authored-by: Wendong <w3ndong.fan@gmail.com> Co-authored-by: Asher-hss <101127070+Asher-hss@users.noreply.github.com> Co-authored-by: Zoe Yan <73959962+zoezyn@users.noreply.github.com> Co-authored-by: Sarthak Bhardwaj <7sarthakbhardwaj@gmail.com> Co-authored-by: Isaac Jin <whale3ye@gmail.com> Co-authored-by: TTS <50868301+TOGOTOO@users.noreply.github.com> Co-authored-by: Lei Zhang <zhanglei@apache.org> Co-authored-by: Yifeng Wang(正经人王同学) <86822589+zjrwtx@users.noreply.github.com>
Description
Fixes [Feature Request] Rejection sampling data generation pipeline with SelfImprovingCoT pipeline #1504