Commit
[RAFT] Fix Datapoint Field in Formatter for Data Generation (#535)
This PR addresses an issue with the datapoint field in the formatter for data generation. Specifically, it corrects the column renaming in `format.py` on line 107. The line:

```python
newds = ds.rename_columns({'question': 'prompt', 'cot_answer': 'completion'})
```

has been updated to:

```python
newds = ds.rename_columns({'instruction': 'prompt', 'cot_answer': 'completion'})
```

The change is necessary because the "instruction" field already includes the question. Here is the relevant code snippet that sets the "instruction" field:

```python
context = ""
for doc in docs:
    context += "<DOCUMENT>" + str(doc) + "</DOCUMENT>\n"
context += q
datapt["instruction"] = context
```

We want to thank @HuiyingLi for bringing this up.

Fixes #534.

---------

Co-authored-by: Charlie Cheng-Jie Ji <charliechengjieji@berkeley.edu>
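To illustrate the effect of the rename, here is a minimal sketch that uses plain dictionaries in place of the real dataset object. The helper names `build_instruction` and `rename_columns`, as well as the sample documents and question, are hypothetical and exist only for this illustration; the actual code calls `ds.rename_columns` in `format.py`.

```python
def build_instruction(docs, q):
    # Mirrors the snippet quoted in the PR: wrap each document, then append the question.
    context = ""
    for doc in docs:
        context += "<DOCUMENT>" + str(doc) + "</DOCUMENT>\n"
    context += q
    return context


def rename_columns(row, mapping):
    # Hypothetical stand-in for a column rename, applied to a single row dict.
    return {mapping.get(key, key): value for key, value in row.items()}


# Made-up sample data, for illustration only.
question = "What is the capital of France?"
docs = ["Paris is the capital of France.", "Berlin is the capital of Germany."]

row = {
    "question": question,
    "instruction": build_instruction(docs, question),
    "cot_answer": "The capital of France is Paris.",
}

# Old mapping: the prompt is the bare question, without any documents.
old = rename_columns(row, {"question": "prompt", "cot_answer": "completion"})

# New mapping: the prompt carries the documents followed by the question.
new = rename_columns(row, {"instruction": "prompt", "cot_answer": "completion"})

print("<DOCUMENT>" in old["prompt"])
print("<DOCUMENT>" in new["prompt"])
```

Under the old mapping the prompt contains no `<DOCUMENT>` blocks, so the retrieved context is lost from the prompt column; under the new mapping the prompt ends with the question and includes every document.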