[RAFT] Fix Datapoint Field in Formatter for Data Generation (#535)
This PR addresses an issue with the datapoint field in the formatter for
data generation. Specifically, it corrects the column renaming in
`format.py` on line 107. The line:

```python
newds = ds.rename_columns({'question': 'prompt', 'cot_answer': 'completion'})
```

has been updated to:

```python
newds = ds.rename_columns({'instruction': 'prompt', 'cot_answer': 'completion'})
```

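The change is necessary because the "instruction" field already contains
the question, appended after the context documents, so it is the complete
prompt; the "question" field alone omits that context. Here is the
relevant code snippet that sets the "instruction" field: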

```python
context = ""
for doc in docs:
    context += "<DOCUMENT>" + str(doc) + "</DOCUMENT>\n"
context += q
datapt["instruction"] = context
```
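
To make the difference concrete, here is a minimal sketch that rebuilds one datapoint the way the snippet above does and compares the two rename mappings. It uses plain dictionaries rather than a `datasets.Dataset`. `build_datapoint` and `to_completion_record` are hypothetical helpers, the sample document, question, and answer are invented for illustration, and the presence of a separate "question" key on the datapoint is assumed from the old mapping.

```python
def build_datapoint(docs, q, cot_answer):
    """Mimic the snippet above: wrap each document in tags, then append the question."""
    context = ""
    for doc in docs:
        context += "<DOCUMENT>" + str(doc) + "</DOCUMENT>\n"
    context += q
    # Assumption: the datapoint also keeps the bare question under "question".
    return {"question": q, "instruction": context, "cot_answer": cot_answer}


def to_completion_record(datapt, mapping):
    """Hypothetical stand-in for rename_columns followed by dropping all other columns."""
    return {new: datapt[old] for old, new in mapping.items()}


datapt = build_datapoint(
    docs=["Paris is the capital of France."],
    q="What is the capital of France?",
    cot_answer="The document states it directly: Paris.",
)

# Old mapping: the prompt is only the bare question.
old_record = to_completion_record(datapt, {"question": "prompt", "cot_answer": "completion"})
# New mapping: the prompt is the tagged documents followed by the question.
new_record = to_completion_record(datapt, {"instruction": "prompt", "cot_answer": "completion"})

print(old_record["prompt"])
print(new_record["prompt"])
```

With the old mapping the prompt carries no `<DOCUMENT>` context at all, while the completion may still reason from those documents; the new mapping keeps the two aligned.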

We want to thank @HuiyingLi for bringing this up. 
Fixes #534.

---------

Co-authored-by: Charlie Cheng-Jie Ji <charliechengjieji@berkeley.edu>
HuanzhiMao and CharlieJCJ authored Jul 20, 2024
1 parent 181cbef commit 7b230df
Showing 1 changed file with 1 addition and 1 deletion.
2 changes: 1 addition & 1 deletion raft/format.py
@@ -104,7 +104,7 @@ class OpenAiCompletionDatasetFormatter(DatasetFormatter):
     https://platform.openai.com/docs/guides/fine-tuning/preparing-your-dataset
     """
     def format(self, ds: Dataset, params: Dict[str, str]) -> Dataset:
-        newds = ds.rename_columns({'question': 'prompt', 'cot_answer': 'completion'})
+        newds = ds.rename_columns({'instruction': 'prompt', 'cot_answer': 'completion'})
         return _remove_all_columns_but(newds, ['prompt', 'completion'])
 
 class OpenAiChatDatasetFormatter(OpenAiCompletionDatasetFormatter):