Hi, thanks for the great work. I have a question about how to transform the training dataset to fit the llama_factory format.
I'd like to ask for advice on how to properly construct the training data for llama_factory fine-tuning. I found FollowIR-7B's training set on Hugging Face, and its format is as follows:
```json
{
    "score": "the score from Mistral-Instruct-7B-v0.2 of whether it was relevant or not (1 is relevant, 0 is not)",
    "label": "the label of relevance from GPT-3.5-Turbo-1106, which created the document",
    "id": "the id from the original TREC track and the file it came from",
    "document": "the synthetic document produced by GPT-3.5-Turbo-1106 given the original instruction, query, and label",
    "query": "the query written by TREC",
    "instruction": "the instruction (or narrative) written by TREC for human annotation"
}
```

To fit llama_factory's format, should the data I build for fine-tuning look like this:
```json
{
    "instruction": "<s> [INST] You are an expert Google searcher, whose job is to determine if the following document is relevant to the query (true/false). Answer using only one word, one of those two choices.\n",
    "input": "Query: {query} {instruction}\n Document: {document}\n Relevant (only output one word, either \"true\" or \"false\"): [/INST]",
    "output": "{label}"
}
```

I would appreciate it if you could give me an example.
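For reference, this is the conversion script I have in mind. It is only a minimal sketch, not something I have verified: the dataset id "jhu-clsp/FollowIR-train", the "train" split, and the assumption that the "label" field already holds the literal "true"/"false" string are my guesses from the schema above.

```python
import json

from datasets import load_dataset

# The instruction prefix from my proposed format above.
PROMPT = (
    "<s> [INST] You are an expert Google searcher, whose job is to determine "
    "if the following document is relevant to the query (true/false). Answer "
    "using only one word, one of those two choices.\n"
)


def to_alpaca(example):
    """Map one FollowIR training row to llama_factory's alpaca-style format."""
    return {
        "instruction": PROMPT,
        "input": (
            f"Query: {example['query']} {example['instruction']}\n "
            f"Document: {example['document']}\n "
            'Relevant (only output one word, either "true" or "false"): [/INST]'
        ),
        # Assuming "label" is already the "true"/"false" string the model should emit.
        "output": example["label"],
    }


# "jhu-clsp/FollowIR-train" is my guess at the Hugging Face dataset id.
dataset = load_dataset("jhu-clsp/FollowIR-train", split="train")
records = [to_alpaca(row) for row in dataset]

with open("followir_alpaca.json", "w", encoding="utf-8") as f:
    json.dump(records, f, indent=2, ensure_ascii=False)
```

My plan would then be to register followir_alpaca.json in llama_factory's data/dataset_info.json so the trainer reads the instruction/input/output columns. Does that look right?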