Reproducing the OctoCoder model

Hello, I have a few questions about OctoCoder. 

For this part in the paper:
> For instruction tuning our models, we select 5,000 random samples from COMMITPACKFT across the 6 programming languages that we evaluate on.

Could you please provide the exact training data and the launch script to fine-tune StarCoder into OctoCoder?

Or, the seeds that you used for selecting 5,000 instructions from CommitPackFT?

For a second question, was OctoCoder and the results in the paper produced using the `finetuning/starcoder/finetune.py` with LoRA/peft?

Thanks!

Btw, fantastic results @Muennighoff and team :)

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Reproducing the OctoCoder model #17

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Reproducing the OctoCoder model #17

Description

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions