InstructLab crowdsources the process of tuning and improving models by collecting two types of data: knowledge and skills. These submissions are collected in a taxonomy of YAML files that is used in the synthetic data generation process.
We accept contributions of both Skills and Knowledge to InstructLab.
- Skills
- Knowledge
If you would like to contribute any third-party data to either the Skills or Knowledge taxonomies, you must ensure that the license on the data does not restrict commercial use.
This applies to:
- Data embedded in `.md` files as knowledge
- Data offered as `context` in `qna.yaml` files for skills
- Citing your sources in your `attribution.txt` file
- Questions and answers sourced from elsewhere and used as `qna.yaml` submissions
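Because every sourced `qna.yaml` submission needs its sources cited in an `attribution.txt` file, a quick local check can catch a missing citation before you open a pull request. The following is a minimal sketch, not part of the InstructLab tooling: the function name `find_missing_attribution` is hypothetical, and it assumes that `attribution.txt` sits in the same directory as the `qna.yaml` it covers.

```python
from pathlib import Path


def find_missing_attribution(taxonomy_root: str) -> list[Path]:
    """Return directories that hold a qna.yaml but no attribution.txt.

    Hypothetical helper: assumes attribution.txt lives next to qna.yaml.
    """
    missing = []
    # Walk the whole tree and look at every qna.yaml we find.
    for qna in sorted(Path(taxonomy_root).rglob("qna.yaml")):
        if not (qna.parent / "attribution.txt").is_file():
            missing.append(qna.parent)
    return missing
```

Calling `find_missing_attribution(".")` from the root of a local taxonomy checkout would list the directories that still need a citation file. An empty list only means the files exist; it says nothing about whether the cited licenses are acceptable.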
For this project, unless the file says otherwise, or unless the attributed source provided in the file says otherwise, the relevant open source license is the Apache License, Version 2.0. All contributions that leverage third-party content should either come from the public domain (e.g., out-of-copyright works or .gov sites) or be licensed under an open data license that does not restrict commercial use or the creation of derivative works, including the following license types:
- CC0
- CDLA-Permissive-2.0
- CC-BY-4.0
- Apache 2.0
- MIT
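The license list above can be turned into a simple pre-submission check. This is a minimal sketch under stated assumptions: `ALLOWED_LICENSES` and `is_listed_license` are hypothetical names, and the matching is a plain case-insensitive string comparison against the names as written in the list. Because the list is introduced with "including", a miss means the source needs a manual license review, not that it is automatically unacceptable.

```python
# Hypothetical allowlist copied from the license types named above.
ALLOWED_LICENSES = {
    "cc0",
    "cdla-permissive-2.0",
    "cc-by-4.0",
    "apache 2.0",
    "mit",
}


def is_listed_license(name: str) -> bool:
    """Return True if the name matches one of the listed license types."""
    # Normalize whitespace and case so "MIT " and "mit" compare equal.
    return name.strip().lower() in ALLOWED_LICENSES
```

For example, `is_listed_license("CC-BY-4.0")` is true, while a non-commercial variant such as `"CC-BY-NC-4.0"` is not in the list and would need to be rejected or reviewed by hand.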
Any third-party content contributed to this project undergoes modifications to fit the templated format required for submission.
- Christianity in Nepal, Wikipedia, Wikimedia Foundation, 24 April 2024.
- Concepts of Biology - 1st Canadian Edition, Chapter 11.3 Circulatory and Respiratory Systems. Copyright 2015 by Charles Molnar and Jane Gair, licensed under a Creative Commons Attribution 4.0 License. No modifications were made to the text.
- World History, volume 2: from 1400, Chapter 6.3 Capitalism and the First Industrial Revolution. Copyright 2022 Rice University, licensed under a Creative Commons Attribution 4.0 License. No modifications were made to the text.