This page is an FAQ for the taxonomy repository and covers the more niche questions about it. For more general questions related to InstructLab and contributing to the taxonomy repository, see .
About the taxonomy repository
InstructLab uses a novel synthetic data-based alignment tuning method for Large Language Models (LLMs). The "lab" in InstructLab stands for Large-scale Alignment for ChatBots. The LAB method is driven by taxonomies, which are largely created manually and with care.
The taxonomy repository contains a taxonomy tree that will allow you to create models tuned with your data (enhanced via synthetic data generation) using the LAB method.
Taxonomy repository FAQs
The following FAQs are common questions related to the LLM's taxonomy and the taxonomy repository.
Q: What languages are contributions being accepted in?
A: Contributions are currently accepted in English.
Q: How do I sign the DCO if my PR was blocked?
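A: For a PR that is already blocked, one option is to amend the most recent commit with git commit --amend -s and then force-push the branch. To ensure your PR isn't blocked in the future, always include "Signed-off-by: Author Name <authoremail@example.com>" in every commit message. You can also do this automatically by using the -s flag (e.g., git commit -s).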
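As a rough illustration of what the DCO check looks for, the following Python sketch tests whether a commit message carries a sign-off line in the form git writes it, with the email in angle brackets. The function names, the regular expression, and the sample commit IDs are assumptions made for this example; this is not the check the taxonomy repository actually runs.

```python
import re

# Hypothetical pattern for a DCO trailer such as
# "Signed-off-by: Author Name <authoremail@example.com>".
SIGNOFF_RE = re.compile(r"^Signed-off-by: .+ <[^<>\s]+@[^<>\s]+>$", re.MULTILINE)


def has_signoff(commit_message: str) -> bool:
    """Return True if the message contains at least one sign-off line."""
    return SIGNOFF_RE.search(commit_message) is not None


def unsigned_commits(messages: dict[str, str]) -> list[str]:
    """Return the IDs of commits whose messages lack a sign-off line."""
    return [commit_id for commit_id, msg in messages.items() if not has_signoff(msg)]


if __name__ == "__main__":
    # Made-up commit IDs and messages, for illustration only.
    sample = {
        "commit-1": "Add skill\n\nSigned-off-by: Author Name <authoremail@example.com>\n",
        "commit-2": "Fix typo in qna.yaml\n",
    }
    print(unsigned_commits(sample))
```

Running a check like this over the commit messages on a branch before opening a PR would point to the commits that still need a sign-off.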
Q: Are there any tools within the project to ensure that our YAML files are properly formatted before submitting them?
A: Currently, we're in the process of implementing tools for this purpose. Some PRs are already up, and we're also considering adding a linter on the taxonomy repository as a PR check.
Q: Do we know who is approving PRs to add skills? Is it one person, multiple people, etc.? It seems that this model of training is highly susceptible to implicit bias.
A: We have a dedicated team managing this task. They are meticulous and are developing governance and processes to ensure fair and unbiased approval of PRs.
Q: LLMs always get confused with dates. Can we teach them to understand calendars?
A: While teaching the LLM to understand calendars may not be feasible, we can add a skill to acknowledge its limitations in answering certain questions.
Q: Can someone "inside" of the team discuss how this knowledge base will be curated? With thousands of contributions expected, how will variations in skill, depth, topic, relevance, and quality be managed?
A: We have a dedicated taxonomy triage and review workstream developing processes and documentation to address this challenge and ensure the quality and relevance of contributions.