Commit

More Flesch-Kincaid updates.
j-durbin committed Aug 4, 2023
1 parent a2b2a17 commit 2c6a221
Showing 14 changed files with 20 additions and 16 deletions.
2 changes: 0 additions & 2 deletions airoboros/instructors/prompts/card.txt
@@ -53,8 +53,6 @@ Be creative in the list of attributes each prompt should ask for, and try to com

All output text should be in {language}, but the exact terms "PROMPT" and "ANSWER" are special tokens that must not be translated.

-All output should have an approximate Flesch reading ease score of {flesch}.
-
The output format should be:
PROMPT: [first prompt]
ANSWER: [first prompt's answer]
3 changes: 2 additions & 1 deletion airoboros/instructors/prompts/contextual.txt
@@ -8,7 +8,8 @@ The random text block(s) should be extremely realistic, and should not include a

Each text block should be in {language}, but "BEGININPUT", "BEGINCONTEXT", "ENDCONTEXT", "ENDINPUT", "BEGININSTRUCTION" and "ENDINSTRUCTION" are special tokens that must not be translated.

-Each text block should have an approximate Flesch reading ease score of {flesch}.
+Random text block writing style:
+{flesch}

The random text block(s) should be in the style:
{styles}
2 changes: 1 addition & 1 deletion airoboros/instructors/prompts/contextual_response.txt
@@ -28,7 +28,7 @@ Don't include any references unless asked.

If there are multiple context blocks from which the references are extracted, be sure to logically separate the references rather than including a single large mixed block.

-All output should have an approximate Flesch reading ease score of {flesch}.
+{flesch}

If the tasks cannot be answered using only the information provided in the input, do not make up a response.

2 changes: 1 addition & 1 deletion airoboros/instructors/prompts/cot.txt
@@ -12,7 +12,7 @@ Provide a set of {batch_size} new, complex, unique, diverse tasks.

Be sure to include a wide variety of tasks, including tasks that explore ideas of set theory, information theory, parallelism, logic, extrapolation from scientific studies, etc., but also be sure to only include questions that have multiple potentially correct answers.

-All output should have an approximate Flesch reading ease score of {flesch}.
+{flesch}

All output text should be in {language}, but the exact term "TSK" is a special token that must not be translated.

2 changes: 1 addition & 1 deletion airoboros/instructors/prompts/experience.txt
@@ -2,7 +2,7 @@ I'd like you to create an instruction and response pair for an imaginative exper

Select a completely random location for the new setting, and be very creative in the details.

-All output should have an approximate Flesch reading ease score of {flesch}.
+{flesch}

All output text should be in {language}, but "SETTING" and "EXPERIENCE" are special tokens that must not be translated.

2 changes: 1 addition & 1 deletion airoboros/instructors/prompts/general.txt
@@ -12,7 +12,7 @@ Requirements for the tasks:
- Any instruction referencing a list of objects, such as classifying a list of items, should include the list of items.
{topics}

-All output should have an approximate Flesch reading ease score of {flesch}.
+{flesch}

Include exactly {batch_size} tasks in your response.

2 changes: 1 addition & 1 deletion airoboros/instructors/prompts/multiple_choice.txt
@@ -17,7 +17,7 @@ Don't start with something like "Certainly, here's a quiz...", just produce the

All output text must be in {language}, but the exact terms "QUESTION" and "ANSWER" are special tokens that must not be translated.

-Your output should have an approximate Flesch reading ease score of around {flesch}.
+{flesch}

Response format:
QUESTION: question 1, including all options
2 changes: 1 addition & 1 deletion airoboros/instructors/prompts/orca.txt
@@ -107,7 +107,7 @@ The new questions must not simple formulas, i.e.:

The answers should always have the reasoning first, then the final answer. Don't ever put the final answer first, then reasoning.

-All output should have an approximate Flesch reading ease score of {flesch}.
+{flesch}

All output text should be in {language}, but the exact terms "QUESTION" and "ANSWER" are special tokens that must not be translated.

3 changes: 2 additions & 1 deletion airoboros/instructors/prompts/roleplay.txt
@@ -8,7 +8,8 @@ Example 4: What is the meaning of life? Respond using the words/style of Homer

Generate a set of {batch_size} new similar prompts.

-Be sure your output would rate with an appropriate Flesch reading ease score for the character/persona requested, or {flesch} if the character is fictional and doesn't have a predefined intelligence/vocabulary/education level.
+Be sure your output would rate with an appropriate Flesch reading ease score for the character/persona requested, otherwise:
+{flesch}

Be appropriately loquacious for the task, e.g. stories should be long, complex, and detailed, whereas a haiku should be the standard three/5-7-5 format.

4 changes: 3 additions & 1 deletion airoboros/instructors/prompts/writing.txt
@@ -14,7 +14,9 @@ Here are a few examples:
- Write an email announcing a new breakthrough in tire technology by your company ("Atobormafi Tire Tech") which increases fuel efficiency by 7% on average. The target of the email is investors, since we are seeking Series B funding. Explain the profit possibilities with such a tire and ask for new investment.
- Write the introduction and methodology sections of a research article in which the use of Psilocybin Cubensis reduced dependency on parmeson cheese by 82%. Dosage was 0.5g/day, split between 3 dosages, administered orally. Double blind, placebo controlled, 1812 participants.

-Make sure to include a wide variety of writing tasks, across a wide range of subjects.
+Make sure to include a wide variety of writing tasks, across a wide range of subjects. Be loquacious.
+
+{flesch}

{style_extra}

2 changes: 1 addition & 1 deletion airoboros/instructors/prompts/writing_response.txt
@@ -6,7 +6,7 @@ I would would like you to following the following rules:
- If the instruction doesn't specify any particular style (happy, inspirational, etc.), and the instruction is to write a poem, story, song, or other creative/non-professional output, you will quietly randomly select from one of the following styles: happy, sad, surprising, open-ended, shocking, tragic. Remember that not all stories need a resolution where the downtrodden somehow win - lessons can be learned from stories with bad/sad/tragic endings as well, so don't always select a happy ending. Characters in the story can be friendly, mean, violent, or anything in between - remember, it's just a story, and complex character interactions make it more compelling. Don't include any moralizing or explanation of what the lesson should be, just tell the story - the user is smart and will understand.
- If the instruction is to write an email or letter, do not start the body of the message with "I hope this [..] finds you ...". Use a variety of introductory sentences without requiring some sort of nicety.

-All output should have an approximate Flesch reading ease score of {flesch}. Be appropriately loquacious for the task, e.g. stories should be long, complex, and detailed, whereas a haiku should be the standard three/5-7-5 format.
+{flesch}

The response should be in {language}.

3 changes: 2 additions & 1 deletion airoboros/self_instruct.py
@@ -52,6 +52,7 @@
         "gpt-3.5-turbo-16k-0613",
     ],
 }
+READABILITY_HINT = "The output should be written in such a way as to have a Flesch-Kincaid readability score of 30 or lower - best understood by those with college education. Only output the story - don't add any notes or information about Flesch-Kincaid scores."


class SelfInstructor:
@@ -117,7 +118,7 @@ def load_config(self):
         if raw_config.get("default_batch_size") is not None:
             self.default_batch_size = raw_config["default_batch_size"]
         self.language = raw_config.get("language") or "English"
-        self.default_flesch = int(raw_config.get("default_flesch") or "50")
+        self.default_flesch = raw_config.get("default_flesch") or READABILITY_HINT

         # Validate the model for each generator.
         self.instructors = raw_config.get("instructors")
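
The fallback in this hunk can be sketched as follows. This is a minimal illustration, assuming the prompt templates are filled with `str.format`; the commit does not show the rendering code. Only `READABILITY_HINT` and the `default_flesch` key come from the diff; `resolve_flesch`, `TEMPLATE`, and `render` are hypothetical names.

```python
# Copied from the diff above.
READABILITY_HINT = (
    "The output should be written in such a way as to have a Flesch-Kincaid "
    "readability score of 30 or lower - best understood by those with college "
    "education. Only output the story - don't add any notes or information "
    "about Flesch-Kincaid scores."
)

# Hypothetical template fragment, modeled on the updated prompt files where
# {flesch} now stands alone on its own line.
TEMPLATE = (
    "Generate {batch_size} tasks.\n\n"
    "{flesch}\n\n"
    "All output text should be in {language}."
)


def resolve_flesch(raw_config):
    """Return the configured hint, or the built-in default if unset or empty."""
    return raw_config.get("default_flesch") or READABILITY_HINT


def render(raw_config, batch_size=10, language="English"):
    """Fill the hypothetical template (assumes str.format-style rendering)."""
    return TEMPLATE.format(
        batch_size=batch_size,
        flesch=resolve_flesch(raw_config),
        language=language,
    )


if __name__ == "__main__":
    print(render({}))
    print(render({"default_flesch": "Write for a general audience."}))
```

Under this assumption, a config that still sets a bare number such as `default_flesch: 45` would place just `45` on its own line in the prompt, because the templates no longer wrap the value in a sentence.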
5 changes: 3 additions & 2 deletions example-config.yaml
@@ -39,6 +39,7 @@ response_filters:
   - "violates my"
   - "i (can('t| ?not)|w(on't|will not)|am (not |un)able.?).{0,30}(you are|you're|your )"
   - "please note that"
+  - "flesch"

# Optionally limit the maximum number of tokens to use when generating instructions.
max_tokens:
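
A minimal sketch of how a `response_filters` list like the one above might be applied. How airoboros actually matches these patterns is not shown in this commit, so the case-insensitive `re.search` and the names `RESPONSE_FILTERS` and `is_filtered` are assumptions for illustration only.

```python
import re

# A subset of the patterns from example-config.yaml, including the new one.
RESPONSE_FILTERS = [
    "violates my",
    "please note that",
    "flesch",
]

# Assumption: patterns are regular expressions matched case-insensitively.
_COMPILED = [re.compile(pattern, re.IGNORECASE) for pattern in RESPONSE_FILTERS]


def is_filtered(response):
    """Return True if any filter pattern occurs anywhere in the response."""
    return any(pattern.search(response) for pattern in _COMPILED)


if __name__ == "__main__":
    print(is_filtered("Note: this text has a Flesch-Kincaid score of 28."))
    print(is_filtered("The cat sat on the mat."))
```

With a filter like this, a response that comments on its own readability score would be dropped, which matches the hint's instruction not to mention Flesch-Kincaid scores.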
@@ -63,8 +64,8 @@ default_count: 100
# Default batch size, if not specified.
default_batch_size: 10

-# Default readability score where 0 is extremely difficult to read, 100 would be easily understood by 11 year old.
-default_flesch: 45
+# Default readability score hint: https://en.wikipedia.org/wiki/Flesch%E2%80%93Kincaid_readability_tests
+default_flesch: The output should be written in such a way as to have a Flesch-Kincaid readability score of 30 or lower - best understood by those with college education. The response must not contain any notes or information about Flesch-Kincaid scores.

# Language.
language: English
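
For background on the scale the hint refers to, here is a small sketch of the standard Flesch reading ease formula. The commit itself does not compute scores; it only asks the model to aim for one. The function name `flesch_reading_ease` is hypothetical, and counting words, sentences, and syllables is left to the caller.

```python
def flesch_reading_ease(total_words, total_sentences, total_syllables):
    """Standard Flesch reading ease score; higher means easier to read.

    Scores of roughly 30 or below are usually described as college-level
    text, which is the range the default hint asks for.
    """
    if total_words <= 0 or total_sentences <= 0:
        raise ValueError("word and sentence counts must be positive")
    words_per_sentence = total_words / total_sentences
    syllables_per_word = total_syllables / total_words
    return 206.835 - 1.015 * words_per_sentence - 84.6 * syllables_per_word


if __name__ == "__main__":
    # 100 words in 5 sentences with 150 syllables: moderately easy text.
    print(flesch_reading_ease(100, 5, 150))
    # Longer sentences and longer words push the score down.
    print(flesch_reading_ease(100, 2, 200))
```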
2 changes: 1 addition & 1 deletion setup.py
@@ -6,7 +6,7 @@

 setup(
     name="airoboros",
-    version="2.0.16",
+    version="2.0.17",
     description="Updated and improved implementation of the self-instruct system.",
     long_description=long_description,
     long_description_content_type="text/markdown",