Read This Before Editing Data

This file contains specifics for organizing and editing data. Please read it carefully as this is very important for creating good models.

Data Guidelines

YOU MUST LOOK THROUGH EVERY SENTENCE PUT INTO THE MODEL. Don't just generate 150 sentences and call it good without checking them against the properties below.
If a piece of data could go into more than one intent, you must instead create a new, specific label that captures the intent. For example, prompted-name sentences can go into all order and reservation intents, so we created the add-info intent.
Do not input wacky sentences generated by an LLM that nobody will ever say. The odd language and extra words can fuck up the model.
- "yo dont trip fam lemme get a table for this wily crew" - Claude (yes it actually wrote that)
However, broken English sentences are very good for training. These tend to not fuck with the model and help it to understand the general notion of the intent.
- "i need table for tonight"
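
One way to catch data that sits in more than one intent is to scan for sentences that repeat across intents. This is only a minimal sketch: the function name `find_cross_intent_sentences` and the dict-of-lists input are assumptions for illustration, not something that exists in the processing folder.

```python
from collections import defaultdict


def find_cross_intent_sentences(intent_sentences):
    """Return {normalized sentence: sorted intents} for sentences found in 2+ intents.

    `intent_sentences` is a hypothetical mapping of intent name -> list of sentences.
    """
    seen = defaultdict(set)
    for intent, sentences in intent_sentences.items():
        for sentence in sentences:
            # Normalize case and whitespace so trivial variants still match.
            key = " ".join(sentence.lower().split())
            if key:
                seen[key].add(intent)
    return {s: sorted(intents) for s, intents in seen.items() if len(intents) > 1}


if __name__ == "__main__":
    data = {
        "Order": ["my name is sam", "i want a pizza"],
        "Reservation": ["My  name is Sam", "i need table for tonight"],
    }
    print(find_cross_intent_sentences(data))
```

Anything this reports is a candidate for a shared label such as add-info. It does not replace reading every sentence yourself.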

Organization

I am going to go through each folder and its purpose. The first 3 are unique; after that, they are sentence data points grouped by intent.

processing

This folder stores all Python files used for processing the data.
processing.py contains functions to write out data in the necessary format for both intent and NER models.
data_helper_functions.py contains every other function we use to process data.
- For example, deleting duplicates, or removing any line with a specific phrase.
- If you ever need something like this done, please check for the necessary function within the file first. If it is not there, create your own function and provide an explanation of it for others to use.
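
For reference, the two helpers mentioned above could look roughly like this. The names `remove_duplicates` and `remove_lines_with_phrase` are hypothetical, and the real versions in data_helper_functions.py may have different names and signatures, so check that file first.

```python
def remove_duplicates(lines):
    """Drop repeated lines, keeping the first occurrence and the original order.

    Comparison ignores case and surrounding whitespace; blank lines are dropped too.
    """
    seen = set()
    unique = []
    for line in lines:
        key = line.strip().lower()
        if key and key not in seen:
            seen.add(key)
            unique.append(line.strip())
    return unique


def remove_lines_with_phrase(lines, phrase):
    """Drop every line that contains `phrase` (case-insensitive)."""
    needle = phrase.lower()
    return [line for line in lines if needle not in line.lower()]


if __name__ == "__main__":
    sample = ["i need table for tonight", "I need table for tonight ", "book a table"]
    print(remove_duplicates(sample))
    print(remove_lines_with_phrase(sample, "book"))
```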

Filler-Data

Holds files with filler data for dynamic sentences. Our processing files automatically insert random lines from these files into empty labels in dynamic sentences.
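
To illustrate the idea, here is a minimal sketch of filling a dynamic sentence. It assumes empty labels are written as `{label}` and that the filler lines for each label are already loaded into a dict. Both are assumptions made for the example; the actual placeholder syntax and loading code live in the processing files.

```python
import random
import re

# Hypothetical placeholder syntax: an empty label looks like {name} or {time}.
PLACEHOLDER = re.compile(r"\{(\w+)\}")


def fill_dynamic_sentence(template, fillers, rng=random):
    """Replace each {label} in `template` with a random line from fillers[label].

    Raises KeyError if a label has no filler data.
    """
    def pick(match):
        label = match.group(1)
        return rng.choice(fillers[label])

    return PLACEHOLDER.sub(pick, template)


if __name__ == "__main__":
    fillers = {"name": ["sam", "maria"], "time": ["7pm", "noon"]}
    rng = random.Random(0)  # seeded so repeated runs give the same sentence
    print(fill_dynamic_sentence("table for {name} at {time}", fillers, rng))
```

Passing a seeded `random.Random` makes a generated dataset reproducible, which helps when comparing two models trained on the same data.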

Final-Datasets

Stores the final .json output for each new model.

Out_of_Scope

Out of Scope intent sentences.
data_full.json is a file of sentences I found online. The out_of_scope_processing.py file removes any intents in it that could interfere with other files and prints the necessary sentences into a .txt file.
If anything is classified as out of scope incorrectly, check this file for conflicting sentences.
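
The filtering step can be pictured like this. It is a sketch only: it assumes the data is a list of `[sentence, intent]` pairs, and the intent names in the example are made up. The real layout of data_full.json and the logic in out_of_scope_processing.py may differ.

```python
import json


def extract_out_of_scope(pairs, conflicting_intents):
    """Keep sentences whose source intent is not in `conflicting_intents`.

    `pairs` is assumed to be an iterable of [sentence, intent] entries.
    """
    blocked = set(conflicting_intents)
    return [sentence for sentence, intent in pairs if intent not in blocked]


def write_sentences(sentences, path):
    """Write one sentence per line to a .txt file."""
    with open(path, "w", encoding="utf-8") as handle:
        handle.write("\n".join(sentences) + "\n")


if __name__ == "__main__":
    # Inline stand-in for the JSON file; the intent names are hypothetical.
    raw = '[["what is the weather", "weather"], ["book a table for two", "restaurant_reservation"]]'
    pairs = json.loads(raw)
    print(extract_out_of_scope(pairs, ["restaurant_reservation"]))
```

If an out-of-scope misclassification shows up, adding the offending source intent to the blocked list is the first thing to try.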

Add-Info

Prompted inputs such as name, date, time, etc.
Allows these inputs to contribute to multiple intent pipelines.

Change-Info

If a user wants to change the name on their order or reservation, it needs to apply to both pipelines. Since we don't have multi-intent, a new label is necessary.

Confirm-Deny and Greeting-Farewell

Pretty self-explanatory

Inquiry, Order, and Reservation

Groupings of intent data

Compiling Data For a New Model

Creating and organizing data is important, but if you mess up compiling it, none of that matters. Pay very close attention to what you are doing, and remember the data guidelines outlined in the initial section of this file.
Find all data from the separate files throughout the specific folders and the model type folder, bring all the data into Final.txt, and send it through processing.py to convert it into a usable format. You can store both Final.txt and the usable output in the Final-Datasets folder of the model type.
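
The gathering step could be scripted along these lines. This is a rough sketch, assuming the data files are .txt files with one sentence per line. `compile_final` is a hypothetical helper, and the real Final.txt may need extra structure (such as intent labels) that this plain concatenation does not add. It also stops short of calling processing.py, whose functions are not described here.

```python
from pathlib import Path


def compile_final(source_dirs, final_path):
    """Concatenate every non-blank line from the .txt files in `source_dirs`.

    Writes the result to `final_path` and returns the number of lines written.
    """
    final_path = Path(final_path)
    lines = []
    for folder in source_dirs:
        for txt_file in sorted(Path(folder).glob("*.txt")):
            # Skip the output file itself if it lives in a source folder.
            if txt_file.resolve() == final_path.resolve():
                continue
            for line in txt_file.read_text(encoding="utf-8").splitlines():
                if line.strip():
                    lines.append(line.strip())
    final_path.write_text("\n".join(lines) + "\n", encoding="utf-8")
    return len(lines)
```

A call might look like `compile_final(["Order", "Add-Info"], "Final-Datasets/Final.txt")`, with the folder list adjusted to the model type being built.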

That's the end for now. If you have any questions, please bring them up to me (Travis) as soon as possible and I can amend the document.

Again, this is very important. Unorganized data will lead to sloppier models, wasted time, and wasted money.
