Improve and rework GPT-tfjs

Here is a list of potential improvements for gpt-tfjs in Disco:
- [x] Create a `compile` method to initialize the optimizer, rather than initializing it when `fitDataset` is called, so that the optimizer state persists across multiple calls to `fitDataset`
- [x] Implement save and load methods to save and re-use a trained model
- [x] Rename classes for better clarity and consistency, e.g. multiple classes and functions are called `GPT`
- [x] Assess whether we can use tf.CustomCallbackArgs rather than redefining an interface for TrainingCallbacks
- [x] Assess whether we can use TF.js' native `fitDataset` method rather than overriding it with a custom training loop
  -> TF.js only implements Adam, while GPT-2 uses AdamW. The custom optimizer additionally supports weight decay, which the original GPT-2 uses.
- [x] TF.js only supports reading a text file line by line, which is not ideal for LLM inputs. Try implementing a file reader that reads chunk by chunk rather than line by line
- [x] Training with GPT-2 yields a NaN loss after the first epoch step
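
To illustrate the chunk-by-chunk reading item above, here is a minimal sketch of re-chunking text into fixed-size pieces regardless of where line breaks fall. `TextChunker`, `push` and `flush` are hypothetical names chosen for this example, not part of TF.js or Disco, and the sketch counts UTF-16 code units rather than tokens, so a real reader would likely chunk after tokenization.

```typescript
// Hypothetical helper: accumulates incoming text pieces (lines, stream
// buffers, ...) and emits chunks of exactly `chunkSize` characters.
class TextChunker {
  private buffer = "";

  constructor(private readonly chunkSize: number) {
    if (!(chunkSize > 0) || Math.floor(chunkSize) !== chunkSize) {
      throw new RangeError("chunkSize must be a positive integer");
    }
  }

  // Add a piece of text and return every full chunk now available.
  push(piece: string): string[] {
    this.buffer += piece;
    const chunks: string[] = [];
    while (this.buffer.length >= this.chunkSize) {
      chunks.push(this.buffer.slice(0, this.chunkSize));
      this.buffer = this.buffer.slice(this.chunkSize);
    }
    return chunks;
  }

  // Return the remaining text (shorter than chunkSize), if any.
  flush(): string[] {
    const rest = this.buffer;
    this.buffer = "";
    return rest.length > 0 ? [rest] : [];
  }
}
```

Newlines are treated like any other character, so a chunk can span several lines. Whether the final, shorter chunk should be kept, padded or dropped depends on how the training batches are built.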

#656 and #657 should be addressed first
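
For context on the AdamW note in the list above, here is a minimal sketch of one AdamW update on plain number arrays, assuming the usual decoupled weight decay formulation. `adamWStep`, `AdamWState` and `AdamWConfig` are hypothetical names for illustration only and do not reflect the actual gpt-tfjs optimizer, which operates on tensors.

```typescript
// Hypothetical types for this sketch only.
interface AdamWState {
  m: number[]; // first-moment (mean) estimates
  v: number[]; // second-moment (uncentered variance) estimates
  step: number; // number of updates applied so far
}

interface AdamWConfig {
  learningRate: number;
  beta1: number;
  beta2: number;
  epsilon: number;
  weightDecay: number;
}

// Applies one AdamW update and returns the new parameters.
// `state` is mutated, so it must be kept between calls.
function adamWStep(
  params: number[],
  grads: number[],
  state: AdamWState,
  cfg: AdamWConfig,
): number[] {
  state.step += 1;
  const biasCorrection1 = 1 - Math.pow(cfg.beta1, state.step);
  const biasCorrection2 = 1 - Math.pow(cfg.beta2, state.step);

  return params.map((p, i) => {
    const g = grads[i];
    state.m[i] = cfg.beta1 * state.m[i] + (1 - cfg.beta1) * g;
    state.v[i] = cfg.beta2 * state.v[i] + (1 - cfg.beta2) * g * g;
    const mHat = state.m[i] / biasCorrection1;
    const vHat = state.v[i] / biasCorrection2;
    const adamUpdate = mHat / (Math.sqrt(vHat) + cfg.epsilon);
    // Decoupled weight decay: applied to the parameter directly,
    // not added to the gradient as plain L2 regularization would be.
    return p - cfg.learningRate * (adamUpdate + cfg.weightDecay * p);
  });
}
```

Because the moment estimates and the step counter live in `state`, creating that state once (for example in a `compile`-style method) and reusing it keeps the optimizer consistent across several training calls.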

Improve and rework GPT-tfjs #654
