Initial commit #1
Conversation
> ## License
>
> Lit-LLaMA is released under the [Apache 2.0](https://github.com/Lightning-AI/lightning-llama/blob/main/LICENSE) license.
>
> # FIXME
We probably want to refresh this
> > **Note**
> > All scripts support argument [customization](customize_paths.md)
>
> ### FIXME: update this
Need to try this on an A100.
lantiga left a comment:
Fantastic work @carmocca!
```python
if hasattr(self, "bias"):
    # causal self-attention; Self-attend: (B, nh, T, hs) x (B, nh, hs, T) -> (B, nh, T, T)
    # NOTE: cannot use flash attention because it takes q.size(-1) as the norm factor which is different to the
```
Why is this conditioned on `bias` being there?
See #2
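For context, here is a minimal sketch of the manual attention path being discussed. It assumes `bias` is a registered causal-mask buffer and that the desired softmax scale differs from flash attention's hard-coded `1 / sqrt(q.size(-1))`; the names, shapes, and scale value are illustrative, not the PR's exact code:

```python
import math

import torch


def manual_causal_attention(q, k, v, bias, scale=None):
    # q, k, v: (B, nh, T, hs); bias: (1, 1, block_size, block_size) causal mask
    T = q.size(-2)
    if scale is None:
        scale = 1.0 / math.sqrt(k.size(-1))  # the choice flash attention hard-codes
    # (B, nh, T, hs) x (B, nh, hs, T) -> (B, nh, T, T)
    att = (q @ k.transpose(-2, -1)) * scale
    att = att.masked_fill(bias[:, :, :T, :T] == 0, float("-inf"))
    att = torch.softmax(att, dim=-1)
    # (B, nh, T, T) x (B, nh, T, hs) -> (B, nh, T, hs)
    return att @ v


# usage with a custom norm factor
B, nh, T, hs = 2, 4, 8, 16
q, k, v = (torch.randn(B, nh, T, hs) for _ in range(3))
bias = torch.tril(torch.ones(T, T)).view(1, 1, T, T)
y = manual_causal_attention(q, k, v, bias, scale=1.0 / hs)
```

Passing `scale` explicitly decouples the norm factor from the head size, which is the limitation the NOTE in the snippet above refers to.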
```python
class Tokenizer:
    def __init__(self, vocabulary_path: Path, config_path: Path) -> None:
        # https://github.com/Stability-AI/StableLM/blob/e60081/configs/stablelm-base-alpha-3b.yaml#L108
```
I think we should just vendor the yaml file in the repo directly
Are you suggesting this as a showcase of the configs used? Since this is a GPT-NeoX config, we don't need to use it ourselves. Or do you want to add support for running the scripts by passing it?
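For reference, a hypothetical sketch of what the `Tokenizer` could look like with the config vendored in the repo. It assumes the vocabulary is a Hugging Face `tokenizers` JSON file and that the YAML is GPT-NeoX style; the `eos-token-id` key and default are illustrative, not necessarily what the repo uses:

```python
from pathlib import Path

import torch
import yaml
from tokenizers import Tokenizer as HFTokenizer


class Tokenizer:
    def __init__(self, vocabulary_path: Path, config_path: Path) -> None:
        # wraps the pretrained vocabulary/merges stored as a JSON file
        self.processor = HFTokenizer.from_file(str(vocabulary_path))
        # read the vendored YAML instead of pointing at the upstream URL
        with open(config_path) as fp:
            config = yaml.safe_load(fp)
        self.eos_id = config.get("eos-token-id", 0)  # hypothetical key name

    def encode(self, string: str) -> torch.Tensor:
        return torch.tensor(self.processor.encode(string).ids, dtype=torch.int)

    def decode(self, tokens: torch.Tensor) -> str:
        return self.processor.decode(tokens.tolist())
```

Vendoring the file would make `config_path` point into the repo rather than at the pinned StableLM commit.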
There's a test failing on Windows, and the README still needs to be completed. I can work on the README.
Co-authored-by: Luca Antiga <luca@lightning.ai>
* Make trainer configurable and add docker file
* Fix bugs
* Add dockerignore
* Fix config
* Fix bug
* Fix bug
* Fix bug
* Fix bug
* Try dlprof
* Fix bug
* Add PyTorch logger
* Fix import
* Add PyTorch profiler
* Fix bug
* Reorder docker file
* Fix bug
* Make PyTorch profiler optional
* Try to fix profiler
* PyTorch profiler working. Shunt torch comms again
* Tune profiler params
* Make PyTorch profiler configurable and run for global batches
* Fix bug
* Fix batch offset
* Fix bug
* Debug print issues
* More print stuff
* Add nvtx ranges
* Adjust model sizes
* Tune validation iters
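On the profiler-related commits above, a minimal sketch (with assumed names and settings, not the PR's actual trainer code) of an optional PyTorch profiler that records a few global batches and tags each step with an NVTX range:

```python
import torch
from torch.profiler import ProfilerActivity, profile, schedule, tensorboard_trace_handler

use_profiler = True  # would come from the trainer config
activities = [ProfilerActivity.CPU]
if torch.cuda.is_available():
    activities.append(ProfilerActivity.CUDA)

prof = None
if use_profiler:
    prof = profile(
        activities=activities,
        # skip 1 batch, warm up for 1, then record 3 global batches
        schedule=schedule(wait=1, warmup=1, active=3),
        on_trace_ready=tensorboard_trace_handler("./profiler_logs"),
    )
    prof.start()

for step in range(10):  # stand-in for the training loop
    if torch.cuda.is_available():
        torch.cuda.nvtx.range_push(f"global batch {step}")
    # forward / backward / optimizer step would go here
    if torch.cuda.is_available():
        torch.cuda.nvtx.range_pop()
    if prof is not None:
        prof.step()

if prof is not None:
    prof.stop()
```

The NVTX ranges are what tools like dlprof or Nsight Systems pick up when correlating kernels with individual training steps.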
Generation works.
For simplicity, I removed all the files that haven't been updated yet. We can port them from upstream on demand.