- Documentation infrastructure #5
v0.1.0 (2024-08-06)
- Add figures for the attention function, model data flow, and OLMo sequential block (@Naeemkh, e6e54f0)
- Add option for nsys profiling (@mbsabath, bf3f2a4)
- Add table of parameters to the logger (@Naeemkh, 12bf09c)
- Drop Llama Block (@Naeemkh, 22d0f45)
- Drop dropout layer (@Naeemkh, 4f2775d, a11ee8a)
- Add back-of-the-envelope computations (@Naeemkh, 6d83c07)
- Merge OLMoSequentialBlock into OLMoBlock (@Naeemkh, fff5955)
- Move flash attention settings to the config file (@Naeemkh, 197c38f)
- Add sweep generator scripts (@Naeemkh, def2931, e462a92, 1e7fb8c, 7e1c11e)
- Drop SwiGLU activation function (@Naeemkh, dd12e48, 1d5f0dc, 7c942be)
- Drop weight_tying (@Naeemkh, 544b0b6)
- Drop OLMoBlockGroup (@Naeemkh, ceff8f8, ba49aa6)
- Keep only PyTorch default LayerNorm (@Naeemkh, beb76cd, d988ea7)
- Clean up utility code for submitting checkpoints to the cloud (@Naeemkh, f8dbc80)
- Remove multi-query attention feature and related settings (@Naeemkh, 74eaf03)
- Drop effective key-value heads and use the user-requested number of heads (@Naeemkh, 36f51b7)
- Fix a bug in the conda environment setup (@amazloumi, e51c620, c1f3125)
- Drop output multiplier (@Naeemkh, 1b3eb2b)