
Changelog

Development

Added

  • Documentation infrastructure #5

Removed

Changed

v0.1.0 (2024-08-06)

  • Add attention function, model data flow, and OLMo sequential block figures. (@Naeemkh, e6e54f0)
  • Add option for nsys profiling (@mbsabath, bf3f2a4)
  • Add table of parameters to the logger (@Naeemkh, 12bf09c)
  • Drop Llama Block (@Naeemkh, 22d0f45)
  • Drop dropout layer (@Naeemkh, 4f2775d and a11ee8a)
  • Add back-of-the-envelope computations (@Naeemkh, 6d83c07)
  • Merge OLMoSequentialBlock into OLMoBlock (@Naeemkh, fff5955)
  • Move flash attention settings to the config file (@Naeemkh, 197c38f)
  • Add sweep generator scripts (@Naeemkh, def2931, e462a92, 1e7fb8c, 7e1c11e)
  • Drop SwiGLU activation function (@Naeemkh, dd12e48, 1d5f0dc, 7c942be)
  • Drop weight_tying (@Naeemkh, 544b0b6)
  • Drop OLMoBlockGroup (@Naeemkh, ceff8f8, ba49aa6)
  • Keep only PyTorch default LayerNorm (@Naeemkh, beb76cd, d988ea7)
  • Clean up utility code for submitting checkpoints to the cloud (@Naeemkh, f8dbc80)
  • Remove multi-query attention feature and related settings (@Naeemkh, 74eaf03)
  • Drop effective key/value heads and use the user-requested number of heads (@Naeemkh, 36f51b7)
  • Fix a bug with the conda environment setup (@amazloumi, e51c620, c1f3125)
  • Drop output multiplier (@Naeemkh, 1b3eb2b)