
v0.0.3

@dakinggg released this 15 Feb 02:45
f46baab

🚀 Examples v0.0.3

Examples v0.0.3 is released! We've been hard at work adding features, fixing bugs, and improving our starter code for training models using MosaicML's stack!

To get started, either clone or fork this repo and install whichever example(s) you're interested in. E.g., to get started training GPT-style Large Language Models, just:

git clone https://github.com/mosaicml/examples.git
cd examples # cd into the repo
pip install -e ".[llm]"  # or pip install -e ".[llm-cpu]" if no NVIDIA GPU
cd examples/llm # cd into the specific example's folder

Available examples include bert, cifar, llm, resnet, and deeplab.

New Features

  1. Tooling for computing throughput and Model Flops Utilization (MFU) using MosaicML Cloud (#53, #56, #71, #117, #152)

    We've made it easier to benchmark throughput and MFU on our Large Language Model (LLM) stack. The SpeedMonitor has been extended to report MFU. It is on by default for our MosaicGPT examples, and can be easily added to your own code by defining num_fwd_flops for your model and adding the SpeedMonitorMFU callback to the Trainer. See the callback for the details, and the sketch after this list for one way to wire it up.

    We've also used our MCLI SDK to easily measure throughput and MFU of our LLMs across a range of parameters. The tools and results are in our throughput folder. Stay tuned for an update with the latest numbers!

  2. Upgrade to the latest versions of Composer, Streaming, and Flash Attention (#54, #61, #118, #124)

    We've upgraded all our examples to use the latest versions of Composer, Streaming, and Flash Attention. This means speed improvements, new features, and deterministic, elastic mid-epoch resumption thanks to our Streaming library!

  3. The repo is now pip installable from source (#76, #90)

    The repo can now be easily installed for whichever example you are interested in using. For example, to install components for the llm example, navigate to the repo root and run pip install -e ".[llm]". We will be putting the package on PyPI soon!

  4. Support for FSDP wrapping more HuggingFace models (#83, #106)

    We've added support for using FSDP to wrap more types of HuggingFace models, like BLOOM and OPT (the sketch after this list shows where an fsdp_config plugs into the Trainer).

  5. In-Context Learning (ICL) evaluation metrics (#116)

    The ICL evaluation tools from Composer 0.12.1 are now available for measuring accuracy on tasks like LAMBADA, HellaSwag, and PIQA for causal LMs. See the llm/icl_eval/ folder for templates. These ICL metrics can also be measured live during training with minimal overhead. Please see our blog post for more details.

  6. Simple BERT finetuning example (#141)

    In addition to our example of finetuning BERT on the full suite of GLUE tasks, we've added an example of finetuning on a single sequence classification dataset. This should be a simpler entry point to BERT finetuning than our GLUE example, which comes with more bells and whistles.

  7. NeMo Megatron example (#84, #138)

    We've added a simple example of how to get started running NeMo Megatron on MCloud!
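
To make items (1) and (4) above more concrete, here is a minimal sketch of wiring throughput/MFU monitoring and a HuggingFace model into a Composer Trainer. It uses Composer's built-in SpeedMonitor callback; this repo's SpeedMonitorMFU callback (import path not shown), the num_fwd_flops formula, and the fsdp_config keys in the comments are assumptions for illustration, so check the callback and the Composer docs for the exact names and arguments.

import torch
from torch.utils.data import DataLoader, TensorDataset
import transformers
from composer import Trainer
from composer.callbacks import SpeedMonitor
from composer.models import HuggingFaceModel

# Wrap a small HuggingFace causal LM; BLOOM- and OPT-family models can now be FSDP-wrapped.
hf_model = transformers.AutoModelForCausalLM.from_pretrained('facebook/opt-125m')
tokenizer = transformers.AutoTokenizer.from_pretrained('facebook/opt-125m')
model = HuggingFaceModel(hf_model, tokenizer=tokenizer)

# MFU reporting needs the model's forward-pass FLOPs per sample. A common decoder-only
# estimate is roughly 2 * n_params FLOPs per token; this exact formula is an assumption,
# so see the SpeedMonitorMFU callback in this repo for the one it actually uses.
max_seq_len = 128
n_params = sum(p.numel() for p in hf_model.parameters())
model.num_fwd_flops = 2 * n_params * max_seq_len

# Tiny synthetic batch of token ids so the sketch runs end to end on one device.
input_ids = torch.randint(0, hf_model.config.vocab_size, (8, max_seq_len))

def collate(batch):
    ids = torch.stack([row[0] for row in batch])
    return {'input_ids': ids, 'attention_mask': torch.ones_like(ids), 'labels': ids}

train_loader = DataLoader(TensorDataset(input_ids), batch_size=4, collate_fn=collate)

trainer = Trainer(
    model=model,
    train_dataloader=train_loader,
    max_duration='2ba',
    # Composer's SpeedMonitor reports throughput; swap in this repo's SpeedMonitorMFU
    # callback to also report MFU.
    callbacks=[SpeedMonitor(window_size=10)],
    # To shard the model with FSDP, launch on multiple GPUs with the composer launcher
    # and pass an fsdp_config; the keys below are illustrative, not exhaustive.
    # fsdp_config={'sharding_strategy': 'FULL_SHARD', 'mixed_precision': 'DEFAULT'},
)
trainer.fit()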

Deprecations

  1. 🚨 group_method argument for StreamingTextDataset replaced 🚨 (#128)

    In order to support deterministic shuffling with elastic resumption, we can no longer concatenate text examples on the fly in the dataloader, so the group_method argument of StreamingTextDataset has been deprecated.

    To use concatenated text (standard practice for pretraining LLMs), run the convert_c4.py script with the --concat_tokens option. This pretokenizes your dataset and packs sequences together up to the maximum sequence length so that your pretraining examples contain no padding (a conceptual sketch of this packing follows below). To get the equivalent of the old truncate behavior, run convert_c4.py without --concat_tokens, and the dataloader will truncate or pad sequences to the maximum sequence length on the fly.
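
For intuition, here is a small, self-contained sketch of the concatenation-and-packing idea behind --concat_tokens. It is not the convert_c4.py implementation (which streams over shards and writes a Streaming dataset rather than in-memory lists); the tokenizer, documents, and sequence length are arbitrary choices for illustration.

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained('gpt2')
max_seq_len = 8  # tiny for illustration; LLM pretraining typically uses e.g. 2048

docs = [
    'The quick brown fox jumps over the lazy dog.',
    'Streaming datasets resume deterministically.',
]

# Tokenize each document and append an EOS token so document boundaries survive packing.
buffer = []
for doc in docs:
    buffer.extend(tokenizer(doc)['input_ids'] + [tokenizer.eos_token_id])

# Slice the running buffer into full-length samples with no padding. A real conversion
# script would carry leftover tokens forward to the next sample rather than drop them.
samples = [
    buffer[i:i + max_seq_len]
    for i in range(0, len(buffer) - max_seq_len + 1, max_seq_len)
]
print(f'{len(buffer)} tokens packed into {len(samples)} examples of length {max_seq_len}')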