Instruction following is one of the cornerstones of the current generation of large language models (LLMs). Reinforcement learning from human feedback (RLHF) and techniques such as those behind InstructGPT have been the core foundation of breakthroughs such as ChatGPT and GPT-4. However, these powerful models remain hidden behind APIs, and little is publicly known about their underlying architecture. Instruction-following models generate text in response to prompts and are often used for tasks such as writing assistance, chatbots, and content generation. Many users now interact with these models regularly and even use them for work, but most such models remain closed-source and require massive computational resources to experiment with.
Dolly 2.0 is an open-source, instruction-following LLM fine-tuned by Databricks, and the first such model trained on a transparent, freely available dataset that is itself open-sourced for commercial use. This means Dolly 2.0 can be used in commercial applications without paying for API access or sharing data with third parties. Despite being much smaller, Dolly 2.0 exhibits characteristics similar to ChatGPT.
In this tutorial, we consider how to run an instruction-following text generation pipeline using Dolly 2.0 and OpenVINO. We will use a pre-trained model from the Hugging Face Transformers library. To simplify the user experience, the Hugging Face Optimum library is used to convert the model to OpenVINO™ IR format.
The notebook provides a simple interface for communicating with the model through text instructions. In this demonstration, the user provides an input instruction and the model generates an answer in streaming mode.
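The text-handling side of this interface can be sketched in plain Python. This is a minimal sketch, assuming a Dolly-style prompt template (an intro sentence, then `### Instruction:` and `### Response:` sections, with `### End` marking the end of an answer) modeled on the one published with Dolly 2.0; the exact wording and whitespace here are assumptions. `build_prompt` and `stream_answer` are hypothetical helper names, and the list of string chunks stands in for the text pieces a real streaming generator would emit while the OpenVINO model decodes.

```python
from typing import Iterable, Iterator

# Assumed Dolly-style prompt markers (modeled on the Dolly 2.0 template, not copied from this notebook)
INTRO_BLURB = (
    "Below is an instruction that describes a task. "
    "Write a response that appropriately completes the request."
)
INSTRUCTION_KEY = "### Instruction:"
RESPONSE_KEY = "### Response:"
END_KEY = "### End"


def build_prompt(instruction: str) -> str:
    """Wrap a raw user instruction in the instruction/response template."""
    return f"{INTRO_BLURB}\n\n{INSTRUCTION_KEY}\n{instruction}\n\n{RESPONSE_KEY}\n"


def stream_answer(chunks: Iterable[str]) -> Iterator[str]:
    """Yield the accumulated answer after each chunk, stopping at END_KEY.

    `chunks` is a stand-in for decoded text pieces from a streaming generator.
    """
    answer = ""
    for chunk in chunks:
        answer += chunk
        if END_KEY in answer:
            # Drop the end marker and anything generated after it, then stop
            yield answer.split(END_KEY, 1)[0].strip()
            return
        yield answer.strip()


if __name__ == "__main__":
    print(build_prompt("Write a haiku about fast inference."))
    # Simulated decoder output; a real pipeline would produce these pieces incrementally
    for partial in stream_answer(["Fast ", "answers ", "arrive", "\n### End", " ignored"]):
        print(partial)
```

Yielding the accumulated answer rather than each chunk lets a UI simply redraw its text box on every update. A production version would also hold back an end marker split across chunks (for example `###` arriving before ` End`), which this sketch does not do.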
The image below shows an example of a user instruction and the model's answer:
The tutorial consists of the following steps:
- Install prerequisites
- Download and convert the model from a public source using the OpenVINO integration with Hugging Face Optimum
- Compress model weights to INT8 with OpenVINO NNCF
- Create an instruction-following inference pipeline
- Run the instruction-following pipeline
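Weight-only INT8 compression stores each weight in one byte instead of the two bytes used by FP16, roughly halving the memory needed for the model weights. The sketch below illustrates the idea with a generic symmetric, per-output-channel scheme in pure Python. It is an assumption for illustration, not NNCF's exact algorithm: NNCF exposes this step through `nncf.compress_weights`, and its default INT8 mode may use a different (for example, asymmetric) scheme. `quantize_per_channel_int8` and `dequantize` are hypothetical names.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def quantize_per_channel_int8(weights: Matrix) -> Tuple[List[List[int]], List[float]]:
    """Symmetric per-output-channel INT8 quantization (illustrative, not NNCF's code).

    Each row (output channel) gets its own scale so that its largest |w| maps to 127.
    """
    q_rows: List[List[int]] = []
    scales: List[float] = []
    for row in weights:
        max_abs = max(abs(w) for w in row)
        # An all-zero row gets scale 1.0 to avoid dividing by zero
        scale = max_abs / 127.0 if max_abs > 0 else 1.0
        # Round to nearest, then clamp to the signed 8-bit range
        q_rows.append([max(-128, min(127, round(w / scale))) for w in row])
        scales.append(scale)
    return q_rows, scales


def dequantize(q_rows: List[List[int]], scales: List[float]) -> Matrix:
    """Recover approximate float weights: w ~= q * scale for each row."""
    return [[q * s for q in row] for row, s in zip(q_rows, scales)]


if __name__ == "__main__":
    w = [[0.5, -1.0, 0.25], [2.0, 0.0, -0.5]]
    q, s = quantize_per_channel_int8(w)
    print("int8 values:", q)
    print("scales:", s)
    print("dequantized:", dequantize(q, s))
```

Per-channel scales keep one outlier row from inflating the quantization step of every other row; the rounding error for each weight is at most half of its row's scale.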
This is a self-contained example that relies solely on its own code.
We recommend running the notebook in a virtual environment. You only need a Jupyter server to start.
For details, please refer to the Installation Guide.