
Cerebras/inference-examples


Cerebras Inference API Demos

Welcome to the Cerebras Inference API demo repository! This repository contains various examples showcasing the power of the Cerebras Wafer-Scale Engines and CS-3 systems for AI model inference.

🚀 Introduction

The Cerebras API offers developers a low-latency solution for AI model inference, powered by Cerebras Wafer-Scale Engines and CS-3 systems. We invite developers to explore the new possibilities that our high-speed inference solution unlocks.

Currently, the Cerebras API provides access to two models: Meta’s Llama 3.1 8B and Llama 3.1 70B. Both are instruction-tuned and can be used for conversational applications.
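As a quick orientation before diving into the examples, here is a minimal sketch of how a chat request to one of these models might be assembled. The model identifier (`llama3.1-8b`), the environment variable name, and the SDK call in the comments are assumptions for illustration; each example project’s README documents the exact usage.

```python
def build_chat_request(prompt: str, model: str = "llama3.1-8b") -> dict:
    """Build an OpenAI-style chat-completion payload for an
    instruction-tuned model (model name is an assumption here)."""
    return {
        "model": model,
        "messages": [
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": prompt},
        ],
    }

# With a real API key, the payload would be sent through the Python SDK,
# along these (hypothetical, unverified) lines:
#   import os
#   from cerebras.cloud.sdk import Cerebras
#   client = Cerebras(api_key=os.environ["CEREBRAS_API_KEY"])
#   response = client.chat.completions.create(**build_chat_request("Hello!"))
#   print(response.choices[0].message.content)

request = build_chat_request("Summarize this repository in one sentence.")
print(request["model"])
```

Because both models are instruction-tuned, the system/user message structure above is the natural way to drive them in a conversational loop.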

🧠 Models Available

  • Llama-3.1-8B

    • Parameters: 8 billion
    • Knowledge Cutoff: March 2023
    • Context Length: 8192
    • Training Tokens: 15 trillion
  • Llama-3.1-70B

    • Parameters: 70 billion
    • Knowledge Cutoff: December 2023
    • Context Length: 8192
    • Training Tokens: 15 trillion

📚 Resources

📁 Projects Overview

This repository contains multiple example projects, each demonstrating different capabilities of the Cerebras Inference API. Each project is located in its own folder and contains a detailed README.

🔗 Example Projects


🌟 Getting Started

To explore a project, navigate to its folder and follow the instructions in its README. Happy coding!

🛠️ Requirements

  • Python 3.7+
  • Docker (for RAG examples)
  • Streamlit (for Cerebras + Streamlit example)
  • Other dependencies as noted in each project’s README.
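A typical environment setup might look like the sketch below. The package names and the `CEREBRAS_API_KEY` variable are assumptions; check each project’s README and requirements file for the authoritative list.

```shell
# Hypothetical setup; exact dependencies vary per example project.
python3 -m venv .venv
. .venv/bin/activate
pip install cerebras_cloud_sdk streamlit   # package names assumed
export CEREBRAS_API_KEY="your-key-here"    # variable name assumed
```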

📝 License

This project is licensed under the MIT License - see the LICENSE file for details.

👥 Contributing

We welcome contributions! Feel free to submit a pull request or open an issue.


© 2024 Cerebras Systems
