The next-generation platform to monitor and optimize your AI costs in one place

Nebuly is the next-generation platform to monitor and optimize your AI costs in one place. It connects to all your AI cost sources (compute, API providers, AI software licenses, etc.) and centralizes them to give you full visibility and control. The platform also provides optimization recommendations and a co-pilot that can guide you through the optimization process. It builds on top of our open-source tools, allowing you to optimize the different steps of your AI stack and squeeze out the best possible cost-performance.

If you like the idea, give us a star to show your support for the project ⭐

Apply for enterprise version early access here: https://qpvirevo4tz.typeform.com/to/X7VfuRiH

AI costs monitoring (SDK)

The monitoring platform gives you visibility into 100% of your AI costs, across three main cost buckets:

  • Infrastructure and compute (AWS, Azure, GCP, on-prem)
  • AI-related software/tools licenses (OpenAI, Cohere, Scale AI, Snorkel, Pinecone, HuggingFace, Databricks, etc)
  • People (Jira, GitLab, Asana, etc)

The easiest way to install the SDK is via pip:

pip install nebuly

The list of supported integrations will be available soon.
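To make the three cost buckets concrete, here is a minimal, self-contained sketch of the idea behind centralized cost monitoring: collect entries from several sources and aggregate them into one view. All names below (`CostEntry`, `total_by_bucket`) are illustrative only and are not part of the nebuly SDK API.

```python
from dataclasses import dataclass

@dataclass
class CostEntry:
    source: str   # e.g. "AWS", "OpenAI", "Jira"
    bucket: str   # "compute", "software", or "people"
    usd: float    # cost in US dollars

def total_by_bucket(entries):
    """Aggregate a list of CostEntry objects into per-bucket totals."""
    totals = {}
    for e in entries:
        totals[e.bucket] = totals.get(e.bucket, 0.0) + e.usd
    return totals

entries = [
    CostEntry("AWS", "compute", 1200.0),
    CostEntry("OpenAI", "software", 300.0),
    CostEntry("Jira", "people", 150.0),
    CostEntry("GCP", "compute", 800.0),
]
print(total_by_bucket(entries))
# {'compute': 2000.0, 'software': 300.0, 'people': 150.0}
```

A real integration would pull these entries automatically from each provider's billing API rather than building them by hand.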

AI cost optimization

Once you have full visibility over your AI costs, you are ready to optimize them. We have developed multiple open-source tools to reduce the cost and improve the performance of your AI systems:

✅ Speedster: reduce inference costs by leveraging SOTA optimization techniques that best couple your AI models with the underlying hardware (GPUs and CPUs)

✅ Nos: reduce infrastructure costs by leveraging real-time dynamic partitioning and elastic quotas to maximize the utilization of your Kubernetes GPU cluster

✅ ChatLLaMA: reduce hardware and data costs by leveraging fine-tuning optimization techniques and RLHF alignment
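The savings these tools target are usually quantified by benchmarking inference latency before and after optimization. Below is a minimal, library-agnostic timing harness, a sketch only and not Speedster's API; the two model functions are stand-ins for a baseline and an optimized forward pass.

```python
import time

def benchmark(fn, *args, warmup=3, runs=20):
    """Return the mean latency of fn(*args) in milliseconds."""
    for _ in range(warmup):           # warm caches before measuring
        fn(*args)
    start = time.perf_counter()
    for _ in range(runs):
        fn(*args)
    return (time.perf_counter() - start) / runs * 1e3

# Stand-ins for an original and an "optimized" model forward pass.
def baseline_model(x):
    return sum(v * v for v in x)

def optimized_model(x):
    return sum(map(lambda v: v * v, x))

data = list(range(10_000))
base_ms = benchmark(baseline_model, data)
opt_ms = benchmark(optimized_model, data)
print(f"baseline: {base_ms:.2f} ms, optimized: {opt_ms:.2f} ms")
```

The same harness works for any callable, so you can compare a model before and after passing it through an optimizer and report the speedup as `base_ms / opt_ms`.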

Contributing

As an open-source project in a rapidly evolving field, we welcome contributions of all kinds, including new features, improved infrastructure, and better documentation. If you're interested in contributing, please see the contributing guidelines for more information on how to get involved.


Join the community | Contribute to the library

About

Easy-to-use library to boost AI inference leveraging state-of-the-art optimization techniques.
