Machine Learning
A distributed approximate nearest neighborhood search (ANN) library which provides a high quality vector index build, search and distributed online serving toolkits for large scale vector search sc…
Low-code framework for building custom LLMs, neural networks, and other AI models
Flexible and powerful tensor operations for readable and reliable code (for pytorch, jax, TF and others)
BookNLP, a natural language processing pipeline for books
Transform audio-visual content into navigable knowledge.
A simple way to launch, train, and use PyTorch models on almost any device and distributed configuration, automatic mixed precision (including fp8), and easy-to-configure FSDP and DeepSpeed support
Generating bash command from natural language https://arxiv.org/abs/1802.08979
The data scientist's open-source choice to scale, assess and maintain natural language data. Treat training data like a software artifact.
Pretrained language model with 100B parameters
A framework for detecting, highlighting and correcting grammatical errors on natural language text. Created by Prithiviraj Damodaran. Open to pull requests and other forms of collaboration.
Full description can be found here: https://discuss.huggingface.co/t/pretrain-gpt-neo-for-open-source-github-copilot-model/7678?u=ncoop57
Models and examples built with TensorFlow
The fastest ⚡️ way to build data pipelines. Develop iteratively, deploy anywhere.
Open source platform for the machine learning lifecycle
Awesome list of open-source startup alternatives to well-known SaaS products
Kedro is a toolbox for production-ready data science. It uses software engineering best practices to help you create data engineering and data science pipelines that are reproducible, maintainable,…
Efficient, scalable and enterprise-grade CPU/GPU inference server for 🤗 Hugging Face transformer models
A curated list of awesome open source libraries to deploy, monitor, version and scale your machine learning
Build multimodal AI applications with cloud-native stack
DeepSpeech is an open source embedded (offline, on-device) speech-to-text engine which can run in real time on devices ranging from a Raspberry Pi 4 to high power GPU servers.
Hidden Markov Models in Python, with scikit-learn like API
Hummingbird compiles trained ML models into tensor computation for faster inference.
Distributed training framework for TensorFlow, Keras, PyTorch, and Apache MXNet.
Command-line tools for speech and intent recognition on Linux
U.S. English voice2json profile based on Pocketsphinx
Model Deployment at Scale on Kubernetes
Offline private voice assistant for many human languages