
EmbeddedLLM

EmbeddedLLM is the creator behind JamAI Base, a platform designed to orchestrate AI with spreadsheet-like simplicity.

Pinned

  1. JamAIBase Public

    The collaborative spreadsheet for AI. Chain cells into powerful pipelines, experiment with prompts and models, and evaluate LLM responses in real time. Work together seamlessly to build and iterate… (a client sketch follows this pinned list).

    Python · 310 stars · 17 forks

  2. vllm Public

    Forked from vllm-project/vllm

    vLLM: A high-throughput and memory-efficient inference and serving engine for LLMs

    Python · 87 stars · 5 forks

  3. embeddedllm Public

    EmbeddedLLM: API server for Embedded Device Deployment. Currently supports CUDA/OpenVINO/IpexLLM/DirectML/CPU.

    Python · 19 stars
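
As a concrete illustration of the spreadsheet-style workflow JamAI Base advertises, here is a minimal sketch using the jamaibase Python client. The package is real, but the exact class and method names below (JamAI, RowAddRequest, the table and column ids) are assumptions recalled from its docs rather than verified against a specific release:

```python
# Hypothetical sketch of the jamaibase client; names are assumptions.
from jamaibase import JamAI, protocol as p

client = JamAI(project_id="proj_...", token="jamai_...")  # placeholder credentials

# Adding a row to an "action table" triggers the LLM-backed output
# columns to fill in, like formula cells recomputing in a spreadsheet.
response = client.add_table_rows(
    "action",
    p.RowAddRequest(
        table_id="support-triage",  # assumed pre-existing table
        data=[{"ticket": "App crashes on launch after the update"}],
        stream=False,
    ),
)
print(response.rows[0].columns["severity"].text)  # assumed output column
```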

Repositories

Showing 10 of 32 repositories
  • vllm Public (forked from vllm-project/vllm)

    vLLM: A high-throughput and memory-efficient inference and serving engine for LLMs

    Python · 87 stars · Apache-2.0 · 4,509 forks · 6 issues · 0 pull requests · Updated Nov 2, 2024
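
Both this fork and the vllm-rocmfork below track upstream vLLM, whose offline batch API is compact. A minimal sketch of that API (the model id and sampling values are placeholder choices):

```python
from vllm import LLM, SamplingParams

# Any Hugging Face causal-LM id works here; opt-125m is a small placeholder.
llm = LLM(model="facebook/opt-125m")
params = SamplingParams(temperature=0.8, top_p=0.95, max_tokens=64)

# generate() batches prompts and returns one RequestOutput per prompt.
for out in llm.generate(["The capital of France is"], params):
    print(out.outputs[0].text)
```

The same engine also ships an OpenAI-compatible HTTP server (python -m vllm.entrypoints.openai.api_server), which is how it is usually deployed.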
  • Liger-Kernel Public (forked from linkedin/Liger-Kernel)

    Efficient Triton Kernels for LLM Training

    Python · 0 stars · BSD-2-Clause · 194 forks · 0 issues · 0 pull requests · Updated Nov 2, 2024
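
Liger-Kernel speeds up training by monkey-patching Hugging Face model classes with fused Triton kernels. A minimal sketch, assuming the upstream README's patching API is unchanged in this fork (the model id is a placeholder):

```python
import transformers
from liger_kernel.transformers import apply_liger_kernel_to_llama

# Swap Llama's RMSNorm, RoPE, SwiGLU, and cross-entropy implementations
# for fused Triton kernels; must run before the model is instantiated.
apply_liger_kernel_to_llama()

model = transformers.AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-3.2-1B"  # placeholder model id
)
```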
  • infinity-executable Public (forked from michaelfeil/infinity)

    Infinity is a high-throughput, low-latency REST API for serving vector embeddings, supporting a wide range of text-embedding models and frameworks.

    Python · 0 stars · MIT · 113 forks · 0 issues · 0 pull requests · Updated Oct 30, 2024
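
Infinity exposes an OpenAI-style embeddings route over HTTP. A minimal sketch, assuming a server started locally with Infinity's v2 CLI on its default port 7997; the port, route, and model id are assumptions to check against this fork's docs:

```python
import requests

# Assumes the server was started with something like:
#   infinity_emb v2 --model-id BAAI/bge-small-en-v1.5
resp = requests.post(
    "http://localhost:7997/embeddings",
    json={
        "model": "BAAI/bge-small-en-v1.5",
        "input": ["how do I serve embeddings?"],
    },
)
resp.raise_for_status()
vector = resp.json()["data"][0]["embedding"]
print(len(vector))  # dimensionality of the returned embedding
```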
  • flash-attention-docker Public

    CI/CD workflows that build Docker images with FlashAttention pre-compiled, to speed up development and deployment of other frameworks that depend on it.

    Shell · 0 stars · Apache-2.0 · 0 forks · 0 issues · 0 pull requests · Updated Oct 26, 2024
  • flash-attention-rocm Public (forked from ROCm/flash-attention)

    ROCm fork of FlashAttention (fast and memory-efficient exact attention). The goal of this branch is to produce a flash-attention PyPI package that can be readily installed and used; a usage sketch follows.

    Python · 0 stars · BSD-3-Clause · 1,312 forks · 0 issues · 0 pull requests · Updated Oct 26, 2024
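
If the PyPI package this branch targets mirrors upstream flash-attn, calling the fused kernel looks like the sketch below (it needs fp16 or bf16 tensors on a GPU):

```python
import torch
from flash_attn import flash_attn_func

# q, k, v have shape (batch, seqlen, num_heads, head_dim), fp16/bf16, on GPU.
q = torch.randn(2, 1024, 8, 64, device="cuda", dtype=torch.float16)
k = torch.randn_like(q)
v = torch.randn_like(q)

# Exact attention, computed without materializing the full
# seqlen x seqlen attention matrix.
out = flash_attn_func(q, k, v, causal=True)
print(out.shape)  # torch.Size([2, 1024, 8, 64])
```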
  • vllm-rocmfork Public (forked from ROCm/vllm)

    A high-throughput and memory-efficient inference and serving engine for LLMs

    Python · 0 stars · Apache-2.0 · 4,509 forks · 0 issues · 0 pull requests · Updated Oct 23, 2024
  • etalon Public (forked from project-etalon/etalon)

    LLM Serving Performance Evaluation Harness

    Python · 0 stars · Apache-2.0 · 4 forks · 0 issues · 0 pull requests · Updated Oct 17, 2024
  • unstructured-python-client Public (forked from Unstructured-IO/unstructured-python-client)

    A Python client for the Unstructured hosted API

    Python · 0 stars · MIT · 16 forks · 0 issues · 1 pull request · Updated Oct 14, 2024
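
For context, the upstream client partitions documents via the hosted Unstructured API. The sketch below follows its documented usage, but the request wrapper types have shifted across releases, so treat the exact class names as assumptions:

```python
# Sketch only; wrapper types vary between unstructured-client releases.
from unstructured_client import UnstructuredClient
from unstructured_client.models import operations, shared

client = UnstructuredClient(api_key_auth="YOUR_API_KEY")  # placeholder key

with open("report.pdf", "rb") as f:  # placeholder document
    req = operations.PartitionRequest(
        partition_parameters=shared.PartitionParameters(
            files=shared.Files(content=f.read(), file_name="report.pdf"),
        )
    )

# Returns a list of structured elements (titles, paragraphs, tables, ...).
res = client.general.partition(request=req)
for element in res.elements or []:
    print(element["type"], element["text"][:60])
```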
  • embeddedllm Public

    EmbeddedLLM: API server for Embedded Device Deployment. Currently supports CUDA/OpenVINO/IpexLLM/DirectML/CPU.

    Python · 19 stars · 0 forks · 6 issues · 2 pull requests · Updated Oct 6, 2024
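
Nothing quoted here says which wire format the embeddedllm server speaks, so the following is purely an assumption: if it exposes an OpenAI-compatible endpoint, as many local LLM servers do, any OpenAI client would work (base URL, port, and model name are placeholders):

```python
from openai import OpenAI

# Assumption: embeddedllm exposes an OpenAI-compatible /v1 endpoint locally.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-needed")

resp = client.chat.completions.create(
    model="phi3-mini-directml",  # placeholder model name
    messages=[{"role": "user", "content": "Hello from an embedded device!"}],
)
print(resp.choices[0].message.content)
```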
  • github-bot Public

    Go · 0 stars · 1 fork · 0 issues · 0 pull requests · Updated Sep 26, 2024