cohereai_classify table | CohereAI plugin | Steampipe Hub #644

irthomasthomas opened this issue Feb 28, 2024 · 1 comment
Labels
- AI-Agents: Autonomous AI agents using LLMs
- Algorithms: Sorting, Learning or Classifying. All algorithms go here.
- Automation: Automate the things
- data-validation: Validating data structures and formats
- Models: LLM and ML model repos and links
- Software2.0: Software development driven by AI and neural networks.

@irthomasthomas (Owner) commented:

TITLE: cohereai_classify table | CohereAI plugin | Steampipe Hub

DESCRIPTION:
```
steampipe plugin install mr-destructive/cohereai
```

cohereai_classify
cohereai_detect_language
cohereai_detokenize
cohereai_embed
cohereai_generation
cohereai_summaraize
cohereai_summarize
cohereai_tokenize


Table: cohereai_classify

Get classifications for a given list of input strings and examples.

Notes:

  • inputs is a list of strings to classify (max 96 strings).
  • examples is a list of objects of type Example, e.g. {"text": "apple", "label": "fruit"}.
  • A minimum of 2 and a maximum of 2,500 examples should be provided, each at most 512 tokens.

Examples

Basic classification with a given set of inputs and examples

```sql
select
  classification
from
  cohereai_classify
where
  inputs = '["apple", "blue", "pineapple"]'
  and examples = '[{"text": "apple", "label": "fruit"}, {"text": "green", "label": "color"}, {"text": "grapes", "label": "fruit"}, {"text": "purple", "label": "color"}]';
```

Classification with specific settings (model, preset)

```sql
select
  classification
from
  cohereai_classify
where
  settings = '{"model": "embed-multilingual-v2.0"}'
  and inputs = '["Help!", "Call me when you can"]'
  and examples = '[{"text": "Help!", "label": "urgent"}, {"text": "SOS", "label": "urgent"}, {"text": "Call me when you can", "label": "not urgent"}, {"text": "Talk later?", "label": "not urgent"}]';
```

Email Spam Classification

```sql
select
  classification
from
  cohereai_classify
where
  inputs = '["Confirm your email address", "hey i need u to send some $"]'
  and examples = '[{"label": "Spam", "text": "Dermatologists don''t like her!"}, {"label": "Spam", "text": "Hello, open to this?"}, {"label": "Spam", "text": "I need help please wire me $1000 right now"}, {"label": "Spam", "text": "Hot new investment, don''t miss this!"}, {"label": "Spam", "text": "Nice to know you ;)"}, {"label": "Spam", "text": "Please help me?"}, {"label": "Not spam", "text": "Your parcel will be delivered today"}, {"label": "Not spam", "text": "Review changes to our Terms and Conditions"}, {"label": "Not spam", "text": "Weekly sync notes"}, {"label": "Not spam", "text": "Re: Follow up from today''s meeting"}, {"label": "Not spam", "text": "Pre-read for tomorrow"}]';
```

Schema for cohereai_classify

| Name | Type | Operators | Description |
| --- | --- | --- | --- |
| _ctx | jsonb | | Steampipe context in JSON form, e.g. connection_name. |
| classification | text | | The classification results for the given input text(s). |
| confidence | double precision | | The confidence score of the classification. |
| examples | text | | The example text classified. |
| id | text | | The ID of the classification. |
| inputs | text | | The input text that was classified. |
| labels | jsonb | | The labels of the classification. |
| settings | jsonb | | Settings is a JSONB object that accepts any of the classify API request parameters. |

URL: cohereai_classify table | CohereAI plugin | Steampipe Hub

Suggested labels

@irthomasthomas (Owner, Author) commented:

Related issues

#640: README.md · defog/sqlcoder-7b-2 at main

**Details** · Similarity score: 0.89

- [ ] [README.md · defog/sqlcoder-7b-2 at main](https://huggingface.co/defog/sqlcoder-7b-2/blob/main/README.md?code=true)

README.md · defog/sqlcoder-7b-2 at main

DESCRIPTION:

```yaml
license: cc-by-sa-4.0
library_name: transformers
pipeline_tag: text-generation
```

Update notice

The model weights were updated at 7 AM UTC on Feb 7, 2024. The new model weights lead to a much more performant model – particularly for joins.

If you downloaded the model before that, please redownload the weights for best performance.

Model Card for SQLCoder-7B-2

A capable large language model for natural language to SQL generation.


Model Details

Model Description

This is the model card of a 🤗 transformers model that has been pushed to the Hub. This model card has been automatically generated.

  • Developed by: Defog, Inc
  • Model type: [Text to SQL]
  • License: [CC-by-SA-4.0]
  • Finetuned from model: [CodeLlama-7B]


Uses

This model is intended to be used by non-technical users to understand data inside their SQL databases. It is meant as an analytics tool, and not as a database admin tool.

This model has not been trained to reject malicious requests from users with write access to databases, and should only be used by users with read-only access.

How to Get Started with the Model

Use the code here to get started with the model.

Prompt

Please use the following prompt for optimal results, and remember to set do_sample=False and num_beams=4.

```
### Task
Generate a SQL query to answer [QUESTION]{user_question}[/QUESTION]
### Database Schema
The query will run on a database with the following schema:
{table_metadata_string_DDL_statements}
### Answer
Given the database schema, here is the SQL query that [QUESTION]{user_question}[/QUESTION]
[SQL]
```
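As a minimal sketch of running this prompt (assuming the standard 🤗 transformers generation API; the loading flags and the example question/schema are illustrative, not from the card):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "defog/sqlcoder-7b-2"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

# Fill the template above with a hypothetical question and schema.
question = "How many users signed up last week?"
prompt = f"""### Task
Generate a SQL query to answer [QUESTION]{question}[/QUESTION]
### Database Schema
The query will run on a database with the following schema:
CREATE TABLE users (id INT, created_at TIMESTAMP);
### Answer
Given the database schema, here is the SQL query that [QUESTION]{question}[/QUESTION]
[SQL]
"""

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
# The card recommends deterministic beam search: do_sample=False, num_beams=4.
outputs = model.generate(**inputs, do_sample=False, num_beams=4, max_new_tokens=256)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```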

Evaluation

This model was evaluated on SQL-Eval, a PostgreSQL based evaluation framework developed by Defog for testing and alignment of model capabilities.

You can read more about the methodology behind SQL-Eval here.

Results

We classified each generated question into one of 6 categories. The table displays the percentage of questions answered correctly by each model, broken down by category.

| model | date | group_by | order_by | ratio | join | where |
| --- | --- | --- | --- | --- | --- | --- |
| sqlcoder-70b | 96 | 91.4 | 97.1 | 85.7 | 97.1 | 91.4 |
| sqlcoder-7b-2 | 96 | 91.4 | 94.3 | 91.4 | 94.3 | 77.1 |
| sqlcoder-34b | 80 | 94.3 | 85.7 | 77.1 | 85.7 | 80 |
| gpt-4 | 72 | 94.3 | 97.1 | 80 | 91.4 | 80 |
| gpt-4-turbo | 76 | 91.4 | 91.4 | 62.8 | 88.6 | 77.1 |
| natural-sql-7b | 56 | 88.6 | 85.7 | 60 | 88.6 | 80 |
| sqlcoder-7b | 64 | 82.9 | 74.3 | 54.3 | 74.3 | 74.3 |
| gpt-3.5 | 72 | 77.1 | 82.8 | 34.3 | 65.7 | 71.4 |
| claude-2 | 52 | 71.4 | 74.3 | 57.1 | 65.7 | 62.9 |

Model Card Contact

Contact us on X at @defogdata, or by email at founders@defog.ai

URL: https://huggingface.co/defog/sqlcoder-7b-2/blob/main/README.md?code=true

Suggested labels

#160: sid321axn/tinyllama-text2sql-finetuned at main

**Details** · Similarity score: 0.88

tiny-llama-text2sql · safetensors

- [ ] [sid321axn/tinyllama-text2sql-finetuned at main](https://huggingface.co/sid321axn/tinyllama-text2sql-finetuned/tree/main)

adapter

https://huggingface.co/sid321axn/tiny-llama-text2sql

This model is a fine-tuned version of PY007/TinyLlama-1.1B-Chat-v0.3 on an unspecified dataset (listed as "None" in the auto-generated card).

```json
{
 "_name_or_path": "PY007/TinyLlama-1.1B-Chat-v0.3",
 "architectures": [
   "LlamaForCausalLM"
 ],
 "attention_bias": false,
 "attention_dropout": 0.0,
 "bos_token_id": 1,
 "eos_token_id": 2,
 "hidden_act": "silu",
 "hidden_size": 2048,
 "initializer_range": 0.02,
 "intermediate_size": 5632,
 "max_position_embeddings": 2048,
 "model_type": "llama",
 "num_attention_heads": 32,
 "num_hidden_layers": 22,
 "num_key_value_heads": 4,
 "pretraining_tp": 1,
 "rms_norm_eps": 1e-05,
 "rope_scaling": null,
 "rope_theta": 10000.0,
 "tie_word_embeddings": false,
 "torch_dtype": "float16",
 "transformers_version": "4.37.0.dev0",
 "use_cache": false,
 "vocab_size": 32003
}
```
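Since this is published as an adapter, a minimal sketch of loading it (assuming it is a PEFT/LoRA adapter over the base model named in the config above; untested):

```python
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

# Base model taken from the config's _name_or_path field.
base = AutoModelForCausalLM.from_pretrained("PY007/TinyLlama-1.1B-Chat-v0.3")
tokenizer = AutoTokenizer.from_pretrained("PY007/TinyLlama-1.1B-Chat-v0.3")

# Attach the fine-tuned text2sql adapter weights on top of the base model.
model = PeftModel.from_pretrained(base, "sid321axn/tinyllama-text2sql-finetuned")
```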


#625: unsloth/README.md at main · unslothai/unsloth

**Details** · Similarity score: 0.88

- [ ] [unsloth/README.md at main · unslothai/unsloth](https://github.com/unslothai/unsloth/blob/main/README.md?plain=1)

# unsloth/README.md at main · unslothai/unsloth

### Finetune Mistral, Gemma, Llama 2-5x faster with 70% less memory!

## ✨ Finetune for Free

All notebooks are **beginner friendly**! Add your dataset, click "Run All", and you'll get a 2x faster finetuned model which can be exported to GGUF, vLLM or uploaded to Hugging Face.

| Unsloth supports          |    Free Notebooks                                                                                           | Performance | Memory use |
|-----------------|--------------------------------------------------------------------------------------------------------------------------|-------------|----------|
| **Gemma 7b**      | [▶️ Start on Colab](https://colab.research.google.com/drive/10NbwlsRChbma1v55m8LAPYG15uQv6HLo?usp=sharing)               | 2.4x faster | 58% less |
| **Mistral 7b**    | [▶️ Start on Colab](https://colab.research.google.com/drive/1Dyauq4kTZoLewQ1cApceUQVNcnnNTzg_?usp=sharing)               | 2.2x faster | 62% less |
| **Llama-2 7b**      | [▶️ Start on Colab](https://colab.research.google.com/drive/1lBzz5KeZJKXjvivbYvmGarix9Ao6Wxe5?usp=sharing)               | 2.2x faster | 43% less |
| **TinyLlama**  | [▶️ Start on Colab](https://colab.research.google.com/drive/1AZghoNBQaMDgWJpi4RbffGM1h6raLUj9?usp=sharing)              | 3.9x faster | 74% less |
| **CodeLlama 34b** A100   | [▶️ Start on Colab](https://colab.research.google.com/drive/1y7A0AxE3y8gdj4AVkl2aZX47Xu3P1wJT?usp=sharing)              | 1.9x faster | 27% less |
| **Mistral 7b** 1xT4  | [▶️ Start on Kaggle](https://www.kaggle.com/code/danielhanchen/kaggle-mistral-7b-unsloth-notebook) | 5x faster\* | 62% less |
| **DPO - Zephyr**     | [▶️ Start on Colab](https://colab.research.google.com/drive/15vttTpzzVXv_tJwEk-hIcQ0S9FcEWvwP?usp=sharing)               | 1.9x faster | 19% less |

- This [conversational notebook](https://colab.research.google.com/drive/1Aau3lgPzeZKQ-98h69CCu1UJcvIBLmy2?usp=sharing) is useful for ShareGPT ChatML / Vicuna templates.
- This [text completion notebook](https://colab.research.google.com/drive/1ef-tab5bhkvWmBOObepl1WgJvfvSzn5Q?usp=sharing) is for raw text. This [DPO notebook](https://colab.research.google.com/drive/15vttTpzzVXv_tJwEk-hIcQ0S9FcEWvwP?usp=sharing) replicates Zephyr.
- \* Kaggle has 2x T4s, but we use 1. Due to overhead, 1x T4 is 5x faster.

## 🦥 Unsloth.ai News
- 📣 [Gemma 7b](https://colab.research.google.com/drive/10NbwlsRChbma1v55m8LAPYG15uQv6HLo?usp=sharing) on 6T tokens now works. And [Gemma 2b notebook](https://colab.research.google.com/drive/15gGm7x_jTm017_Ic8e317tdIpDG53Mtu?usp=sharing)
- 📣 Added [conversational notebooks](https://colab.research.google.com/drive/1ef-tab5bhkvWmBOObepl1WgJvfvSzn5Q?usp=sharing) and [raw text notebooks](https://colab.research.google.com/drive/1bMOKOBzxQWUIGZBs_B0zm8pimuEnZdfM?usp=sharing)
- 📣 [2x faster inference](https://colab.research.google.com/drive/15vttTpzzVXv_tJwEk-hIcQ0S9FcEWvwP?usp=sharing) added for all our models
- 📣 [DPO support](https://colab.research.google.com/drive/15vttTpzzVXv_tJwEk-hIcQ0S9FcEWvwP?usp=sharing) is now included. [More info](#DPO) on DPO
- 📣 We did a [blog](https://huggingface.co/blog/unsloth-trl) with 🤗Hugging Face and are in their official docs! Check out the [SFT docs](https://huggingface.co/docs/trl/main/en/sft_trainer#accelerate-fine-tuning-2x-using-unsloth) and [DPO docs](https://huggingface.co/docs/trl/main/en/dpo_trainer#accelerate-dpo-fine-tuning-using-unsloth)
- 📣 [Download models 4x faster](https://huggingface.co/collections/unsloth/)  from 🤗Hugging Face. Eg: `unsloth/mistral-7b-bnb-4bit`

## 🔗 Links and Resources
| Type                            | Links                               |
| ------------------------------- | --------------------------------------- |
| 📚 **Wiki & FAQ**              | [Read Our Wiki](https://github.com/unslothai/unsloth/wiki) |
| 📜 **Documentation**              | [Read The Doc](https://github.com/unslothai/unsloth/tree/main#-documentation) |
| 💾 **Installation**               | [unsloth/README.md](https://github.com/unslothai/unsloth/tree/main#installation-instructions)|
| <img height="14" src="https://upload.wikimedia.org/wikipedia/commons/6/6f/Logo_of_Twitter.svg" />&nbsp; **Twitter (aka X)**              |  [Follow us on X](https://twitter.com/unslothai)|
| 🥇 **Benchmarking**                   | [Performance Tables](https://github.com/unslothai/unsloth/tree/main#-performance-benchmarking) |
| 🌐 **Released Models**            | [Unsloth Releases](https://huggingface.co/unsloth)|
| ✍️ **Blog**                    | [Read our Blogs](https://unsloth.ai/blog)|

## ⭐ Key Features
- All kernels written in [OpenAI's Triton](https://openai.com/research/triton) language. **Manual backprop engine**.
- **0% loss in accuracy** - no approximation methods - all exact.
- No change of hardware. Supports NVIDIA GPUs since 2018+. Minimum CUDA Capability 7.0 (V100, T4, Titan V, RTX 20, 30, 40x, A100, H100, L40 etc) [Check your GPU!](https://developer.nvidia.com/cuda-gpus) GTX 1070, 1080 works, but is slow.
- Works on **Linux** and **Windows** via WSL.
- Supports 4bit and 16bit QLoRA / LoRA finetuning via [bitsandbytes](https://github.com/TimDettmers/bitsandbytes).
- Open source trains 5x faster - see [Unsloth Pro](https://unsloth.ai/) for **30x faster training**!
- If you trained a model with 🦥Unsloth, you can use this cool sticker! &nbsp; <img src="https://raw.githubusercontent.com/unslothai/unsloth/main/images/made with unsloth.png" height="50" align="center" />
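A minimal sketch of the 4-bit QLoRA path described above (following the FastLanguageModel API from this README; the hyperparameters are illustrative):

```python
from unsloth import FastLanguageModel

# Load a pre-quantized 4-bit base model (unsloth/mistral-7b-bnb-4bit is
# one of the 4x-faster downloads mentioned in the news section above).
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/mistral-7b-bnb-4bit",
    max_seq_length=2048,
    load_in_4bit=True,
)

# Attach LoRA adapters; rank and target modules here are illustrative.
model = FastLanguageModel.get_peft_model(
    model,
    r=16,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
    lora_alpha=16,
    lora_dropout=0,
    bias="none",
)
```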


## 🥇 Performance Benchmarking
- For the full list of **reproducible** benchmarking tables, [go to our website](https://unsloth.ai/blog/mistral-benchmark#Benchmark%20tables)

| 1 A100 40GB  | 🤗Hugging Face | Flash Attention | 🦥Unsloth Open Source | 🦥[Unsloth Pro](https://unsloth.ai/pricing) |
|--------------|--------------|-----------------|---------------------|-----------------|
| Alpaca       | 1x           | 1.04x           | 1.98x               | **15.64x**      |
| LAION Chip2  | 1x           | 0.92x           | 1.61x               | **20.73x**      |
| OASST        | 1x           | 1.19x           | 2.17x               | **14.83x**      |
| Slim Orca    | 1x           | 1.18x           | 2.22x               | **14.82x**      |

- The benchmarking table below was produced by [🤗Hugging Face](https://huggingface.co/blog/unsloth-trl).

| Free Colab T4 | Dataset | 🤗Hugging Face | Pytorch 2.1.1 | 🦥Unsloth | 🦥 VRAM reduction |
| --- | --- | --- | --- | --- | --- |
| Llama-2 7b | OASST | 1x | 1.19x | 1.95x | -43.3% |
| Mistral 7b | Alpaca | 1x | 1.07x | 1.56x | -13.7% |
| Tiny Llama 1.1b | Alpaca | 1x | 2.06x | 3.87x | -73.8% |
| DPO with Zephyr | Ultra Chat | 1x | 1.09x | 1.55x | -18.6% |


[View on GitHub](https://github.com/unslothai/unsloth/blob/main/README.md?plain=1)

#### Suggested labels


#386: SciPhi/AgentSearch-V1 · Datasets at Hugging Face

**Details** · Similarity score: 0.87

- [ ] [SciPhi/AgentSearch-V1 · Datasets at Hugging Face](https://huggingface.co/datasets/SciPhi/AgentSearch-V1)

#### Getting Started

The AgentSearch-V1 dataset is a comprehensive collection of over one billion embeddings, produced using jina-v2-base. It includes more than 50 million high-quality documents and over 1 billion passages, covering a vast range of content from sources such as Arxiv, Wikipedia, Project Gutenberg, and includes carefully filtered Creative Commons (CC) data. Our team is dedicated to continuously expanding and enhancing this corpus to improve the search experience. We welcome your thoughts and suggestions – please feel free to reach out with your ideas!

To access and utilize the AgentSearch-V1 dataset, you can stream it via HuggingFace with the following Python code:
```python
from datasets import load_dataset
import json
import numpy as np

# To stream the entire dataset:
ds = load_dataset("SciPhi/AgentSearch-V1", data_files="**/*", split="train", streaming=True)

# Optional: stream just the "arxiv" subset
# ds = load_dataset("SciPhi/AgentSearch-V1", data_files="arxiv/*", split="train", streaming=True)

# To process the entries:
for entry in ds:
    embeddings = np.frombuffer(
        entry['embeddings'], dtype=np.float32
    ).reshape(-1, 768)
    text_chunks = json.loads(entry['text_chunks'])
    metadata = json.loads(entry['metadata'])
    print(f'Embeddings:\n{embeddings}\n\nChunks:\n{text_chunks}\n\nMetadata:\n{metadata}')
    break
```

A full set of scripts to recreate the dataset from scratch can be found here. Further, you may check the docs for details on how to perform RAG over AgentSearch.

Languages

English.

Dataset Structure

The raw dataset structure is as follows:

```json
{
    "url": ...,
    "title": ...,
    "metadata": {"url": "...", "timestamp": "...", "source": "...", "language": "..."},
    "text_chunks": ...,
    "embeddings": ...,
    "dataset": "book" | "arxiv" | "wikipedia" | "stack-exchange" | "open-math" | "RedPajama-Data-V2"
}
```

Dataset Creation

This dataset was created as a step towards making humanity's most important knowledge openly searchable and LLM-optimal. It was created by filtering, cleaning, and augmenting publicly available datasets.

To cite our work, please use the following:

```bibtex
@software{SciPhi2023AgentSearch,
  author = {SciPhi},
  title = {AgentSearch [ΨΦ]: A Comprehensive Agent-First Framework and Dataset for Webscale Search},
  year = {2023},
  url = {https://github.com/SciPhi-AI/agent-search}
}
```

Source Data

```bibtex
@online{wikidump,
  author = "Wikimedia Foundation",
  title = "Wikimedia Downloads",
  url = "https://dumps.wikimedia.org"
}

@misc{paster2023openwebmath,
  title = {OpenWebMath: An Open Dataset of High-Quality Mathematical Web Text},
  author = {Keiran Paster and Marco Dos Santos and Zhangir Azerbayev and Jimmy Ba},
  year = {2023},
  eprint = {2310.06786},
  archivePrefix = {arXiv},
  primaryClass = {cs.AI}
}

@software{together2023redpajama,
  author = {Together Computer},
  title = {RedPajama: An Open Source Recipe to Reproduce LLaMA training dataset},
  month = April,
  year = 2023,
  url = {https://github.com/togethercomputer/RedPajama-Data}
}
```

License

Please refer to the licenses of the data subsets you use.

  • Open-Web (Common Crawl Foundation Terms of Use)
  • Books: the_pile_books3 license and pg19 license
  • ArXiv Terms of Use
  • Wikipedia License
  • StackExchange license on the Internet Archive

Suggested labels

{ "key": "knowledge-dataset", "value": "A dataset with one billion embeddings from various sources, such as Arxiv, Wikipedia, Project Gutenberg, and carefully filtered Creative Commons data" }

#396: astra-assistants-api: A backend implementation of the OpenAI beta Assistants API

**Details** · Similarity score: 0.86

- [ ] [datastax/astra-assistants-api: A backend implementation of the OpenAI beta Assistants API](https://github.com/datastax/astra-assistants-api)

Astra Assistant API Service

A drop-in compatible service for the OpenAI beta Assistants API with support for persistent threads, files, assistants, messages, retrieval, function calling and more, using AstraDB (DataStax's database-as-a-service offering powered by Apache Cassandra and jvector).

Compatible with existing OpenAI apps via the OpenAI SDKs by changing a single line of code.

Getting Started

  1. Create an Astra DB Vector database
  2. Replace the following code:

```python
client = OpenAI(
    api_key=OPENAI_API_KEY,
)
```

with:

```python
client = OpenAI(
    base_url="https://open-assistant-ai.astra.datastax.com/v1",
    api_key=OPENAI_API_KEY,
    default_headers={
        "astra-api-token": ASTRA_DB_APPLICATION_TOKEN,
    }
)
```

Or, if you have an existing Astra DB, you can pass your db_id in a second header:

```python
client = OpenAI(
    base_url="https://open-assistant-ai.astra.datastax.com/v1",
    api_key=OPENAI_API_KEY,
    default_headers={
        "astra-api-token": ASTRA_DB_APPLICATION_TOKEN,
        "astra-db-id": ASTRA_DB_ID
    }
)
```
  3. Create an assistant:

```python
assistant = client.beta.assistants.create(
  instructions="You are a personal math tutor. When asked a math question, write and run code to answer the question.",
  model="gpt-4-1106-preview",
  tools=[{"type": "retrieval"}]
)
```

By default, the service uses AstraDB as the database/vector store and OpenAI for embeddings and chat completion.
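From there, the rest of the beta Assistants flow works as it would against OpenAI directly. A minimal sketch (assuming the standard openai-python beta thread/run endpoints; the message content is illustrative):

```python
# Create a thread, add a user message, and start a run with the
# assistant created above.
thread = client.beta.threads.create()
client.beta.threads.messages.create(
    thread_id=thread.id,
    role="user",
    content="I need to solve the equation 3x + 11 = 14. Can you help me?",
)
run = client.beta.threads.runs.create(
    thread_id=thread.id,
    assistant_id=assistant.id,
)
print(run.status)  # poll until completed, then read the thread's messages
```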

Third party LLM Support

We now support many third party models for both embeddings and completion thanks to litellm. Pass the api key of your service using api-key and embedding-model headers.

For AWS Bedrock, you can pass additional custom headers:

```python
client = OpenAI(
    base_url="https://open-assistant-ai.astra.datastax.com/v1",
    api_key="NONE",
    default_headers={
        "astra-api-token": ASTRA_DB_APPLICATION_TOKEN,
        "embedding-model": "amazon.titan-embed-text-v1",
        "LLM-PARAM-aws-access-key-id": BEDROCK_AWS_ACCESS_KEY_ID,
        "LLM-PARAM-aws-secret-access-key": BEDROCK_AWS_SECRET_ACCESS_KEY,
        "LLM-PARAM-aws-region-name": BEDROCK_AWS_REGION,
    }
)
```

And again, specify the custom model for the assistant:

```python
assistant = client.beta.assistants.create(
    name="Math Tutor",
    instructions="You are a personal math tutor. Answer questions briefly, in a sentence or less.",
    model="meta.llama2-13b-chat-v1",
)
```

Additional examples including third party LLMs (bedrock, cohere, perplexity, etc.) can be found under examples.

To run the examples using poetry:

  1. Create a .env file in this directory with your secrets.
  2. Run:

```bash
poetry install
poetry run python examples/completion/basic.py
poetry run python examples/retreival/basic.py
poetry run python examples/function-calling/basic.py
```

Coverage

See our coverage report here.

Roadmap

  • Support for other embedding models and LLMs
  • Function calling
  • Pluggable RAG strategies
  • Streaming support

Suggested labels

{ "key": "llm-function-calling", "value": "Integration of function calling with Large Language Models (LLMs)" }

#626: classifiers/README.md at main · blockentropy/classifiers

**Details** · Similarity score: 0.85

- [ ] [classifiers/README.md at main · blockentropy/classifiers](https://github.com/blockentropy/classifiers/blob/main/README.md?plain=1)

classifiers/README.md

Fast Classifiers for Prompt Routing

Routing and controlling the information flow is a core component in optimizing machine learning tasks. While some architectures focus on internal routing of data within a model, we focus on the external routing of data between models. This enables the combination of open source, proprietary, API based, and software based approaches to work together behind a smart router. We investigate three different ways of externally routing the prompt: cosine similarity via embeddings, zero-shot classification, and small classifiers.

Implementation of Fast Classifiers

The code-class.ipynb Jupyter notebook walks through the process of creating a fast prompt classifier for smart routing. For the fast classifiers, we utilize the model DistilBERT, a smaller language representation model designed for efficient on-the-edge operation and training under computational constraints. DistilBERT is not only less costly to pre-train but also well-suited for on-device computations, as demonstrated through experiments and comparative studies.

We quantize the model using Optimum, enabling the model to run extremely fast on a CPU router. Each classifier takes 5-8ms to run. An ensemble of 8 prompt classifiers takes about 50ms in total. Thus, each endpoint can route about 20 requests per second.
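A minimal sketch of serving such a classifier (assuming a DistilBERT checkpoint exported to ONNX with Optimum; the local path and label names are hypothetical):

```python
from optimum.onnxruntime import ORTModelForSequenceClassification
from transformers import AutoTokenizer, pipeline

# Hypothetical directory holding the quantized ONNX export of the classifier.
model = ORTModelForSequenceClassification.from_pretrained("./code-classifier-onnx")
tokenizer = AutoTokenizer.from_pretrained("./code-classifier-onnx")

classify = pipeline("text-classification", model=model, tokenizer=tokenizer)
print(classify("write a bubble sort in python"))
# e.g. [{'label': 'code', 'score': 0.99}] -- label names depend on training
```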

In the example code-class, we are deciding between code prompts and non-code prompts. The two datasets used are the 52K instruction-following examples generated by GPT-4 for Alpaca, and the 20K instruction-following examples used to fine-tune the Code Alpaca model.

An 80/20 train/test split yields an accuracy of 95.49% and an F1 score of 0.9227.

Comparison vs other Routing methods

The most popular alternative to routing is embedding similarity. For example, to route a programming question, one might set up the target classes as ["coding", "not coding"]. Each of these strings is transformed into an embedding and compared against a prompt query like "write a bubble sort in python". Given the computed pair-wise cosine similarity between the query and each class, we can label the prompt as a coding question and route it to a coding-specific model, as in the sketch below. Embedding comparisons do not scale well to large numbers of classes, nor can they capture non-semantic classes (such as whether the response is likely to be more or less than 200 tokens). However, they are adaptable and comparably fast, and thus provide a good alternative to the trained fast classifiers.
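A minimal sketch of that embedding-similarity routing (assuming the sentence-transformers library; the embedding model is illustrative, since the README does not name one):

```python
from sentence_transformers import SentenceTransformer, util

# Illustrative general-purpose embedding model.
model = SentenceTransformer("all-MiniLM-L6-v2")

classes = ["coding", "not coding"]
class_embs = model.encode(classes, convert_to_tensor=True)

query = "write a bubble sort in python"
query_emb = model.encode(query, convert_to_tensor=True)

# Route to the class whose embedding is most similar to the prompt.
scores = util.cos_sim(query_emb, class_embs)[0]
print(classes[int(scores.argmax())])  # expected: "coding"
```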


Figure: quantifying different routing methods in terms of execution time. Query time increases with prompt size (a), and close to linearly with the number of classes (b). The small classifiers, however, do not slow down as class examples grow in token count (c), because the cost of training the binary classifier is paid upfront, reducing cost at inference.

Reproducibility

The timing_tests.js and complexity.js files can be used for reproducibility. Note that only the code classifier is currently available in this repo. One will need to install the appropriate models from the Transformers.js repo.

View on GitHub

Suggested labels

{'label-name': 'Prompt-Routing', 'label-description': 'Focuses on external routing of data between models to optimize machine learning tasks.', 'confidence': 50.24}
