README.md · defog/sqlcoder-7b-2 at main #640

irthomasthomas · 2024-02-27T20:05:27Z

DESCRIPTION:

license: cc-by-sa-4.0
library_name: transformers
pipeline_tag: text-generation

Update notice

The model weights were updated at 7 AM UTC on Feb 7, 2024. The new model weights lead to a much more performant model – particularly for joins.

If you downloaded the model before that, please redownload the weights for best performance.

Model Card for SQLCoder-7B-2

A capable large language model for natural language to SQL generation.

Model Details

Model Description

This is the model card of a 🤗 transformers model that has been pushed on the Hub. This model card has been automatically generated.

Developed by: Defog, Inc
Model type: Text to SQL
License: CC-by-SA-4.0
Finetuned from model: CodeLlama-7B

Uses

This model is intended to be used by non-technical users to understand data inside their SQL databases. It is meant as an analytics tool, and not as a database admin tool.

This model has not been trained to reject malicious requests from users with write access to databases, and should only be used by users with read-only access.

How to Get Started with the Model

Use the code here to get started with the model.

Prompt

For best results, use the following prompt and generate with do_sample=False and num_beams=4.

### Task
Generate a SQL query to answer [QUESTION]{user_question}[/QUESTION]
### Database Schema
The query will run on a database with the following schema:
{table_metadata_string_DDL_statements}
### Answer
Given the database schema, here is the SQL query that [QUESTION]{user_question}[/QUESTION]
[SQL]
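
The prompt and generation settings above can be wired together roughly as follows. This is a minimal sketch, not code from the model card: `build_prompt` and `generate_sql` are hypothetical helper names, `max_new_tokens=300` and the float16/device handling are assumptions, and the template reproduces the prompt exactly as shown above.

```python
PROMPT_TEMPLATE = """### Task
Generate a SQL query to answer [QUESTION]{user_question}[/QUESTION]
### Database Schema
The query will run on a database with the following schema:
{table_metadata_string_DDL_statements}
### Answer
Given the database schema, here is the SQL query that [QUESTION]{user_question}[/QUESTION]
[SQL]"""


def build_prompt(user_question: str, ddl: str) -> str:
    """Fill the prompt template with the question and the schema DDL."""
    return PROMPT_TEMPLATE.format(
        user_question=user_question,
        table_metadata_string_DDL_statements=ddl,
    )


def generate_sql(user_question: str, ddl: str,
                 model_name: str = "defog/sqlcoder-7b-2",
                 max_new_tokens: int = 300) -> str:
    """Generate a SQL query with beam search (do_sample=False, num_beams=4)."""
    # Imported lazily so build_prompt works without the heavy dependencies.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    # Assumption: half precision on GPU, full precision on CPU.
    device = "cuda" if torch.cuda.is_available() else "cpu"
    dtype = torch.float16 if device == "cuda" else torch.float32

    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=dtype).to(device)

    inputs = tokenizer(build_prompt(user_question, ddl), return_tensors="pt").to(device)
    outputs = model.generate(
        **inputs,
        max_new_tokens=max_new_tokens,  # assumed budget, not from the model card
        do_sample=False,
        num_beams=4,
    )
    # Decode only the tokens generated after the prompt.
    new_tokens = outputs[0][inputs["input_ids"].shape[1]:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True).strip()
```

Calling `generate_sql("How many users signed up last week?", ddl)` with the `CREATE TABLE` statements in `ddl` downloads the weights on first use. Since the model is not trained to reject malicious requests, the returned query should only be run over a read-only connection.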

Evaluation

This model was evaluated on SQL-Eval, a PostgreSQL-based evaluation framework developed by Defog for testing and alignment of model capabilities.

You can read more about the methodology behind SQL-Eval here.

Results

We classified each generated question into one of 6 categories. The table displays the percentage of questions answered correctly by each model, broken down by category.

model	date	group_by	order_by	ratio	join	where
sqlcoder-70b	96	91.4	97.1	85.7	97.1	91.4
sqlcoder-7b-2	96	91.4	94.3	91.4	94.3	77.1
sqlcoder-34b	80	94.3	85.7	77.1	85.7	80
gpt-4	72	94.3	97.1	80	91.4	80
gpt-4-turbo	76	91.4	91.4	62.8	88.6	77.1
natural-sql-7b	56	88.6	85.7	60	88.6	80
sqlcoder-7b	64	82.9	74.3	54.3	74.3	74.3
gpt-3.5	72	77.1	82.8	34.3	65.7	71.4
claude-2	52	71.4	74.3	57.1	65.7	62.9

Model Card Contact

Contact us on X at @defogdata, or on email at founders@defog.ai

URL: https://huggingface.co/defog/sqlcoder-7b-2/blob/main/README.md?code=true


irthomasthomas · 2024-02-27T20:05:29Z

Related issues

#456: Baseline benchmark for 17 coding models : r/LocalLLaMA

### Details

Similarity score: 0.89 - [ ] [Baseline benchmark for 17 coding models : r/LocalLLaMA](https://www.reddit.com/r/LocalLLaMA/comments/19fc4uf/baseline_benchmark_for_17_coding_models/)

Baseline Benchmark for 17 Coding Models

Discussion

I am currently working on implementing some ideas for coding models inference strategies (prompting, control, context exploration, CoT, ToT, etc) and I needed a baseline benchmark on a bunch of models. Since I work on a 3060 12GB, I was limited in what I can test so I went for every model that is 7/13B and has an AWQ quant available, since that is what the inference library that I use supports. I thought I'd share some numbers.

Notes:

This is a benchmark for getting a local baseline. I'm interested in improvement from here, so the absolute values are less important for me. Don't take the absolute values too seriously. (well, maybe except deepseek-coder-1.3b, that is a bit suspect).
I used the HumanEval dataset. This is superseded by HumanEval+ and other more recent benchmarks. I chose this because it was the first one I tried. Again, with my tests I'm looking for improvements over the baseline, so this is mostly fine.
AWQ quant is not the best out there, but all my tests will be done with this quant, so for me it is OK.
Temp tests were done in only one generation. In general you'd want to average the score over many generations at a given temp.
Each model was prompted according to the model card template. Here's an example for the codellama series -

f"""<s>You are a helpful and respectful assistant. Answer the following question: {question}"""

Results

I've plotted the results (with horrendous contrasting colors, but alas) to look for any interesting patterns in problem solving. You can find the plots here.

Model	Temp	Correct / 164	Percentage
TheBloke/Mistral-7B-Instruct-v0.2-AWQ	0.0	67	0.40853658536585363
TheBloke/Mistral-7B-Instruct-v0.2-AWQ	0.1	63	0.38414634146341464
TheBloke/Mistral-7B-Instruct-v0.2-AWQ	0.2	68	0.4146341463414634
TheBloke/Mistral-7B-Instruct-v0.2-AWQ	0.3	61	0.3719512195121951
TheBloke/Mistral-7B-Instruct-v0.2-AWQ	0.4	61	0.3719512195121951
TheBloke/Mistral-7B-Instruct-v0.2-AWQ	0.5	63	0.38414634146341464
TheBloke/Mistral-7B-Instruct-v0.2-AWQ	0.6	54	0.32926829268292684
TheBloke/Mistral-7B-Instruct-v0.2-AWQ	0.7	61	0.3719512195121951
TheBloke/Mistral-7B-Instruct-v0.2-AWQ	0.8	60	0.36585365853658536
TheBloke/Mistral-7B-Instruct-v0.2-AWQ	0.9	59	0.3597560975609756
TheBloke/Mistral-7B-Instruct-v0.2-AWQ	1.0	65	0.39634146341463417
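
The Percentage column is the number of correct solutions divided by the 164 HumanEval problems. A small sketch of that arithmetic, using the counts from the table above, is shown here; the variable and function names are illustrative. Averaging across temperatures is only a rough summary and is not the same as averaging repeated generations at a single temperature, which is what the notes recommend.

```python
HUMANEVAL_PROBLEMS = 164

# Correct answers per temperature for TheBloke/Mistral-7B-Instruct-v0.2-AWQ,
# copied from the table above.
correct_by_temp = {
    0.0: 67, 0.1: 63, 0.2: 68, 0.3: 61, 0.4: 61, 0.5: 63,
    0.6: 54, 0.7: 61, 0.8: 60, 0.9: 59, 1.0: 65,
}


def pass_rate(correct: int, total: int = HUMANEVAL_PROBLEMS) -> float:
    """Fraction of problems solved in a single generation run."""
    return correct / total


rates = {temp: pass_rate(count) for temp, count in correct_by_temp.items()}
mean_rate = sum(rates.values()) / len(rates)
best_temp = max(rates, key=rates.get)

print(f"temp 0.0: {rates[0.0]:.4f}")
print(f"mean over temperatures: {mean_rate:.4f}")
print(f"best temperature: {best_temp}")
```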

Suggested labels

{ "label-name": "coding-models", "description": "Discussion and benchmark of coding models implementation strategies.", "confidence": 96.82 }

#498: CodeGPTPlus/deepseek-coder-1.3b-typescript · Hugging Face

### Details

Similarity score: 0.89 - [ ] [CodeGPTPlus/deepseek-coder-1.3b-typescript · Hugging Face](https://huggingface.co/CodeGPTPlus/deepseek-coder-1.3b-typescript)

CodeGPTPlus/deepseek-coder-1.3b-typescript

This is a fine-tuned model by the CodeGPT team, specifically crafted for generating expert code in TypeScript. It is fine-tuned from deepseek-ai/deepseek-coder-1.3b-base with a dataset of 0.5B tokens, making it an excellent choice for precise and efficient TypeScript code generation.

The model uses a 16K window size and an additional fill-in-the-middle task for project-level code completion.

How to Use

This model is for completion purposes only. Here are some examples of how to use the model:

Running the model on a GPU

from transformers import AutoTokenizer, AutoModelForCausalLM
tokenizer = AutoTokenizer.from_pretrained("CodeGPTPlus/deepseek-coder-1.3b-typescript", trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained("CodeGPTPlus/deepseek-coder-1.3b-typescript", trust_remote_code=True).cuda()

input_text = """<｜fim▁begin｜>function quickSort(arr: number[]): number[] {
  if (arr.length <= 1) {
    return arr;
  }
  const pivot = arr[0];
  const left = [];
  const right = [];
<｜fim▁hole｜>
  return [...quickSort(left), pivot, ...quickSort(right)];
}<｜fim▁end｜>"""

inputs = tokenizer(input_text, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_length=256)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))

Running with Ollama

Model: https://ollama.ai/codegpt/deepseek-coder-1.3b-typescript
Command: ollama run codegpt/deepseek-coder-1.3b-typescript

Running with Ollama and CodeGPT Autocomplete in VSCode

Documentation: https://docs.codegpt.co/docs/tutorial-features/code_autocompletion
Select "Ollama - codegpt/deepseek-coder-1.3b-typescript" in the autocomplete model selector.

Fill In the Middle (FIM)

<｜fim▁begin｜>function quickSort(arr: number[]): number[] {
  if (arr.length <= 1) {
    return arr;
  }
  const pivot = arr[0];
  const left = [];
  const right = [];
<｜fim▁hole｜>
  return [...quickSort(left), pivot, ...quickSort(right)];
}<｜fim▁end｜>
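
A prompt in this layout can be assembled from the code before and after the gap. The helper below is a sketch: `build_fim_prompt` is a hypothetical name, and the default sentinel strings assume the tokenizer uses DeepSeek Coder's fill-in-the-middle tokens, which contain non-ASCII characters. Check them against the tokenizer's vocabulary before relying on them.

```python
# Assumed DeepSeek Coder sentinels, written with escapes so the exact characters
# are unambiguous: U+FF5C is a fullwidth vertical bar, U+2581 a lower one-eighth block.
FIM_BEGIN = "<\uff5cfim\u2581begin\uff5c>"
FIM_HOLE = "<\uff5cfim\u2581hole\uff5c>"
FIM_END = "<\uff5cfim\u2581end\uff5c>"


def build_fim_prompt(prefix: str, suffix: str,
                     begin: str = FIM_BEGIN,
                     hole: str = FIM_HOLE,
                     end: str = FIM_END) -> str:
    """Wrap the code before and after the gap in the FIM sentinel tokens."""
    return f"{begin}{prefix}{hole}{suffix}{end}"


prefix = (
    "function quickSort(arr: number[]): number[] {\n"
    "  if (arr.length <= 1) {\n"
    "    return arr;\n"
    "  }\n"
    "  const pivot = arr[0];\n"
    "  const left = [];\n"
    "  const right = [];\n"
)
suffix = "\n  return [...quickSort(left), pivot, ...quickSort(right)];\n}"

prompt = build_fim_prompt(prefix, suffix)
```

The resulting `prompt` can be passed to the tokenizer in place of `input_text` in the GPU example; the model is then expected to generate the code that belongs at the hole.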

Training Procedure

The model was trained using the following hyperparameters:

learning_rate: 2e-05
train_batch_size: 20
eval_batch_size: 20
seed: 42
gradient_accumulation_steps: 2
total_train_batch_size: 40
optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-06
lr_scheduler_type: cosine
lr_scheduler_warmup_steps: 261
num_epochs: 1

For more information, visit the model page.

Suggested labels

{ "label-name": "TypeScript-Code-Generation", "description": "Model for generating TypeScript code", "repo": "CodeGPTPlus/deepseek-coder-1.3b-typescript", "confidence": 70.59 }

#383: deepseek-ai/deepseek-coder-5.7bmqa-base · Hugging Face

### Details

Similarity score: 0.89 - [ ] [deepseek-ai/deepseek-coder-5.7bmqa-base · Hugging Face](https://huggingface.co/deepseek-ai/deepseek-coder-5.7bmqa-base)

Deepseek Coder Introduction

Deepseek Coder is a series of code language models, each trained from scratch on 2T tokens with a composition of 87% code and 13% natural language in both English and Chinese. We provide various sizes of the code model, ranging from 1B to 33B versions. Each model is pre-trained on a project-level code corpus with a window size of 16K and an extra fill-in-the-blank task, supporting project-level code completion and infilling. Deepseek Coder achieves state-of-the-art performance among open-source code models on multiple programming languages and various benchmarks.

Key Features

Massive Training Data: Trained from scratch on 2T tokens, including 87% code and 13% linguistic data in both English and Chinese languages.
Highly Flexible & Scalable: Offered in model sizes of 1.3B, 5.7B, 6.7B, and 33B, enabling users to choose the setup most suitable for their requirements.
Superior Model Performance: State-of-the-art performance among publicly available code models on HumanEval, MultiPL-E, MBPP, DS-1000, and APPS benchmarks.
Advanced Code Completion Capabilities: A window size of 16K and a fill-in-the-blank task, supporting project-level code completion and infilling tasks.

Model Summary

deepseek-coder-5.7bmqa-base: A 5.7B parameter model with Multi Query Attention, trained on 2 trillion tokens.
Home Page: DeepSeek
Repository: deepseek-ai/deepseek-coder
Chat With DeepSeek Coder: DeepSeek-Coder

How to Use

This section provides examples of how to use the Deepseek Coder model for code completion, code insertion, and repository-level code completion tasks.

Code Completion

from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

tokenizer = AutoTokenizer.from_pretrained("deepseek-ai/deepseek-coder-5.7bmqa-base", trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained("deepseek-ai/deepseek-coder-5.7bmqa-base", trust_remote_code=True).cuda()

input_text = "#write a quick sort algorithm"
inputs = tokenizer(input_text, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_length=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))

Code Insertion

from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

tokenizer = AutoTokenizer.from_pretrained("deepseek-ai/deepseek-coder-5.7bmqa-base", trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained("deepseek-ai/deepseek-coder-5.7bmqa-base", trust_remote_code=True).cuda()

input_text = """<｜fim▁begin｜>def quick_sort(arr):
    if len(arr) <= 1:
        return arr
    pivot = arr[0]
    left = []
    right = []
<｜fim▁hole｜>
        if arr[i] < pivot:
            left.append(arr[i])
        else:
            right.append(arr[i])
    return quick_sort(left) + [pivot] + quick_sort(right)<｜fim▁end｜>"""

inputs = tokenizer(input_text, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_length=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True)[len(input_text):])

Repository Level Code Completion

from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("deepseek-ai/deepseek-coder-5.7bmqa-base", trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained("deepseek-ai/deepseek-coder-5.7bmqa-base", trust_remote_code=True).cuda()

input_text = """#utils.py
import torch
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import accuracy_score

def load_data():
    iris = datasets.load_iris()
    X = iris.data
    y = iris.target

    # Standardize the data
    scaler = StandardScaler()
    X = scaler.fit_transform(X)

    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

    # Convert numpy data to PyTorch tensors
    X_train = torch.tensor(X_train, dtype=torch.float32)
    X_test = torch.tensor(X_test, dtype=torch.float32)
    y_train = torch.tensor(y_train, dtype=torch.int64)
    y_test = torch.tensor(y_test, dtype=torch.int64)

    return X_train, X_test, y_train, y_test

def evaluate_predictions(y_test, y_pred):
    return accuracy_score(y_test, y_pred)
#model.py
import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import DataLoader, TensorDataset

class IrisClassifier(nn.Module):
    def __init__(self):
        super(IrisClassifier, self).__init__()
        self.fc = nn.Sequential(
            nn.Linear(4, 16),
            nn.ReLU(),
            nn.Linear(16, 3)
        )

    def forward(self, x):
        return self.fc(x)

    def train_model(self, X_train, y_train, epochs, lr, batch_size):
        criterion = nn.CrossEntropyLoss()
        optimizer = optim.Adam(self.parameters(), lr=lr)

        # Create DataLoader for batches
        dataset = TensorDataset(X_train, y_train)
        dataloader = DataLoader(dataset, batch_size=batch_size, shuffle=True)

        for epoch in range(epochs):
            for batch_X, batch_y in dataloader:
                optimizer.zero_grad()
                outputs = self(batch_X)
                loss = criterion(outputs, batch_y)
                loss.backward()
                optimizer.step()

    def predict(self, X_test):
        with torch.no_grad():
            outputs = self(X_test)
            _, predicted = outputs.max(1)
        return predicted.numpy()
#main.py
from utils import load_data, evaluate_predictions
from model import IrisClassifier as Classifier

def main():
    # Model training and evaluation
"""

inputs = tokenizer(input_text, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=140)
print(tokenizer.decode(outputs[0]))

License

This code repository is licensed under the MIT License. The use of Deepseek Coder models is subject to the Model License. DeepSeek Coder supports commercial use.

See the LICENSE-MODEL for more details.

Contact

If you have any questions, please raise an issue or contact us at agi_code@deepseek.com.

Suggested labels

{ "key": "llm-experiments", "value": "Experiments and results related to Large Language Models" } { "key": "AI-Chatbots", "value": "Topics related to advanced chatbot platforms integrating multiple AI models" }

#324: bigcode/tiny_starcoder_py · Hugging Face

### Details

Similarity score: 0.88 - [bigcode/tiny_starcoder_py · Hugging Face](https://huggingface.co/bigcode/tiny_starcoder_py)

TinyStarCoderPy

This is a 164M-parameter model with the same architecture as StarCoder (8k context length, MQA & FIM). It was trained on the Python data from StarCoderData for ~6 epochs, which amounts to 100B tokens.

Use

Intended use

The model was trained on GitHub code, to assist with some tasks like Assisted Generation. For pure code completion, we advise using our 15B models StarCoder or StarCoderBase.

Generation

```python
# pip install -q transformers
from transformers import AutoModelForCausalLM, AutoTokenizer

checkpoint = "bigcode/tiny_starcoder_py"
device = "cuda"  # for GPU usage or "cpu" for CPU usage

tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForCausalLM.from_pretrained(checkpoint).to(device)

inputs = tokenizer.encode("def print_hello_world():", return_tensors="pt").to(device)
outputs = model.generate(inputs)
print(tokenizer.decode(outputs[0]))
```

Fill-in-the-middle

Fill-in-the-middle uses special tokens to identify the prefix/middle/suffix part of the input and output:

```python
input_text = "<fim_prefix>def print_one_two_three():\n    print('one')\n    <fim_suffix>\n    print('three')<fim_middle>"
inputs = tokenizer.encode(input_text, return_tensors="pt").to(device)
outputs = model.generate(inputs)
print(tokenizer.decode(outputs[0]))
```

Training

Model

- Architecture: GPT-2 model with multi-query attention and Fill-in-the-Middle objective
- Pretraining steps: 50k
- Pretraining tokens: 100 billion
- Precision: bfloat16

Hardware

- GPUs: 32 Tesla A100
- Training time: 18 hours

Software

- Orchestration: Megatron-LM
- Neural networks: PyTorch
- BF16 if applicable: apex

License

The model is licensed under the BigCode OpenRAIL-M v1 license agreement. You can find the full agreement [here](https://huggingface.co/bigcode/tiny_starcoder_py/blob/main/LICENSE).

Suggested labels

{ "key": "llm-pretraining", "value": "Information related to the pretraining process of Large Language Models" }

#499: marella/ctransformers: Python bindings for the Transformer models implemented in C/C++ using GGML library.

### Details

Similarity score: 0.88 - [ ] [marella/ctransformers: Python bindings for the Transformer models implemented in C/C++ using GGML library.](https://github.com/marella/ctransformers?tab=readme-ov-file#gptq)

CTransformers

![Build and Test](https://github.com/marella/ctransformers/actions/workflows/build.yml/badge.svg)

Python bindings for the Transformer models implemented in C/C++ using GGML library. Also see ChatDocs

Supported Models

Model	Model Type	CUDA	Metal
GPT-2	gpt2
GPT-J, GPT4All-J	gptj
GPT-NeoX, StableLM	gpt_neox
Falcon	falcon	✅
LLaMA, LLaMA 2	llama	✅	✅
MPT	mpt	✅
StarCoder, StarChat	gpt_bigcode	✅
Dolly V2	dolly-v2
Replit	replit

Installation

To install via pip, simply run:

pip install ctransformers

Usage

It provides a unified interface for all models:

from ctransformers import AutoModelForCausalLM

llm = AutoModelForCausalLM.from_pretrained("/path/to/ggml-model.bin", model_type="gpt2")

print(llm("AI is going to"))

To stream the output:

for text in llm("AI is going to", stream=True):
    print(text, end="", flush=True)

You can load models from Hugging Face Hub directly:

llm = AutoModelForCausalLM.from_pretrained("marella/gpt-2-ggml")

If a model repo has multiple model files (.bin or .gguf files), specify a model file using:

llm = AutoModelForCausalLM.from_pretrained("marella/gpt-2-ggml", model_file="ggml-model.bin")

🤗 Transformers

Note: This is an experimental feature and may change in the future.

To use with 🤗 Transformers, create the model and tokenizer using:

from ctransformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("marella/gpt-2-ggml", hf=True)
tokenizer = AutoTokenizer.from_pretrained(model)

You can use 🤗 Transformers text generation pipeline:

from transformers import pipeline

pipe = pipeline("text-generation", model=model, tokenizer=tokenizer)
print(pipe("AI is going to", max_new_tokens=256))

You can use 🤗 Transformers generation parameters:

pipe("AI is going to", max_new_tokens=256, do_sample=True, temperature=0.8, repetition_penalty=1.1)

You can use 🤗 Transformers tokenizers:

from ctransformers import AutoModelForCausalLM
from transformers import AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("marella/gpt-2-ggml", hf=True)  # Load model from GGML model repo.
tokenizer = AutoTokenizer.from_pretrained("gpt2")  # Load tokenizer from original model repo.

LangChain

It is integrated into LangChain. See LangChain docs.

GPU

To run some of the model layers on GPU, set the gpu_layers parameter:

llm = AutoModelForCausalLM.from_pretrained("TheBloke/Llama-2-7B-GGML", gpu_layers=50)

CUDA

Install CUDA libraries using:

pip install ctransformers[cuda]

ROCm

To enable ROCm support, install the ctransformers package using:

CT_HIPBLAS=1 pip install ctransformers --no-binary ctransformers

Metal

To enable Metal support, install the ctransformers package using:

CT_METAL=1 pip install ctransformers --no-binary ctransformers

GPTQ

Note: This is an experimental feature and only LLaMA models are supported using [ExLlama](https://github.com/turboderp/exllama).

Install additional dependencies using:

pip install ctransformers[gptq]

Load a GPTQ model using:

llm = AutoModelForCausalLM.from_pretrained("TheBloke/Llama-2-7B-GPTQ")

If the model name or path doesn't contain the word gptq, specify model_type="gptq".

It can also be used with LangChain. Low-level APIs are not fully supported.

Documentation

Find the documentation on Read the Docs.

Config

Parameter	Type	Description	Default
`top_k`	`int`	The top-k value to use for sampling	`40`
`top_p`	`float`	The top-p value to use for sampling	`0.95`
`temperature`	`float`	The temperature to use for sampling	`0.8`
`repetition_penalty`	`float`	The repetition penalty to use for sampling	`1.1`
`last_n_tokens`	`int`	The number of last tokens to use for repetition penalty	`64`
`seed`	`int`	The seed value to use for sampling tokens	`-1`
`max_new_tokens`	`int`	The maximum number of new tokens to generate	`256`
`stop`	`List`	A list of sequences to stop generation when encountered	`None`
`stream`	`bool`	Whether to stream the generated text	`False`
`reset`	`bool`	Whether to reset the model state before generating text	`True`
`batch_size`	`int`	The batch size to use for evaluating tokens in a single prompt	`8`
`threads`	`int`	The number of threads to use for evaluating tokens	`-1`
`context_length`	`int`	The maximum context length to use	`-1`
`gpu_layers`	`int`	The number of layers to run on GPU	`0`
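
The sampling parameters in this table can be overridden per call. The sketch below is an illustration rather than part of the ctransformers README: `GENERATION_DEFAULTS`, `generation_kwargs`, and `generate` are hypothetical names, and it assumes the `llm(...)` call accepts these settings as keyword arguments in the same way it accepts `stream=True` in the usage section.

```python
# Default sampling values, copied from the Config table above.
GENERATION_DEFAULTS = {
    "top_k": 40,
    "top_p": 0.95,
    "temperature": 0.8,
    "repetition_penalty": 1.1,
    "last_n_tokens": 64,
    "seed": -1,
    "max_new_tokens": 256,
}


def generation_kwargs(**overrides):
    """Merge per-call overrides into the defaults, rejecting unknown names."""
    unknown = set(overrides) - set(GENERATION_DEFAULTS)
    if unknown:
        raise ValueError(f"Unknown generation parameters: {sorted(unknown)}")
    return {**GENERATION_DEFAULTS, **overrides}


def generate(prompt, model_path="marella/gpt-2-ggml", **overrides):
    """Load a model and generate text with the merged sampling settings."""
    # Imported lazily so the helpers above work without ctransformers installed.
    from ctransformers import AutoModelForCausalLM

    llm = AutoModelForCausalLM.from_pretrained(model_path)
    return llm(prompt, **generation_kwargs(**overrides))
```

For example, `generate("AI is going to", temperature=0.2, max_new_tokens=64)` would sample more conservatively and stop earlier than the defaults.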

Find the URL for the model card for GPTQ here.

Made with ❤️ by marella

Suggested labels

null

#625: unsloth/README.md at main · unslothai/unsloth

### Details

Similarity score: 0.88 - [ ] [unsloth/README.md at main · unslothai/unsloth](https://github.com/unslothai/unsloth/blob/main/README.md?plain=1)

unsloth/README.md at main · unslothai/unsloth

Finetune Mistral, Gemma, Llama 2-5x faster with 70% less memory!

✨ Finetune for Free

All notebooks are beginner friendly! Add your dataset, click "Run All", and you'll get a 2x faster finetuned model which can be exported to GGUF, vLLM or uploaded to Hugging Face.

Unsloth supports	Free Notebooks	Performance	Memory use
Gemma 7b	▶️ Start on Colab	2.4x faster	58% less
Mistral 7b	▶️ Start on Colab	2.2x faster	62% less
Llama-2 7b	▶️ Start on Colab	2.2x faster	43% less
TinyLlama	▶️ Start on Colab	3.9x faster	74% less
CodeLlama 34b A100	▶️ Start on Colab	1.9x faster	27% less
Mistral 7b 1xT4	▶️ Start on Kaggle	5x faster*	62% less
DPO - Zephyr	▶️ Start on Colab	1.9x faster	19% less

This conversational notebook is useful for ShareGPT ChatML / Vicuna templates.
This text completion notebook is for raw text. This DPO notebook replicates Zephyr.
* Kaggle has 2x T4s, but we use 1. Due to overhead, 1x T4 is 5x faster.

🦥 Unsloth.ai News

📣 Gemma 7b on 6T tokens now works. And Gemma 2b notebook
📣 Added conversational notebooks and raw text notebooks
📣 2x faster inference added for all our models
📣 DPO support is now included. More info on DPO
📣 We did a blog with 🤗Hugging Face and are in their official docs! Check out the SFT docs and DPO docs
📣 Download models 4x faster from 🤗Hugging Face. Eg: unsloth/mistral-7b-bnb-4bit

🔗 Links and Resources

Type	Links
📚 Wiki & FAQ	Read Our Wiki
📜 Documentation	Read The Doc
💾 Installation	unsloth/README.md
Twitter (aka X)	Follow us on X
🥇 Benchmarking	Performance Tables
🌐 Released Models	Unsloth Releases
✍️ Blog	Read our Blogs

⭐ Key Features

All kernels written in OpenAI's Triton language. Manual backprop engine.
0% loss in accuracy - no approximation methods - all exact.
No change of hardware. Supports NVIDIA GPUs since 2018+. Minimum CUDA Capability 7.0 (V100, T4, Titan V, RTX 20, 30, 40x, A100, H100, L40 etc) Check your GPU! GTX 1070, 1080 works, but is slow.
Works on Linux and Windows via WSL.
Supports 4bit and 16bit QLoRA / LoRA finetuning via bitsandbytes.
Open source trains 5x faster - see Unsloth Pro for 30x faster training!
If you trained a model with 🦥Unsloth, you can use this cool sticker!

🥇 Performance Benchmarking

For the full list of reproducible benchmarking tables, go to our website

1 A100 40GB	🤗Hugging Face	Flash Attention	🦥Unsloth Open Source	🦥Unsloth Pro
Alpaca	1x	1.04x	1.98x	15.64x
LAION Chip2	1x	0.92x	1.61x	20.73x
OASST	1x	1.19x	2.17x	14.83x
Slim Orca	1x	1.18x	2.22x	14.82x

The benchmarking table below was produced by 🤗Hugging Face.

Free Colab T4	Dataset	🤗Hugging Face	Pytorch 2.1.1	🦥Unsloth	🦥 VRAM reduction
Llama-2 7b	OASST	1x	1.19x	1.95x	-43.3%
Mistral 7b	Alpaca	1x	1.07x	1.56x	-13.7%
Tiny Llama 1.1b	Alpaca	1x	2.06x	3.87x	-73.8%
DPO with Zephyr	Ultra Chat	1x	1.09x	1.55x	-18.6%


