Understanding the kernel in Semantic Kernel | Microsoft Learn #679
## Related issues

### #396: astra-assistants-api: A backend implementation of the OpenAI beta Assistants API

Details (similarity score: 0.87)

- [ ] [datastax/astra-assistants-api: A backend implementation of the OpenAI beta Assistants API](https://github.com/datastax/astra-assistants-api)

**Astra Assistant API Service**

A drop-in compatible service for the OpenAI beta Assistants API with support for persistent threads, files, assistants, messages, retrieval, function calling and more, using AstraDB (DataStax's DB-as-a-service offering powered by Apache Cassandra and jvector). Compatible with existing OpenAI apps via the OpenAI SDKs by changing a single line of code.

**Getting Started**

Replace:

```python
client = OpenAI(
    api_key=OPENAI_API_KEY,
)
```

with:

```python
client = OpenAI(
    base_url="https://open-assistant-ai.astra.datastax.com/v1",
    api_key=OPENAI_API_KEY,
    default_headers={
        "astra-api-token": ASTRA_DB_APPLICATION_TOKEN,
    }
)
```

Or, if you have an existing Astra DB, you can pass your db_id in a second header:

```python
client = OpenAI(
    base_url="https://open-assistant-ai.astra.datastax.com/v1",
    api_key=OPENAI_API_KEY,
    default_headers={
        "astra-api-token": ASTRA_DB_APPLICATION_TOKEN,
        "astra-db-id": ASTRA_DB_ID
    }
)
```

Create an assistant:

```python
assistant = client.beta.assistants.create(
    instructions="You are a personal math tutor. When asked a math question, write and run code to answer the question.",
    model="gpt-4-1106-preview",
    tools=[{"type": "retrieval"}]
)
```

By default, the service uses AstraDB as the database/vector store and OpenAI for embeddings and chat completion.

**Third party LLM Support**

We now support many third party models for both embeddings and completion thanks to litellm. Pass the api key of your service using custom request headers. For AWS Bedrock, you can pass additional custom headers:

```python
client = OpenAI(
    base_url="https://open-assistant-ai.astra.datastax.com/v1",
    api_key="NONE",
    default_headers={
        "astra-api-token": ASTRA_DB_APPLICATION_TOKEN,
        "embedding-model": "amazon.titan-embed-text-v1",
        "LLM-PARAM-aws-access-key-id": BEDROCK_AWS_ACCESS_KEY_ID,
        "LLM-PARAM-aws-secret-access-key": BEDROCK_AWS_SECRET_ACCESS_KEY,
        "LLM-PARAM-aws-region-name": BEDROCK_AWS_REGION,
    }
)
```

and again, specify the custom model for the assistant:

```python
assistant = client.beta.assistants.create(
    name="Math Tutor",
    instructions="You are a personal math tutor. Answer questions briefly, in a sentence or less.",
    model="meta.llama2-13b-chat-v1",
)
```

Additional examples, including third party LLMs (bedrock, cohere, perplexity, etc.), can be found in the examples directory. To run the examples using poetry:

```bash
poetry install
poetry run python examples/completion/basic.py
poetry run python examples/retreival/basic.py
poetry run python examples/function-calling/basic.py
```

**Coverage**

See our coverage report here.

**Roadmap**
Suggested labels{ "key": "llm-function-calling", "value": "Integration of function calling with Large Language Models (LLMs)" }#499: marella/ctransformers: Python bindings for the Transformer models implemented in C/C++ using GGML library.### DetailsSimilarity score: 0.86 - [ ] [marella/ctransformers: Python bindings for the Transformer models implemented in C/C++ using GGML library.](https://github.com/marella/ctransformers?tab=readme-ov-file#gptq)CTransformers
Python bindings for the Transformer models implemented in C/C++ using GGML library. Also see ChatDocs Supported Models
InstallationTo install via
UsageIt provides a unified interface for all models: from ctransformers import AutoModelForCausalLM
llm = AutoModelForCausalLM.from_pretrained("/path/to/ggml-model.bin", model_type="gpt2")
print(llm("AI is going to")) Run in Google Colab To stream the output: for text in llm("AI is going to", stream=True):
print(text, end="", flush=True) You can load models from Hugging Face Hub directly: llm = AutoModelForCausalLM.from_pretrained("marella/gpt-2-ggml") If a model repo has multiple model files ( llm = AutoModelForCausalLM.from_pretrained("marella/gpt-2-ggml", model_file="ggml-model.bin") 🤗 TransformersNote: This is an experimental feature and may change in the future. To use with 🤗 Transformers, create the model and tokenizer using: from ctransformers import AutoModelForCausalLM, AutoTokenizer
model = AutoModelForCausalLM.from_pretrained("marella/gpt-2-ggml", hf=True)
tokenizer = AutoTokenizer.from_pretrained(model) Run in Google Colab You can use 🤗 Transformers text generation pipeline: from transformers import pipeline
pipe = pipeline("text-generation", model=model, tokenizer=tokenizer)
print(pipe("AI is going to", max_new_tokens=256)) You can use 🤗 Transformers generation parameters: pipe("AI is going to", max_new_tokens=256, do_sample=True, temperature=0.8, repetition_penalty=1.1) You can use 🤗 Transformers tokenizers: from ctransformers import AutoModelForCausalLM
from transformers import AutoTokenizer
model = AutoModelForCausalLM.from_pretrained("marella/gpt-2-ggml", hf=True) # Load model from GGML model repo.
tokenizer = AutoTokenizer.from_pretrained("gpt2") # Load tokenizer from original model repo. LangChainIt is integrated into LangChain. See LangChain docs. GPUTo run some of the model layers on GPU, set the llm = AutoModelForCausalLM.from_pretrained("TheBloke/Llama-2-7B-GGML", gpu_layers=50) Run in Google Colab CUDAInstall CUDA libraries using: pip install ctransformers[cuda] ROCmTo enable ROCm support, install the CT_HIPBLAS=1 pip install ctransformers --no-binary ctransformers MetalTo enable Metal support, install the CT_METAL=1 pip install ctransformers --no-binary ctransformers GPTQNote: This is an experimental feature and only LLaMA models are supported using [ExLlama](https Install additional dependencies using: pip install ctransformers[gptq] Load a GPTQ model using: llm = AutoModelForCausalLM.from_pretrained("TheBloke/Llama-2-7B-GPTQ") Run in Google Colab If the model name or path doesn't contain the word It can also be used with LangChain. Low-level APIs are not fully supported. DocumentationFind the documentation on Read the Docs. Config
Find the URL for the model card for GPTQ here.

Made with ❤️ by marella

**Suggested labels**

null

### #367: Working Pen.el LSP server. An early demo - An AI overlay for everything -- async and parallelised : r/emacs

Details (similarity score: 0.85)

- [ ] [Working Pen.el LSP server. An early demo - An AI overlay for everything -- async and parallelised : r/emacs](https://www.reddit.com/r/emacs/comments/rr7u8o/working_penel_lsp_server_an_early_demo_an_ai/)

Working Pen.el LSP server. An early demo - An AI overlay for everything -- async and parallelised.

So the idea is that no matter what you're doing, the LSP server generates documentation and refactoring tools for whatever you're looking at -- whether you're surfing the web through Emacs, looking at photos, talking to people, or writing code. It should also be possible to plug this into other editors: one LSP server for everything. So far you can only try it out from within the Docker container. Start pen, open a text file (Python, Elisp, etc.) and run lsp. Anyway, I'm going to continue working on this into next year.

What data is sent to those services, and why are they needed?

```bash
echo "sk-" > ~/.pen/openai_api_key  # https://openai.com/
```

**Suggested labels**

{ "key": "llm-overlay", "value": "Using Large Language Models to generate documentation and refactoring tools for various activities within Emacs and other editors" }

### #369: "You are a helpful AI assistant" : r/LocalLLaMA

Details (similarity score: 0.85)

- [ ] ["You are a helpful AI assistant" : r/LocalLLaMA](https://www.reddit.com/r/LocalLLaMA/comments/18j59g1/you_are_a_helpful_ai_assistant/?share_id=g_M0-7C_zvS88BCd6M_sI&utm_content=1&utm_medium=android_app&utm_name=androidcss&utm_source=share&utm_term=1)

"You are a helpful AI assistant" (Discussion)

Don't say "don't": this confuses them, which makes sense when you understand how they "think". They do their best to string concepts together, but they simply generate the next word in the sequence from the context available. Saying "don't" will put everything following that word into the equation for the following words. This can cause the model to use the very words and concepts you're telling it not to.

(System prompts) Here is some context for the conversation: (Paste in relevant info such as web pages, documentation, etc., as well as bits of the conversation you want to keep in context. When you hit the context limit, you can restart the chat and continue with the same context.)

"You are a helpful AI assistant": this is the demo system prompt to just get agreeable answers from any model. The issue with this is, once again, how they "think". The models can't conceptualize what is helpful beyond agreeing with and encouraging you. This kind of statement can lead to them making up data and concepts in order to agree with you. This is extra fun because you may not realize the problem until you discover for yourself the fallacy of your own logic.

Then pass the list to the assistant you intend to chat with, with something like: "You can confidently answer in these subjects that you are an expert in: (the list)." The point of this is to limit its responses to what it actually knows, while making it answer confidently with the information it is sure about. This has been incredibly useful in my cases, but absolutely check their work.
Suggested labels{ "key": "sparse-computation", "value": "Optimizing large language models using sparse computation techniques" }#418: openchat/openchat-3.5-1210 · Hugging Face### DetailsSimilarity score: 0.85 - [ ] [openchat/openchat-3.5-1210 · Hugging Face](https://huggingface.co/openchat/openchat-3.5-1210#conversation-templates)Using the OpenChat ModelWe highly recommend installing the OpenChat package and using the OpenChat OpenAI-compatible API server for an optimal experience. The server is optimized for high-throughput deployment using vLLM and can run on a consumer GPU with 24GB RAM.
Online DeploymentIf you want to deploy the server as an online service, use the following options:
For security purposes, we recommend using an HTTPS gateway in front of the server. Mathematical Reasoning ModeThe OpenChat model also supports mathematical reasoning mode. To use this mode, include
Conversation TemplatesWe provide several pre-built conversation templates to help you get started.
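For illustration only (not part of the linked model card), here is a hedged sketch of calling an OpenChat OpenAI-compatible server with the openai Python client; the base URL, port, and model name are assumptions for a locally deployed server:

```python
from openai import OpenAI

# Assumed local endpoint of an OpenChat OpenAI-compatible API server.
client = OpenAI(base_url="http://localhost:18888/v1", api_key="none")

response = client.chat.completions.create(
    model="openchat_3.5",  # assumed model name exposed by the server
    messages=[{"role": "user", "content": "Explain vLLM in one sentence."}],
)
print(response.choices[0].message.content)
```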
Suggested labels{ "label": "chat-templates", "description": "Pre-defined conversation structures for specific modes of interaction." } |
## Understanding the kernel in Semantic Kernel | Microsoft Learn

DESCRIPTION:

Understanding the kernel in Semantic Kernel
Article | 02/22/2024 | 2 contributors

In this article:
- The kernel is at the center of everything
- Building a kernel
- Invoking plugins from the kernel
- Going further with the kernel
- Next steps
Similar to the kernel of an operating system, the kernel in Semantic Kernel is responsible for managing the resources that are necessary to run "code" in an AI application. This includes managing the AI models, services, and plugins that are necessary for both native code and AI services to run together.
If you want to see the code demonstrated in this article in a complete solution, check out the following samples in the public documentation repository.
| Language | Link to final solution |
| --- | --- |
| C# | Open example in GitHub |
| Python | Open solution in GitHub |
The kernel is at the center of everything
Because the kernel has all of the services and plugins necessary to run both native code and AI services, it is used by nearly every component within the Semantic Kernel SDK. This means that if you run any prompt or code in Semantic Kernel, it will always go through a kernel.
This is extremely powerful, because it means you as a developer have a single place where you can configure, and most importantly monitor, your AI application. Take, for example, invoking a prompt from the kernel. When you do so, the kernel will select the best AI service to run the prompt, build the prompt using the provided prompt template, send the prompt to the AI service, parse the response, and finally return the response back to your application.
Throughout this entire process, you can create events and middleware that are triggered at each of these steps. This means you can perform actions like logging, providing status updates to users, and, most importantly, practicing responsible AI, all from a single place.
Building a kernel
Before building a kernel, you should first understand the two types of components that exist within a kernel: services and plugins. Services consist of both AI services and other services that are necessary to run your application (e.g., logging, telemetry, etc.). Plugins, meanwhile, are any code you want AI to call or leverage within a prompt.
In the following examples, you can see how to add a logger, a chat completion service, and a plugin to the kernel.
Import the necessary packages:
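A minimal Python sketch of the imports, assuming the semantic-kernel Python package is installed; exact module paths have shifted between SDK releases, so treat them as assumptions rather than the article's verbatim sample:

```python
# Assumed imports for the Python SDK of Semantic Kernel; names may differ
# across semantic-kernel versions.
import logging
import os

from semantic_kernel import Kernel
from semantic_kernel.connectors.ai.open_ai import (
    AzureChatCompletion,
    OpenAIChatCompletion,
)
from semantic_kernel.core_plugins import TimePlugin
```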
If you are using Azure OpenAI, you can use the AzureChatCompletion class.
If you are using OpenAI, you can use the OpenAIChatCompletion class.
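A hedged Python sketch of building the kernel with a logger, a chat completion service, and a plugin; the add_service and add_plugin calls and the OpenAIChatCompletion parameters follow recent semantic-kernel releases and are assumptions, not the article's verbatim sample:

```python
import logging
import os

from semantic_kernel import Kernel
from semantic_kernel.connectors.ai.open_ai import OpenAIChatCompletion
from semantic_kernel.core_plugins import TimePlugin

# A logger gives you one place to watch everything the kernel does.
logging.basicConfig(level=logging.INFO)

kernel = Kernel()

# Service: a chat completion connector. Swap in AzureChatCompletion
# (with your deployment name and endpoint) if you are on Azure OpenAI.
kernel.add_service(
    OpenAIChatCompletion(
        service_id="chat",
        ai_model_id="gpt-3.5-turbo",
        api_key=os.environ["OPENAI_API_KEY"],
    )
)

# Plugin: native code the AI can call, here the built-in time plugin.
kernel.add_plugin(TimePlugin(), plugin_name="time")
```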
Run the today function from the time plugin, then run the ShortPoem function from WriterPlugin using the current time as an argument (both steps are sketched below):
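A hedged sketch of both invocations, assuming the kernel built above and that a WriterPlugin containing a ShortPoem prompt function has also been added to the kernel; the invoke and KernelArguments API follows recent semantic-kernel Python releases and is an assumption:

```python
import asyncio

from semantic_kernel.functions import KernelArguments

# "kernel" is the instance built in the previous sketch; WriterPlugin is
# assumed to have been added to it (e.g., loaded from a prompt directory).

async def main() -> None:
    # Run the "today" function from the time plugin.
    today = await kernel.invoke(plugin_name="time", function_name="today")
    print(today)

    # Run the "ShortPoem" function from WriterPlugin, passing the current
    # time as the input argument.
    poem = await kernel.invoke(
        plugin_name="WriterPlugin",
        function_name="ShortPoem",
        arguments=KernelArguments(input=str(today)),
    )
    print(poem)

asyncio.run(main())
```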