update: README.md and fix tests
rinov committed Jul 30, 2023
1 parent 038fd78 commit fa252e5
Showing 6 changed files with 149 additions and 37 deletions.
106 changes: 86 additions & 20 deletions README.md
@@ -1,23 +1,23 @@
# Promptex

Promptex is a Python library specially crafted to simplify the handling of prompts used as inputs for Generative Pre-trained Transformer (GPT) models. It empowers researchers and developers to manage prompts effectively and efficiently.
Promptex is a Python library built for efficient and effective management of prompts utilized in Generative Pre-trained Transformer (GPT) models. Whether you're a researcher or a developer, this tool offers an easy way to manage, integrate, and analyze your prompts.

This framework plays a significant role in the fundamental research that aims to develop new architectures for foundation models and AGI. It concentrates on improving the generality and capability of these models and enhancing their training stability and efficiency.
## Key Features

## Core Focuses
- **Prompt Management**: Provides methods to set up and get prompts, allowing for flexibility to match your unique needs.
- **Data Integration**: Supports a wide array of file formats (JSON, JSON Schema, CSV, XML, YAML, HTML) for seamless integration into your data pipeline; a save/load sketch follows below.
- **Analytics**: Equips you with the ability to gather detailed statistics about the elements in your prompts, such as text length and token count, valuable for fine-tuning your model.
- **Scalability**: Designed with large-scale projects in mind, ensuring efficiency regardless of the size of your GPT-based projects.

- **Flexibility**: Promptex provides methods to easily add, set, clear, and get prompts, giving you the ability to tailor the prompt setup to suit your unique requirements.
- **Data Integration**: It supports various file formats like JSON, JSON Schema, CSV, XML, YAML, and HTML for saving and loading prompts, ensuring seamless integration with your data pipeline.
- **Analytics**: The library provides a method to gather detailed statistics about the elements in your prompts, such as text length and token count. This feature can be very beneficial for fine-tuning your model.
- **Scalability**: Promptex is designed with efficiency and scalability in mind, making it an ideal tool for managing prompts in large-scale GPT-based projects.
By offering this wide range of functionalities, Promptex provides a flexible and efficient way to work with prompts in transformer models. Join us on this journey towards making GPT prompt handling simpler, more efficient, and effective!
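
The save/load round trip behind the **Data Integration** feature can be sketched as follows. The `set_elements`, `save_prompt`, and `load_prompt` calls are the ones used in the examples further down this page; treating the file extension as the format selector is an assumption, not a documented guarantee.

```python
from promptex.promptex import Promptex
from promptex.elements import *

promptex = Promptex()
promptex.set_elements(
    [
        Instruction("Create an innovative quest for a new fantasy RPG game."),
        OutputFormat("Markdown text in English"),
    ]
)

# Persist the prompt and read it back. JSON is used here; the other supported
# formats are assumed to go through the same pair of calls.
path = "examples/test.json"
promptex.save_prompt(path)
promptex = promptex.load_prompt(path)
```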

## Key Features
## Why is Prompt Management and Analysis Necessary?

- **Prompt Management**: Easily add, set, clear, and get prompts.
- **File Handling**: Save and load prompts from files. It supports various formats such as JSON, CSV, XML, YAML, and HTML.
- **Stats Collection**: Obtain statistics about the elements in the prompt. This includes details such as the count, length, and token count of each element.
Generative Pre-trained Transformer (GPT) models and the like generate output based on input prompts. Prompts serve as essential cues that instruct the model what to generate, and the selection and management of these prompts significantly impact the quality and relevance of the results.

By offering this wide range of functionalities, Promptex provides a flexible and efficient way to work with prompts in transformer models. Join us on this journey towards making GPT prompt handling simpler, more efficient, and effective!
Moreover, managing the prompts used in a project or research effort efficiently is crucial for consistency and reproducibility. Promptex addresses these challenges: by capturing not just the meaning of the sentences but also the characteristics of the prompts, such as their composition and token count, it provides opportunities to maximize the performance of GPT models.

Therefore, effective prompt management and analysis not only aid in producing better and more consistent results but also pave the way to harness the full potential of GPT models.

## Installation

@@ -36,7 +36,20 @@ cd promptex
pip install -e .
```

## Prompt Components

In Promptex, a prompt for Generative Pre-trained Transformer (GPT) models is composed of a set of fundamental elements. These elements, each with a specific role and priority, form the building blocks for creating effective prompts (a short usage sketch follows the list):

- **ROLE**: This element indicates the role of the prompt.
- **INSTRUCTION**: This element provides specific instructions to guide the GPT model's response.
- **CONSTRAINT**: This element defines any constraints or limitations that the GPT model should adhere to while generating a response.
- **CONTEXT**: This element gives the context or background information necessary for understanding the prompt.
- **INPUT_DATA**: This element represents the specific input data that the GPT model needs to generate a response.
- **OUTPUT_FORMAT**: This element specifies the desired format of the model's output.
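
As a quick illustration of how these components combine, here is a minimal sketch. `Instruction` and `OutputFormat` are taken from the examples in this repository; the other element classes (`Role`, `Constraint`) are inferred from the element types above and from the test fixtures, so treat those names as assumptions.

```python
from promptex.promptex import Promptex
from promptex.elements import *

promptex = Promptex()
promptex.set_elements(
    [
        Role("You are a quest designer for a fantasy RPG."),  # assumed class name
        Instruction("Create an innovative quest for a new fantasy RPG game."),
        Constraint("The quest should be completable in under an hour."),  # assumed class name
        OutputFormat("Markdown text in English"),
    ]
)

# Elements are concatenated into the final prompt text.
print(promptex.get_prompt_text())
```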


## Getting Started

```python
from promptex.promptex import Promptex
from promptex.elements import *
@@ -100,16 +113,69 @@ promptex = promptex.load_prompt(path)

### Show stats of prompt
```python
path = "examples/test.json"
import json

from promptex.promptex import Promptex
from promptex.elements import *
from promptex.encoding_strategy import *

# Save the prompt to a file
promptex.save_prompt(path)

# Load the prompt from a file
promptex = promptex.load_prompt(path)
promptex = Promptex()

promptex.set_elements(
    [
        Instruction("Create an innovative quest for a new fantasy RPG game."),
        OutputFormat("Markdown text in English"),
    ]
)

text = SimpleTextEncodingStrategy().encode(promptex)
encoding_model_name = "gpt-4"
token_count = promptex.get_token_count(
    text=text, encoding_model_name=encoding_model_name
)
stats = promptex.get_stats(text=text, encoding_model_name=encoding_model_name)

print(f"Token consumption: {token_count}")

print(f"Stats: {json.dumps(stats, indent=2, ensure_ascii=False)}")


Token consumption: 27
Stats: {
  "element_count": {
    "Instruction": 1,
    "OutputFormat": 1,
    "total": 2
  },
  "text_length": {
    "Instruction": {
      "min": 0,
      "max": 54,
      "avg": 54.0
    },
    "OutputFormat": {
      "min": 0,
      "max": 24,
      "avg": 24.0
    },
    "total": 124
  },
  "token_count": {
    "Instruction": {
      "min": 0,
      "max": 11,
      "avg": 11.0
    },
    "OutputFormat": {
      "min": 0,
      "max": 4,
      "avg": 4.0
    },
    "total": 27
  }
}
```

### Encoding a promt as JSON
### Encoding a prompt as JSON
```python
strategy = SimpleJsonEncodingStrategy()
prompt = strategy.encode(promptex)
@@ -139,7 +205,7 @@ print(prompt)
"""
```

### Encoding a promt as XML
### Encoding a prompt as XML
```python
strategy = SimpleXmlEncodingStrategy()
prompt = strategy.encode(promptex)
@@ -158,7 +224,7 @@ print(prompt)
"""
```

### Encoding a promt as JSON Schema
### Encoding a prompt as JSON Schema
```python
strategy = SimpleXmlEncodingStrategy()
prompt = strategy.encode(promptex)
31 changes: 31 additions & 0 deletions examples/calc_token_consumption.py
@@ -0,0 +1,31 @@
import sys
import os
import json

sys.path.insert(
    0, os.path.abspath(os.path.join(os.path.dirname(__file__), ".."))
)
from promptex.promptex import Promptex
from promptex.elements import *
from promptex.encoding_strategy import *


promptex = Promptex()

promptex.set_elements(
    [
        Instruction("Create an innovative quest for a new fantasy RPG game."),
        OutputFormat("Markdown text in English"),
    ]
)

text = SimpleTextEncodingStrategy().encode(promptex)
encoding_model_name = "gpt-4"
token_count = promptex.get_token_count(
    text=text, encoding_model_name=encoding_model_name
)
stats = promptex.get_stats(text=text, encoding_model_name=encoding_model_name)

print(f"Token consumption: {token_count}")

print(f"Stats: {json.dumps(stats, indent=2, ensure_ascii=False)}")
8 changes: 4 additions & 4 deletions promptex/elements/element.py
@@ -43,10 +43,10 @@ def to_dict(self) -> Dict[str, str]:
elements_of_type = [
element.text
for element in self.elements
if element.type == element_type.value
if element.type == element_type
]
if elements_of_type:
data[element_type.value] = elements_of_type
return {self.type.value: data}
data[element_type] = elements_of_type
return {self.type: data}
else:
return {self.type.value: self.text}
return {self.type: self.text}
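
In practice this change means `to_dict()` keys its output by `element.type` directly rather than calling `.value` on it, and the updated tests below treat `type` as the plain string value. A minimal illustration; the `Role("Role")` constructor is assumed from the fixture names and the asserted text, not shown in this diff.

```python
from promptex.elements import *

role = Role("Role")  # assumed constructor, matching the test fixtures
# type now compares equal to the plain string value, and to_dict() keys by it.
assert role.type == "role"
assert role.to_dict() == {"role": "Role"}
```
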
17 changes: 17 additions & 0 deletions promptex/elements/element_type.py
@@ -2,6 +2,19 @@


class ElementType(Enum):
"""
An Enum class that represents types of elements that make up the prompts in GPT models.
Each Enum value represents the role of the element within the prompt and its priority.
ROLE : An element that indicates the role of the prompt
INSTRUCTION : An element that represents the instructions of the prompt
CONSTRAINT : An element that represents the constraint conditions of the prompt
CONTEXT : An element that provides the context or background information of the prompt
INPUT_DATA : An element that represents the input data of the prompt
OUTPUT_FORMAT : An element that represents the output format of the prompt
"""

ROLE = "role", 1
INSTRUCTION = "instruction", 2
CONSTRAINT = "constraint", 3
@@ -10,5 +23,9 @@ class ElementType(Enum):
    OUTPUT_FORMAT = "output_format", 6

    def __init__(self, value, priority):
        """
        :param value: The value of the Enum (a string representing the role of the prompt element)
        :param priority: The priority of the Enum (a number)
        """
        self._value_ = value
        self.priority = priority
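
Since each member carries both a string value and a numeric priority, the priority can be used to put element types into their intended order. A minimal sketch; the import path is assumed from this file's location in the repository.

```python
from promptex.elements.element_type import ElementType  # assumed import path

# Sort the element types by their declared priority (ROLE first, OUTPUT_FORMAT last).
ordered = sorted(ElementType, key=lambda element_type: element_type.priority)
print([element_type.value for element_type in ordered])
# ['role', 'instruction', 'constraint', 'context', 'input_data', 'output_format']
```
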
7 changes: 4 additions & 3 deletions promptex/promptex.py
@@ -86,10 +86,11 @@ def get_token_count(self, text, encoding_model_name="gpt-4") -> int:
        token_count = len(encoded)
        return token_count

    def get_stats(self, encoding_model_name="gpt-4") -> Dict[str, Any]:
    def get_stats(self, text, encoding_model_name="gpt-4") -> Dict[str, Any]:
        """
        Get statistics about the elements in the prompt.
        :seealso: https://github.com/openai/tiktoken/blob/main/tiktoken/model.py
        :param text: The prompt text to get the statistics
        :param encoding_model_name: gpt-4, gpt-3.5-turbo, text-davinci-001, etc..
        :return: A dictionary with statistics
        """
@@ -143,9 +144,9 @@ def get_stats(self, encoding_model_name="gpt-4") -> Dict[str, Any]:
"element_count"
][element_type]

stats["text_length"]["total"] = len(self.get_prompt_text())
stats["text_length"]["total"] = len(text)
stats["token_count"]["total"] = self.get_token_count(
text=self.get_prompt_text(),
text=text,
encoding_model_name=encoding_model_name,
)

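The `get_stats` signature change above mirrors `get_token_count`: both now take the already-encoded prompt text plus a model name. Given the `:seealso:` link to tiktoken's model table, a plausible sketch of the underlying token counting is shown below; the use of `tiktoken.encoding_for_model` is an assumption about the implementation, not a quote of `promptex.py`.

```python
import tiktoken

def count_tokens(text: str, encoding_model_name: str = "gpt-4") -> int:
    # Resolve the tokenizer tiktoken maps to this model name, then count
    # how many tokens the prompt text encodes to.
    encoding = tiktoken.encoding_for_model(encoding_model_name)
    return len(encoding.encode(text))

print(count_tokens("Create an innovative quest for a new fantasy RPG game."))
```
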
17 changes: 7 additions & 10 deletions tests/test_promptex.py
@@ -36,45 +36,42 @@ def output_format():

def test_role(role):
    assert role.text == "Role"
    assert role.type == ElementType.ROLE
    assert role.type == "role"
    assert role.to_dict() == {"role": "Role"}


def test_instruction(instruction):
    assert instruction.text == "Instruction"
    assert instruction.type == ElementType.INSTRUCTION
    assert instruction.type == "instruction"
    assert instruction.to_dict() == {"instruction": "Instruction"}


def test_constraint(constraint):
    assert constraint.text == "Constraint"
    assert constraint.type == ElementType.CONSTRAINT
    assert constraint.type == "constraint"
    assert constraint.to_dict() == {"constraint": "Constraint"}


def test_context(context):
    assert context.text == "Context"
    assert context.type == ElementType.CONTEXT
    assert context.type == "context"
    assert context.to_dict() == {"context": "Context"}


def test_input_data(input_data):
    assert input_data.text == "InputData"
    assert input_data.type == ElementType.INPUT_DATA
    assert input_data.type == "input_data"
    assert input_data.to_dict() == {"input_data": "InputData"}


def test_output_format(output_format):
    assert output_format.text == "OutputFormat"
    assert output_format.type == ElementType.OUTPUT_FORMAT
    assert output_format.type == "output_format"
    assert output_format.to_dict() == {"output_format": "OutputFormat"}


def test_promptex(role, instruction, output_format):
    promptex = Promptex()
    promptex.build_prompt([role, instruction, output_format])
    promptex.add_elements([role, instruction, output_format])

    assert len(promptex.elements) == 3
    assert promptex.get_prompt_text() == "\n".join(
        [role.text, instruction.text, output_format.text]
    )
