A reverse-engineered Rust library implementing OpenAI's Harmony response format for structured conversational AI interactions.
This library does NOT include an AI model. It provides the conversation formatting and parsing layer for OpenAI models that understand the Harmony format. You still need:
- OpenAI API access or compatible model
- A model that understands Harmony formatting (`<|start|>`, `<|message|>`, `<|end|>` tokens)
- Integration code to send formatted tokens to the model and receive responses
What this library does: Formats conversations → [Your OpenAI Model] → Parses responses
This library provides a complete implementation of the Harmony response format used by OpenAI's open-weight model series (gpt-oss). It enables parsing and rendering of structured conversations with support for:
- Multiple communication channels (analysis, commentary, final)
- Tool calling and function integration
- Reasoning effort control
- Streaming token parsing
- System and developer instructions
- Rust-based core with minimal overhead
- Thread-local regex optimization
- Efficient tokenization with BPE encoding
- Memory-efficient streaming parser
- Support for multiple encoding configurations
- Extensible tool system with namespaces
- Configurable channel routing
- Role-based message validation
- Native Rust library
- Python bindings (PyO3) with full API compatibility
- WebAssembly support with interactive demo
- Cross-platform vocabulary download and caching
- Comprehensive test suite (13 tests passing)
- Performance benchmarks for all operations
- Graceful error handling and network failure recovery
- 4 detailed examples with documentation
- Thread-safe concurrent processing
Add to your Cargo.toml:
```toml
[dependencies]
harmony-protocol = { git = "https://github.com/yourusername/harmony-protocol" }
```

Then render a conversation into tokens for your model:

```rust
use harmony_protocol::{
load_harmony_encoding, HarmonyEncodingName,
chat::{Role, Message, Conversation, SystemContent}
};
fn main() -> anyhow::Result<()> {
// Load the encoding
let enc = load_harmony_encoding(HarmonyEncodingName::HarmonyGptOss)?;
// Create a conversation
let convo = Conversation::from_messages([
Message::from_role_and_content(
Role::System,
SystemContent::new()
.with_required_channels(["analysis", "commentary", "final"])
),
Message::from_role_and_content(Role::User, "Hello, world!"),
]);
// Render for completion (ready to send to OpenAI model)
let input_tokens = enc.render_conversation_for_completion(&convo, Role::Assistant, None)?;
println!("Generated {} tokens ready for OpenAI model", input_tokens.len());
// TODO: Send input_tokens to your OpenAI model and get response_tokens
// let response_tokens = your_openai_client.complete(input_tokens).await?;
// Parse the model's response back to structured messages
// let messages = enc.parse_messages_from_completion_tokens(response_tokens, Some(Role::Assistant))?;
Ok(())
}
```

Tools are described with `ToolDescription` JSON schemas, grouped into namespaces, and attached to the system message:

```rust
use harmony_protocol::chat::{
SystemContent, ToolDescription, ToolNamespaceConfig, Message, Role
};
fn main() -> anyhow::Result<()> {
let tools = vec![
ToolDescription::new(
"calculate",
"Performs mathematical calculations",
Some(serde_json::json!({
"type": "object",
"properties": {
"expression": {"type": "string"}
},
"required": ["expression"]
}))
)
];
let function_namespace = ToolNamespaceConfig::new("functions", None, tools);
let system_content = SystemContent::new()
.with_browser_tool()
.with_tools(function_namespace);
let message = Message::from_role_and_content(Role::System, system_content);
Ok(())
}
```

For streaming output, `StreamableParser` processes tokens one at a time as they arrive from the model:

```rust
use harmony_protocol::{StreamableParser, load_harmony_encoding, HarmonyEncodingName};
use harmony_protocol::chat::Role;
fn main() -> anyhow::Result<()> {
let encoding = load_harmony_encoding(HarmonyEncodingName::HarmonyGptOss)?;
let mut parser = StreamableParser::new(encoding.clone(), Some(Role::Assistant))?;
// In practice, response_tokens would come from your OpenAI model's streaming API
let response_tokens = vec![200006, 1234, 5678]; // These would be from OpenAI
// Process tokens as they arrive from the model
for token in response_tokens {
parser.process(token)?;
// Get content delta for real-time streaming UI updates
if let Ok(Some(delta)) = parser.last_content_delta() {
print!("{}", delta); // Show new content to user immediately
}
}
// Get final structured messages after streaming is complete
let messages = parser.into_messages();
println!("\nParsed {} messages from model output", messages.len());
Ok(())
}
```

The Harmony format structures conversations using special tokens:

```text
<|start|>system<|message|>You are ChatGPT, a large language model trained by OpenAI.
Knowledge cutoff: 2024-06
Reasoning: medium
# Valid channels: analysis, commentary, final. Channel must be included for every message.<|end|>
<|start|>user<|message|>What is 2 + 2?<|end|>
<|start|>assistant<|channel|>analysis<|message|>I need to perform a simple arithmetic calculation.<|end|>
<|start|>assistant<|channel|>final<|message|>2 + 2 equals 4.<|end|>
```
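When a model replies in this format, the completion tokens can be parsed back into one structured message per `<|start|>`…`<|end|>` block (here, one analysis message and one final message). A minimal sketch using the parsing call from the Quick Start, with placeholder tokens standing in for real model output:

```rust
use harmony_protocol::{load_harmony_encoding, HarmonyEncodingName};
use harmony_protocol::chat::Role;

fn main() -> anyhow::Result<()> {
    let enc = load_harmony_encoding(HarmonyEncodingName::HarmonyGptOss)?;

    // Placeholder tokens; in practice these come from your model's completion.
    let response_tokens = vec![200006, 1234, 5678];

    // Each <|start|>…<|end|> block in the completion becomes one Message.
    let messages =
        enc.parse_messages_from_completion_tokens(response_tokens, Some(Role::Assistant))?;
    println!("Parsed {} structured messages", messages.len());
    Ok(())
}
```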
The library supports multiple communication channels for organized model outputs:
- analysis: Internal reasoning and analysis
- commentary: Model explanations and meta-commentary
- final: User-facing final responses
Channels can be configured as required, and the library automatically drops analysis content once a final response is complete.
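Once a completion has been parsed back into messages, the user-facing reply can be selected by channel. A minimal sketch, assuming each parsed `Message` exposes its channel through a `channel()` accessor returning `Option<&str>` (hypothetical; the exact field or method name is not documented above):

```rust
use harmony_protocol::chat::Message;

/// Keep only messages routed to the user-facing `final` channel.
/// `channel()` is a hypothetical accessor assumed to return Option<&str>.
fn final_messages(messages: Vec<Message>) -> Vec<Message> {
    messages
        .into_iter()
        .filter(|m| m.channel() == Some("final"))
        .collect()
}
```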
- Browser Tools: Web browsing, search, and content extraction
- Python Tools: Code execution environment
- Function Tools: Custom function definitions
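These namespaces are attached through the `SystemContent` builder, as in the tool example above (`with_browser_tool()`, `with_tools(...)`). A minimal sketch combining the built-in browser namespace with an illustrative, empty namespace named "python"; the namespace contents here are placeholders, not the library's canonical definitions:

```rust
use harmony_protocol::chat::{Message, Role, SystemContent, ToolNamespaceConfig};

fn main() {
    // Illustrative only: an empty namespace named "python"; the real
    // code-execution namespace definition is not documented in this README.
    let python_namespace = ToolNamespaceConfig::new("python", None, vec![]);

    let system_content = SystemContent::new()
        .with_browser_tool()           // built-in browser namespace
        .with_tools(python_namespace); // attach an additional namespace

    let _system_message = Message::from_role_and_content(Role::System, system_content);
}
```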
Custom function tools are declared with `ToolDescription` and a JSON schema:

```rust
use harmony_protocol::chat::ToolDescription;
fn main() {
let custom_tool = ToolDescription::new(
"weather",
"Gets current weather for a location",
Some(serde_json::json!({
"type": "object",
"properties": {
"location": {"type": "string"},
"units": {"type": "string", "enum": ["celsius", "fahrenheit"]}
},
"required": ["location"]
}))
);
println!("Created custom tool: {}", custom_tool.name);
}
```

The library is organized into the following modules:

```text
┌─────────────────┐    ┌─────────────────┐    ┌─────────────────┐
│ Chat Module     │    │ Encoding Module │    │ Registry Module │
│                 │    │                 │    │                 │
│ • Message       │◄──►│ • Rendering     │◄──►│ • Configurations│
│ • Conversation  │    │ • Parsing       │    │ • Token Mappings│
│ • Content Types │    │ • Streaming     │    │ • Vocab Loading │
└─────────────────┘    └─────────────────┘    └─────────────────┘
         ▲                      ▲
         │                      │
         ▼                      ▼
┌─────────────────┐    ┌─────────────────┐
│ Tiktoken Module │    │ Extensions      │
│                 │    │                 │
│ • BPE Encoding  │    │ • Public Vocabs │
│ • Tokenization  │    │ • Hash Verify   │
│ • Thread Safety │    │ • Remote Loading│
└─────────────────┘    └─────────────────┘
```
The format is delimited by the following special tokens:

| Token | ID | Purpose |
|---|---|---|
| `<\|start\|>` | 200006 | Begins a message header |
| `<\|message\|>` | 200008 | Separates the message header from its content |
| `<\|end\|>` | 200007 | Marks the end of a message |
| `<\|channel\|>` | 200005 | Introduces the channel name in a message header |
| `<\|call\|>` | 200012 | Marks a tool/function call |
| `<\|return\|>` | 200002 | Marks the end of a completed assistant response |
| `<\|constrain\|>` | 200003 | Declares the content type of tool-call arguments |
Vocabulary loading and caching can be configured with the following environment variables:

- `TIKTOKEN_ENCODINGS_BASE`: Custom vocabulary file directory
- `TIKTOKEN_RS_CACHE_DIR`: Custom cache directory
Two optional Cargo features are available:

- `python-binding`: Enable PyO3 Python bindings
- `wasm-binding`: Enable WebAssembly support
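For example, to pull the crate with the Python bindings enabled (using the same placeholder URL as above):

```toml
[dependencies]
harmony-protocol = { git = "https://github.com/yourusername/harmony-protocol", features = ["python-binding"] }
```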
- Context Window: 1,048,576 tokens (1M)
- Max Action Length: 524,288 tokens (512K)
- Thread-Safe: Optimized for concurrent access
- Memory Efficient: Token reuse and streaming parsing
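As a rough sketch of the concurrent-access claim, each thread can work with its own clone of the encoding (cloning is shown in the streaming example above); the exact `Send`/`Sync` bounds of the encoding and its error type are assumptions here:

```rust
use std::thread;

use harmony_protocol::{load_harmony_encoding, HarmonyEncodingName};
use harmony_protocol::chat::{Conversation, Message, Role};

fn main() -> anyhow::Result<()> {
    let enc = load_harmony_encoding(HarmonyEncodingName::HarmonyGptOss)?;

    // Render several independent conversations in parallel, one clone per thread.
    let handles: Vec<_> = (0..4)
        .map(|_| {
            let enc = enc.clone();
            thread::spawn(move || {
                let convo = Conversation::from_messages([
                    Message::from_role_and_content(Role::User, "Hello from a worker thread"),
                ]);
                enc.render_conversation_for_completion(&convo, Role::Assistant, None)
                    .map(|tokens| tokens.len())
            })
        })
        .collect();

    for handle in handles {
        let token_count = handle.join().expect("worker thread panicked")?;
        println!("Rendered {token_count} tokens");
    }
    Ok(())
}
```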
```bash
# Run all tests (13 tests covering unit + integration)
cargo test
# Run performance benchmarks
cargo bench
# Run specific examples
cargo run --example basic_usage
cargo run --example tool_integration
cargo run --example streaming_parser
cargo run --example channel_management
```

The test suite includes comprehensive validation against canonical examples and edge cases.
The library includes 4 comprehensive examples:
- `basic_usage.rs` - Message creation and conversation rendering
- `tool_integration.rs` - Custom tools and function calling
- `streaming_parser.rs` - Real-time token processing
- `channel_management.rs` - Multi-channel workflows
See examples/README.md for detailed usage instructions.
To build and install the Python bindings:

```bash
cd python
python setup.py build_rust
pip install -e .
```

```python
import harmony_protocol as hr
# Same API as Rust, but in Python
encoding = hr.load_harmony_encoding(hr.HarmonyEncodingName.harmony_gpt_oss())
conversation = hr.Conversation.from_messages([
hr.Message.from_role_and_content(hr.Role.user(), "Hello!")
])
tokens = encoding.render_conversation(conversation)
```

To build and serve the WebAssembly demo:

```bash
cd www
npm run build # Requires wasm-pack
npm run serve   # Open http://localhost:8000
```

Run benchmarks to see performance characteristics:

```bash
cargo bench
```

Results show the library can handle:
- Large conversations: 1000+ messages efficiently
- Real-time streaming: Process tokens as they arrive from model
- Concurrent access: Thread-safe for multiple conversations
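As an illustrative sketch of the first point, assuming `Conversation::from_messages` accepts any iterator of messages (the Quick Start passes an array):

```rust
use harmony_protocol::{load_harmony_encoding, HarmonyEncodingName};
use harmony_protocol::chat::{Conversation, Message, Role};

fn main() -> anyhow::Result<()> {
    let enc = load_harmony_encoding(HarmonyEncodingName::HarmonyGptOss)?;

    // Build a long synthetic conversation of 1000 user messages.
    let messages: Vec<Message> = (0..1000)
        .map(|_| Message::from_role_and_content(Role::User, "benchmark message"))
        .collect();

    let convo = Conversation::from_messages(messages);
    let tokens = enc.render_conversation_for_completion(&convo, Role::Assistant, None)?;
    println!("Rendered {} tokens from 1000 messages", tokens.len());
    Ok(())
}
```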
- Fork the repository
- Create a feature branch
- Add tests for new functionality
- Ensure all tests pass
- Submit a pull request
This project is licensed under the Apache License 2.0.
This is a reverse-engineered implementation for educational and research purposes. It is not affiliated with or endorsed by OpenAI.