Agents should execute whenever possible.
Agent Skills turns repeatable agent reasoning into executable skills: reusable, testable, observable, and portable across tools and model providers.
Stop rebuilding agent logic in prompts. Define it once as a skill, bind it to any backend, and run it with full traceability.
Agent Skills Runtime is the reference implementation of ORCA (Open Cognitive Runtime Architecture):
- Skills package reusable cognitive workflows
- Capabilities define backend-agnostic contracts
- Bindings connect contracts to execution backends (PythonCall, OpenAPI, MCP, OpenRPC)
- Runtime executes DAGs with policy/safety, CognitiveState, and traceability
No API key required for local-first runs. Deterministic Python baselines are available for offline development and testing.
Most agent systems still encode critical logic inside prompts and framework glue.
That creates recurring engineering pain:
- Reasoning logic gets trapped in prompt text instead of executable workflows
- Workflows are hard to reuse and harder to test
- Contracts between steps are implicit and brittle
- Observability and auditability are often an afterthought
- Safety and governance controls are inconsistent
- Switching providers or frameworks usually means rewriting too much
ORCA introduces an execution layer for cognitive workflows:
- Skills are reusable cognitive workflows
- Capabilities are stable, contract-driven interfaces
- Bindings are interchangeable execution backends
- Runtime is a DAG scheduler + policy engine + cognitive state + trace
This keeps reasoning structure explicit and executable, while preserving portability across backends.
Before (logic trapped in prompt text):
prompt = """
Analyze this PR.
Find risks.
Estimate confidence.
Suggest fixes.
Return JSON.
"""
After (logic as a reusable skill graph):
# Conceptual example (illustrative structure)
skill: code.pr.review
steps:
- parse_diff
- detect_risks
- score_confidence
- generate_review
- validate_output
Same reasoning pattern. Reusable. Testable. Observable. Bindable to Python, OpenAPI, MCP, or your own APIs.
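As a rough illustration of why the step list above is easier to test than a prompt, here is a minimal Python sketch that runs the same five steps in order and records a trace. It does not use the Agent Skills runtime: the step functions, the toy `eval(` rule, the confidence heuristic, and `run_skill` are all hypothetical stand-ins.

```python
# Hypothetical step functions. Each one reads the shared state and returns new keys.
def parse_diff(state):
    # Keep added lines only, skipping the "+++ b/file" header line.
    added = [
        line[1:]
        for line in state["diff"].splitlines()
        if line.startswith("+") and not line.startswith("+++")
    ]
    return {"added_lines": added}

def detect_risks(state):
    # Toy rule (assumption): flag any added line that calls eval().
    return {"risks": [line.strip() for line in state["added_lines"] if "eval(" in line]}

def score_confidence(state):
    # Toy heuristic (assumption), not a real scoring model.
    return {"confidence": 0.5 if state["risks"] else 0.9}

def generate_review(state):
    return {"review": {"risks": state["risks"], "confidence": state["confidence"]}}

def validate_output(state):
    review = state["review"]
    if not 0.0 <= review["confidence"] <= 1.0:
        raise ValueError("confidence must be between 0 and 1")
    return {"valid": True}

STEPS = [parse_diff, detect_risks, score_confidence, generate_review, validate_output]

def run_skill(steps, inputs):
    """Run steps in order, merging outputs into state and recording a trace."""
    state = dict(inputs)
    trace = []
    for step in steps:
        outputs = step(state)
        state.update(outputs)
        trace.append({"step": step.__name__, "output_keys": sorted(outputs)})
    return state, trace

state, trace = run_skill(STEPS, {"diff": "+++ b/app.py\n+x = eval(user_input)\n-x = 1"})
print(state["review"])
print([entry["step"] for entry in trace])
```

Because each step is a plain function, it can be unit-tested on its own, and the trace shows which step produced which keys.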
git clone https://github.com/gfernandf/agent-skills.git
cd agent-skills
make bootstrap
python skills.py doctor
python skills.py run text.language-summary \
--input '{"text": "ORCA turns agent reasoning into reusable executable skills."}'
What to expect:
- No API key required
- Runs offline with deterministic Python baselines
- First run may take 30-60 seconds
Windows PowerShell setup and run
git clone https://github.com/gfernandf/agent-skills.git
cd agent-skills
pip install -e ".[all,dev]"
git clone https://github.com/gfernandf/agent-skill-registry.git ../agent-skill-registry
python skills.py doctor
$env:OPENAI_API_KEY = ""
'{ "text": "ORCA turns agent reasoning into reusable executable skills." }' | Set-Content input_qs.json -Encoding ascii
python skills.py run text.language-summary --input-file input_qs.json
Remove-Item input_qs.json
The first command verifies the installation. The stronger demo is the official skill decision.make.
decision.make shows a full decision workflow under uncertainty with explicit stages and auditable outputs.
From the skill contract in the registry, it includes:
- Multi-step pipeline (option generation, analysis, scoring, justification, validation)
- Structured outputs such as recommendation, tradeoffs, confidence_score, confidence_level, uncertainties, and next_steps
- Risk-aware reasoning through explicit criteria and constraints
- Trace-friendly execution aligned with ORCA observability goals
Conceptual output shape for decision-style workflows:
{
"recommendation": "Proceed with a controlled pilot",
"confidence_score": 0.82,
"confidence_level": "medium",
"tradeoffs": [
"Faster learning, higher short-term operational overhead"
],
"uncertainties": [
"Regulatory timeline may change in Q3"
],
"next_steps": [
"Run a 6-week pilot in one segment"
],
"trace_id": "..."
}
Note: the JSON above is illustrative. Exact outputs depend on input context, bindings, and policy settings.
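If you consume decision-style outputs downstream, a small shape check can catch drift early. The sketch below assumes the field names from the illustrative JSON above and assumes that confidence_score lies in [0, 1]; neither is taken from the actual skill contract, and `check_decision_output` is a hypothetical helper, not part of the runtime.

```python
import json

# Field names follow the illustrative JSON above. The expected types are assumptions.
EXPECTED_TYPES = {
    "recommendation": (str,),
    "confidence_score": (int, float),
    "confidence_level": (str,),
    "tradeoffs": (list,),
    "uncertainties": (list,),
    "next_steps": (list,),
}

def check_decision_output(payload: dict) -> list:
    """Return a list of human-readable problems; an empty list means the shape looks fine."""
    problems = []
    for name, types in EXPECTED_TYPES.items():
        if name not in payload:
            problems.append(f"missing field: {name}")
        elif isinstance(payload[name], bool) or not isinstance(payload[name], types):
            problems.append(f"unexpected type for field: {name}")
    score = payload.get("confidence_score")
    if isinstance(score, (int, float)) and not isinstance(score, bool):
        # Assumption: scores are normalized to the range [0, 1].
        if not 0.0 <= score <= 1.0:
            problems.append("confidence_score outside [0, 1]")
    return problems

SAMPLE = json.loads("""
{
  "recommendation": "Proceed with a controlled pilot",
  "confidence_score": 0.82,
  "confidence_level": "medium",
  "tradeoffs": ["Faster learning, higher short-term operational overhead"],
  "uncertainties": ["Regulatory timeline may change in Q3"],
  "next_steps": ["Run a 6-week pilot in one segment"],
  "trace_id": "..."
}
""")

print(check_decision_output(SAMPLE))
```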
- Start with local CLI: see Try it locally in 3 minutes
- Use deterministic baselines for offline reproducibility
- Explore first workflows in examples and docs
Choose one integration surface:
- Embedded SDK (lowest latency, in-process)
- HTTP API (service boundary, non-Python clients)
- MCP server (tooling ecosystems and MCP hosts)
- Framework adapters (LangChain, CrewAI, AutoGen, Semantic Kernel)
- Native tool definitions (Anthropic, OpenAI, Gemini)
- Author declarative skills as DAG workflows
- Reuse existing capability contracts
- Validate wiring and execution behavior
- Package and contribute reusable workflows
Think of Agent Skills as:
- Capabilities: what an operation means (contract)
- Bindings: how that operation is executed (backend)
- Skills: how operations compose into workflows (DAG)
- Runtime: how workflows execute safely and observably
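To make the contract/backend split concrete, here is a minimal sketch, assuming a capability is just a name plus declared input and output fields, and a binding is any callable that honors them. The `Capability` class, the `BINDINGS` table, and `execute` are hypothetical illustrations, not the runtime's actual API; only the capability name `text.content.summarize` comes from this README.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Capability:
    """Contract: what the operation means, independent of any backend."""
    name: str
    inputs: tuple
    outputs: tuple

SUMMARIZE = Capability("text.content.summarize", inputs=("text",), outputs=("summary",))

def first_sentence_summary(inputs):
    # Deterministic stand-in backend: return the first sentence.
    return {"summary": inputs["text"].split(".")[0].strip() + "."}

def truncating_summary(inputs):
    # A second stand-in backend with different behavior but the same contract.
    return {"summary": inputs["text"][:20]}

# Bindings: how the contract is executed. Keys are hypothetical binding names.
BINDINGS = {
    "python_baseline": first_sentence_summary,
    "truncate": truncating_summary,
}

def execute(capability, binding_name, inputs):
    """Check inputs against the contract, call the chosen backend, check outputs."""
    missing = [key for key in capability.inputs if key not in inputs]
    if missing:
        raise ValueError(f"{capability.name}: missing inputs {missing}")
    result = BINDINGS[binding_name](inputs)
    absent = [key for key in capability.outputs if key not in result]
    if absent:
        raise ValueError(f"{capability.name}: backend omitted outputs {absent}")
    return result

text = "ORCA separates contracts from backends. Everything else is detail."
print(execute(SUMMARIZE, "python_baseline", {"text": text}))
print(execute(SUMMARIZE, "truncate", {"text": text}))
```

Swapping the binding name changes the backend without touching the caller, which is the portability property described above.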
The pure cognitive layer is intentionally narrower than the full runtime. The current taxonomy separates:
- Pure cognitive capabilities: decision, evaluation, evidence, memory, perception, and reasoning.
- Compatibility surfaces: legacy or transitional names such as eval.* that remain in the live registry during migration.
- Operational capabilities: routing, delegation, workflow control, and other runtime helpers that should not be counted as core cognition.
The registry-level reference for that taxonomy is:
Use that document as the source of truth when deciding whether a capability belongs to the cognitive core or to the operational layer.
- Skills: reusable cognitive workflows declared as DAGs.
- Capabilities: backend-agnostic contracts with typed inputs and outputs.
- Bindings: execution adapters for PythonCall, OpenAPI, MCP, and OpenRPC.
- Runtime: execution layer with DAG scheduling, policy gates, cognitive state, and trace.
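DAG scheduling can be sketched with Python's standard library alone. The example below is an assumption about how such a scheduler might batch steps, not the runtime's scheduler: the step names loosely follow the decision.make stages listed earlier, and the dependency edges are invented for illustration.

```python
from graphlib import TopologicalSorter

# Hypothetical skill graph: each step maps to the set of steps it depends on.
SKILL_DAG = {
    "generate_options": set(),
    "analyze": {"generate_options"},
    "score": {"analyze"},
    "justify": {"analyze"},
    "validate": {"score", "justify"},
}

def schedule(dag):
    """Group steps into batches; steps in the same batch have no mutual dependencies."""
    sorter = TopologicalSorter(dag)
    sorter.prepare()  # raises graphlib.CycleError if the graph has a cycle
    batches = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # sorted only to make the output deterministic
        batches.append(ready)
        sorter.done(*ready)
    return batches

for index, batch in enumerate(schedule(SKILL_DAG)):
    print(index, batch)
```

Steps that land in the same batch (here, score and justify) could run in parallel, which is one practical benefit of declaring a skill as a DAG rather than a fixed sequence.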
| Mode | Best for | Requires server? |
|---|---|---|
| Embedded SDK | Python apps and notebooks | No |
| Native tool defs | Direct model SDK integration | No |
| Framework adapters | Existing agent frameworks | No |
| MCP server | MCP-compatible hosts | MCP host |
| HTTP API | Service-oriented architectures | Yes |
from sdk.embedded import as_langchain_tools
tools = as_langchain_tools(["text.content.summarize", "text.content.translate"])

agent-skills serve
curl http://localhost:8080/v1/health
curl -X POST http://localhost:8080/v1/skills/text.language-summary/execute \
-H "Content-Type: application/json" \
-d '{"inputs": {"text": "Hello world from ORCA"}}'

python -m official_mcp_servers
python -m official_mcp_servers --sse --host 0.0.0.0 --port 8765

from sdk.embedded import as_openai_tools, execute_openai_tool_call
tools = as_openai_tools()
# pass tools to your OpenAI client, then dispatch tool calls via execute_openai_tool_call

graph TB
subgraph Interface
CLI[CLI]
HTTP[HTTP API]
SDK[Embedded SDK / Adapters]
MCP[MCP Server]
end
subgraph Runtime
GW[Gateway]
SCH[DAG Scheduler]
POL[Policy and Safety]
COG[CognitiveState]
TRC[Trace and Audit]
end
subgraph BindingLayer
RES[Binding Resolver]
PY[PythonCall]
OA[OpenAPI]
MP[MCP]
RP[OpenRPC]
end
subgraph Backends
BASE[Deterministic Python baselines]
EXT[External APIs and services]
MCPB[MCP backends]
end
CLI --> GW
HTTP --> GW
SDK --> GW
MCP --> GW
GW --> SCH
SCH --> POL
SCH --> COG
SCH --> TRC
POL --> RES
RES --> PY --> BASE
RES --> OA --> EXT
RES --> MP --> MCPB
RES --> RP --> EXT
Agent Skills is not a replacement for every agent framework.
It can run standalone, but its strongest use case is as a reusable execution layer underneath frameworks, tools, and model providers.
| Dimension | Agent Skills | Typical agent framework |
|---|---|---|
| Execution model | Declarative DAG skills | Often prompt/tool loop centered |
| Contracts | Capability-first, typed | Usually app-level conventions |
| Backend portability | Binding abstraction layer | Often provider/framework specific |
| Safety/governance | Policy gates and execution controls | Varies widely |
| Observability | Trace and audit oriented | Varies widely |
| Local deterministic mode | Yes, baseline-first workflow | Often key-dependent |
- Auth and RBAC controls
- Webhook eventing
- Plugin extension points
- Audit modes and runtime observability
- CognitiveState v1 and cognitive hints
- Runtime-managed output envelope (status, rationale, trace_ref)
- JSON Schema generation and validation
- Skill governance and conformance tooling
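The output envelope in the list above names three fields: status, rationale, and trace_ref. As a hedged sketch of the idea, the code below wraps any step call in such an envelope. The `Envelope` class, the `outputs` field, the status values "completed" and "failed", and the use of a UUID for trace_ref are assumptions for illustration; the runtime's real envelope may differ.

```python
import uuid
from dataclasses import dataclass, field

@dataclass
class Envelope:
    status: str       # assumed values: "completed" or "failed"
    rationale: str    # short explanation of the status
    trace_ref: str    # opaque reference for looking up the trace
    outputs: dict = field(default_factory=dict)  # hypothetical extra field

def run_with_envelope(step, inputs):
    """Call a step and always return an Envelope, even when the step raises."""
    trace_ref = uuid.uuid4().hex
    try:
        outputs = step(inputs)
    except Exception as exc:
        return Envelope("failed", f"{type(exc).__name__}: {exc}", trace_ref)
    return Envelope("completed", "step finished without errors", trace_ref, outputs)

ok = run_with_envelope(lambda inputs: {"length": len(inputs["text"])}, {"text": "abc"})
failed = run_with_envelope(lambda inputs: {"length": len(inputs["text"])}, {})
print(ok.status, ok.outputs)
print(failed.status, failed.rationale)
```

Letting the runtime own the envelope, rather than each step, means callers can rely on the same fields regardless of which backend ran.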
The runtime includes a quality gate bundle for pure cognitive capabilities.
Run the gate pack:
python tooling/run_cognitive_quality_gates.py \
--report-file artifacts/cognitive_quality_gates_local_report.json
Generate scorecard only:
python tooling/generate_cognitive_quality_scorecard.py \
--fail-on-threshold \
--min-axis 9.0 \
--min-overall 9.0
Primary artifacts:
- artifacts/cognitive_e2e_contract_report.json
- artifacts/cognitive_semantic_all_report.json
- artifacts/cognitive_quality_scorecard.json
- artifacts/cognitive_quality_gates_local_report.json
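The scorecard command takes a per-axis minimum and an overall minimum. The sketch below shows one plausible reading of that gate, assuming the overall score is the arithmetic mean of the axis scores. The axis names and the `passes_thresholds` helper are hypothetical; check the tooling script for the actual rule.

```python
def passes_thresholds(axis_scores, min_axis=9.0, min_overall=9.0):
    """Return overall score, failing axes, and a pass/fail flag.

    Assumption: overall is the plain mean of the axis scores.
    """
    if not axis_scores:
        raise ValueError("at least one axis score is required")
    overall = sum(axis_scores.values()) / len(axis_scores)
    failing = sorted(name for name, score in axis_scores.items() if score < min_axis)
    return {
        "overall": overall,
        "failing_axes": failing,
        "passed": not failing and overall >= min_overall,
    }

# Hypothetical axis names and scores, for illustration only.
print(passes_thresholds({"contract": 9.5, "semantic": 9.0, "trace": 8.5}))
print(passes_thresholds({"contract": 9.5, "semantic": 9.5}))
```

Note that in the first call the overall mean meets the threshold while one axis falls short, so the gate still fails; a per-axis floor prevents a strong axis from masking a weak one.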
See docs index below for details.
| Topic | Link |
|---|---|
| 10-minute onboarding | docs/ONBOARDING_10_MIN.md |
| Target architecture (canonical) | docs/TARGET_ARCHITECTURE.md |
| Installation | docs/INSTALLATION.md |
| Environment variables | docs/ENVIRONMENT_VARIABLES.md |
| Error taxonomy | docs/ERROR_TAXONOMY.md |
| Runner architecture | docs/RUNNER_GUIDE.md |
| Binding selection policy | docs/BINDING_SELECTION.md |
| Binding authoring guide | docs/BINDING_GUIDE.md |
| DAG scheduler | docs/SCHEDULER.md |
| Step control flow | docs/STEP_CONTROL_FLOW.md |
| Streaming | docs/STREAMING.md |
| Async execution | docs/ASYNC_EXECUTION.md |
| Deployment | docs/DEPLOYMENT.md |
| Observability | docs/OBSERVABILITY.md |
| Auth and RBAC | docs/AUTH.md |
| Webhooks | docs/WEBHOOKS.md |
| Plugins | docs/PLUGINS.md |
| JSON schemas | docs/JSON_SCHEMAS.md |
| Skill authoring | docs/SKILL_AUTHORING.md |
| Troubleshooting | docs/TROUBLESHOOTING.md |
| Public release use cases | docs/PUBLIC_RELEASE_USE_CASES.md |
| Project status | docs/PROJECT_STATUS.md |
| ORCA specification | ORCA.md |
Serve docs locally:
make serve
Beyond Prompting: Decoupling Cognition from Execution in LLM-based Agents through the ORCA Framework
Fernandez Alvarez, G. E. (2026)
- DOI: https://doi.org/10.5281/zenodo.19438943
- SSRN: https://papers.ssrn.com/sol3/papers.cfm?abstract_id=6600840
- Paper page: docs/PAPER.md
Contributions are welcome. See CONTRIBUTING.md.
make check
Apache 2.0. See LICENSE.
If you use Agent Skills or ORCA in research, please cite:
@article{fernandez_orca_2026,
author = {Fernandez Alvarez, Guillermo E.},
title = {Beyond Prompting: Decoupling Cognition from Execution in
LLM-based Agents through the ORCA Framework},
year = {2026},
publisher = {Zenodo},
doi = {10.5281/zenodo.19438943},
url = {https://doi.org/10.5281/zenodo.19438943}
}
Software citation:
@software{fernandez_agent_skills_2026,
author = {Fernandez Alvarez, Guillermo},
title = {Agent Skills Runtime},
year = {2026},
url = {https://github.com/gfernandf/agent-skills},
version = {1.0.2},
license = {Apache-2.0}
}
See also CITATION.cff.
| Problem | Solution |
|---|---|
| Registry not found | Run doctor and ensure agent-skill-registry is cloned next to this repo |
| Command not found on Windows | Use python skills.py ... from repo root |
| Unexpected runtime error | Check docs/ERROR_TAXONOMY.md |
| Environment mismatch | Review docs/ENVIRONMENT_VARIABLES.md |
