---
title: Developer Guide
description: Get started with OpenMind OS (OM1)
---

Welcome to the OpenMind OS (OM1) Developer Guide. This guide provides an overview of the OM1 agent runtime system, including CLI commands, project structure, adding new actions, configuration, development tips, and optional environment variables.

## CLI Commands

The main entry point is `src/run.py`, which provides the following command:

- `start`: Start an agent with the specified config

  ```bash
  python src/run.py start [config_name] [--debug]
  ```

  - `config_name`: Name of the config file (without the `.json` extension) in the `config/` directory
  - `--debug`: Optional flag to enable debug logging

## Project Structure

```
.
├── config/               # Agent configuration files
├── src/
│   ├── actions/          # Agent outputs/actions/capabilities
│   ├── fuser/            # Input fusion logic
│   ├── inputs/           # Input plugins (e.g. VLM, audio)
│   ├── llm/              # LLM integration
│   ├── providers/        # Providers
│   ├── runtime/          # Core runtime system
│   ├── simulators/       # Virtual endpoints such as WebSim
│   └── run.py            # CLI entry point
```

## Lint and Unit Testing

- To run the unit tests: `uv run pytest --log-cli-level=DEBUG -s`
- To lint and format the code: `uv run ruff check . --fix && uv run black . && uv run isort .`

## Adding New Actions

Actions are the core capabilities of an agent. For a robot, for example, these capabilities include movement and speech. Each action consists of:

1. **Interface** (`interface.py`): Defines the input/output types.
2. **Implementation** (`implementation/`): Business logic, if any. Otherwise, use the `passthrough` implementation.
3. **Connector** (`connector/`): Code that connects OM1 to specific virtual or physical environments, typically through middleware (e.g. custom APIs, ROS2, Zenoh, or CycloneDDS).
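To illustrate the split between interface and implementation, here is a minimal sketch of a movement action. The class and field names below are hypothetical, not OM1's actual base classes:

```python
from dataclasses import dataclass


@dataclass
class MoveInput:
    """Command selected by the agent, e.g. 'turn left' (hypothetical type)."""
    action: str


@dataclass
class MoveOutput:
    """Payload handed to the connector (hypothetical type)."""
    action: str


class Passthrough:
    """Passthrough implementation: forwards input to the connector unchanged."""

    def execute(self, move_input: MoveInput) -> MoveOutput:
        # No business logic; the connector does the hardware-specific work.
        return MoveOutput(action=move_input.action)
```

A connector would then take the `MoveOutput` and translate it into middleware calls (ROS2 topics, Zenoh messages, and so on).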

Example action structure:

```
actions/
└── move_{unique_hardware_id}/
    ├── interface.py      # Defines MoveInput/Output
    ├── implementation/
    │   └── passthrough.py
    └── connector/
        ├── ros2.py       # Maps OM1 data/commands to hardware layers and robot middleware
        ├── zenoh.py
        └── unitree.py
```

In general, each robot has specific capabilities, so each action will be hardware-specific.

Example: if you are adding support for the Unitree G1 Humanoid version 13.2b, which supports a new movement subtype such as `dance_2`, you could name the updated action `move_unitree_g1_13_2b` and select that action in your `unitree_g1.json` configuration file.
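The corresponding `agent_actions` entry in `unitree_g1.json` might then look like this (the connector name below is illustrative):

```json
"agent_actions": [
  {
    "name": "move_unitree_g1_13_2b",
    "implementation": "passthrough",
    "connector": "unitree"
  }
]
```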

## Configuration

Agents are configured via JSON files in the `config/` directory. Key configuration elements:

```json
{
  "hertz": 0.5,
  "name": "agent_name",
  "system_prompt": "...",
  "agent_inputs": [
    {
      "type": "VlmInput"
    }
  ],
  "cortex_llm": {
    "type": "OpenAILLM",
    "config": {
      "base_url": "",
      "api_key": "your_key_here"
    }
  },
  "simulators": [
    {
      "type": "WebSim"
    }
  ],
  "agent_actions": [
    {
      "name": "move",
      "implementation": "passthrough",
      "connector": "ros2"
    }
  ]
}
```
- **Hertz**: Defines the base tick rate of the agent. This rate can be overridden to allow the agent to respond quickly to changing environments, using event-triggered callbacks through real-time middleware.
- **Name**: A unique identifier for the agent.
- **System Prompt**: Defines the agent's personality and behavior. This acts as the system prompt for the agent's operations.
- **Cortex LLM**: Configuration for the language model (LLM) used by the agent.
  - **Type**: Specifies the LLM plugin.
  - **Config**: Configuration for the LLM, including the API endpoint and API key. If you do not change the file and use the `openmind_free` API key, the LLM operates with a rate limiter against OpenMind's public endpoint.
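The `hertz` value translates into a base tick interval of `1 / hertz` seconds (e.g. 0.5 Hz means one tick every 2 seconds). A minimal sketch of such a loop, which is not OM1's actual runtime but illustrates the idea:

```python
import time


def tick_interval(hertz: float) -> float:
    """Seconds between ticks for a given base rate."""
    if hertz <= 0:
        raise ValueError("hertz must be positive")
    return 1.0 / hertz


def run_loop(hertz: float, tick, max_ticks: int) -> None:
    """Call tick() at the configured base rate.

    In a real deployment, event-triggered middleware callbacks may
    preempt this fixed cadence so the agent reacts faster.
    """
    interval = tick_interval(hertz)
    for _ in range(max_ticks):
        tick()
        time.sleep(interval)
```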

The OpenMind proxy endpoints are:

- OpenAI: `https://api.openmind.org/api/core/openai`
- DeepSeek: `https://api.openmind.org/api/core/deepseek`
- Gemini: `https://api.openmind.org/api/core/gemini`

```json
"cortex_llm": {
  "type": "OpenAILLM",
  "config": {
    "base_url": "...", // Optional: URL of the LLM endpoint
    "api_key": "..."   // Required: API key from OpenMind or OpenAI
  }
}
```
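Since `base_url` is optional and `api_key` is required, a config loader could validate this fragment before starting the agent. The helper below is a hedged sketch, not part of OM1, and the fallback URL is the OpenMind OpenAI proxy listed above:

```python
def validate_cortex_llm(config: dict) -> dict:
    """Validate a cortex_llm fragment: api_key is required, base_url optional."""
    llm = config.get("cortex_llm", {})
    llm_config = llm.setdefault("config", {})
    if not llm_config.get("api_key"):
        raise ValueError("cortex_llm.config.api_key is required")
    # Assumption: an empty base_url falls back to OpenMind's OpenAI proxy.
    if not llm_config.get("base_url"):
        llm_config["base_url"] = "https://api.openmind.org/api/core/openai"
    return llm
```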

## Simulators

Lists the simulation modules used by the agent. These define the simulated environment or entities the agent interacts with.

```json
"simulators": [
  {
    "type": "WebSim"
  }
]
```

## Agent Actions

Defines the agent's available capabilities, including action names, their implementation, and the connector used to execute them.

```json
"agent_actions": [
  {
    "name": "move", // Action name
    "implementation": "passthrough", // Implementation to use
    "connector": "ros2" // Connector handler
  }
]
```
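One way to picture how the runtime could resolve such an entry at startup is a pair of name-to-callable registries. The registries and `run_action` below are illustrative, not OM1's actual loader:

```python
# Hypothetical registries mapping config names to callables.
IMPLEMENTATIONS = {"passthrough": lambda command: command}
CONNECTORS = {"ros2": lambda command: f"ros2:{command}"}


def run_action(entry: dict, command: str) -> str:
    """Resolve an agent_actions entry and push one command through it."""
    implementation = IMPLEMENTATIONS[entry["implementation"]]
    connector = CONNECTORS[entry["connector"]]
    # Implementation runs first (business logic), connector delivers the result.
    return connector(implementation(command))
```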

## Development Tips

1. Use the `--debug` flag for detailed logging.
2. Add new input plugins in `src/inputs/plugins/`.
3. Add new LLM integrations in `src/llm/plugins/`.
4. Test actions with the `passthrough` implementation first.
5. Use type hints and docstrings for better code maintainability.
6. Run `uv run ruff check . --fix && uv run black . && uv run isort .` to check/format your code.

To automatically run the lint checks before committing, install `pre-commit` and execute `pre-commit install`. This ensures that the pre-commit checks run before each commit. You can also manually trigger all checks by running `pre-commit run --all-files`.

## Optional Environment Variables

- `ETH_ADDRESS`: The Ethereum address of the agent, prefixed with `0x`. Example: `0xd8dA6BF26964aF9D7eEd9e03E53415D37aA96045`. Only relevant if your agent has a wallet.
- `UNITREE_WIRED_ETHERNET`: The network adapter connected to a Unitree robot. Example: `eno0`. Only relevant if your agent has a physical (robot) embodiment. You can set this to `SIM` to debug some limited functionality.
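For example, these could be set in your shell profile (the values shown are the examples above, not defaults):

```shell
# Ethereum wallet address for the agent (only if the agent has a wallet).
export ETH_ADDRESS=0xd8dA6BF26964aF9D7eEd9e03E53415D37aA96045
# Network adapter wired to the Unitree robot; set to SIM for limited debugging.
export UNITREE_WIRED_ETHERNET=eno0
```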