
⚡ Autocrat

Open-source JARVIS for your Windows PC.

One prompt. Full system control. It builds the tools it doesn't have.



Autocrat — JARVIS for your PC

See It In Action · Quick Start · How It Thinks · What It Can Do · Architecture


💬 See It In Action

Autocrat understands you like an assistant, not a terminal.

Just talk to it

You:       "close spotify, mute the volume, and take a screenshot"
Autocrat:   ✓ Killed process: Spotify.exe
            ✓ Volume muted
            ✓ Screenshot saved → screenshots/capture_20260301_143022.png
            ⏱ 340ms (3 actions, parallel)

Ask it to build tools that don't exist yet

You:       "build a plugin that monitors my CPU temperature and alerts me when it crosses 80°C"
Autocrat:   🔨 Generating plugin: cpu_temp_monitor
            🔍 AST validation passed (no unsafe patterns)
            ✅ Plugin loaded — 2 new commands registered
            → Try: "check cpu temp" or "set temp alert 80"

Let it browse the web for you

You:       "go to github.com/trending and tell me the top 3 repos today"
Autocrat:   🌐 Launching headless browser...
            📄 Navigating to github.com/trending
            🔍 Extracting trending repositories...

            1. google/gemma-3 — Lightweight open model by Google DeepMind
            2. ollama/ollama — Get up and running with large language models
            3. microsoft/TypeScript — TypeScript language and compiler
            ⏱ 2.4s

Control your PC from your phone

You (from phone, via ngrok):   "lock my pc"
Autocrat:                       ⚠️ Destructive action: lock workstation
                                Approve? [yes/no]
You:                            "yes"
Autocrat:                       ✓ Workstation locked.

Chain complex automations in plain English

You:       "create a folder called 'DailyReport', take a screenshot, save it there,
            then write a text file with today's CPU and RAM usage"
Autocrat:   ✓ Created folder: DailyReport
            ✓ Screenshot → DailyReport/screen_20260301.png
            ✓ System stats written → DailyReport/stats_20260301.txt
            ⏱ 580ms (workflow: 4 steps)

Ask it anything — it's also a chatbot

You:       "explain the difference between threads and processes in python"
Autocrat:   A process is an independent program with its own memory space...
            [streams token-by-token with a blinking cursor]

🚀 Quick Start

# 1. Clone & install
git clone https://github.com/Autocrat2005/Autocrat.git
cd Autocrat
python -m venv .venv && .venv\Scripts\activate
pip install -r requirements.txt

# 2. Pull the brain
ollama pull qwen2.5-coder:3b
ollama serve                     # keep running in background

# 3. Launch your assistant
python main.py                   # CLI — talk in the terminal
python main.py --web             # Web dashboard — http://127.0.0.1:9000

Access from anywhere

ngrok http 9000                  # tunnel it
# → Open the ngrok URL on your phone, tablet, or another PC

You now have a JARVIS-style AI controlling your desktop from any device on earth.


🧠 How It Thinks

The 4-Stage Brain

Every input goes through four layers. The fastest one that understands you wins — the rest don't even fire.

 "open chrome"
      β”‚
      β–Ό
 β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
 β”‚  Stage 1: Regex Parser                          < 1ms    β”‚
 β”‚  200+ hand-tuned patterns. Instant recognition.          β”‚
 β”‚  "open chrome" β†’ appLauncher.launch(name="chrome") βœ…    β”‚
 β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
      ↓ (only if Stage 1 didn't match)

 β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
 β”‚  Stage 2: ML Brain                              ~ 5ms    β”‚
 β”‚  Sentence-transformer (all-MiniLM-L6-v2).                β”‚
 β”‚  Semantic similarity against learned intents.             β”‚
 β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
      ↓ (only if Stage 2 confidence < threshold)

 β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
 β”‚  Stage 3: LLM (Ollama)                          ~ 1-3s   β”‚
 β”‚  Smart context filter β†’ 30 most relevant commands.       β”‚
 β”‚  Native tool calling β†’ model picks function + params.    β”‚
 β”‚  "play music in my downloads folder" β†’ complex mapping.  β”‚
 β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
      ↓ (only if no action matched)

 β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
 β”‚  Stage 4: Conversational                        ~ 1-3s   β”‚
 β”‚  General knowledge. "What's a mutex?" β†’ answer streams   β”‚
 β”‚  token-by-token to the web UI via SSE.                   β”‚
 β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

90% of commands resolve in Stage 1 or 2 — under 10ms. The LLM is a smart fallback, not the bottleneck.
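The cascade above can be sketched in a few lines. Everything here (`PATTERNS`, `try_ml`, `try_llm`, the return strings) is an illustrative stub, not Autocrat's actual code:

```python
import re

# Stage 1: a couple of hand-tuned patterns (the real parser has 200+).
PATTERNS = {
    r"^open (\w+)$": lambda m: f"appLauncher.launch(name={m.group(1)!r})",
}

def try_regex(text):
    # Sub-millisecond: first matching pattern wins.
    for pattern, action in PATTERNS.items():
        m = re.match(pattern, text)
        if m:
            return action(m)
    return None

def try_ml(text, threshold=0.7):
    # Stage 2: semantic similarity against learned intents (stubbed here).
    return None

def try_llm(text):
    # Stage 3: tool-calling LLM fallback (stubbed here).
    return None

def dispatch(text):
    # The fastest stage that understands the input wins; later stages never fire.
    for stage in (try_regex, try_ml, try_llm):
        result = stage(text)
        if result is not None:
            return result
    return f"chat: {text}"  # Stage 4: conversational fallback

print(dispatch("open chrome"))       # appLauncher.launch(name='chrome')
print(dispatch("what's a mutex?"))   # chat: what's a mutex?
```

The point of the ordering is cost: cheap deterministic checks run first, and the LLM only pays its 1-3s latency when nothing cheaper understood the input.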

The Agentic Engine (v2.0)

When the LLM does fire, it's not dumb prompt engineering. It's proper agent-style tool use:

Input: "close spotify and take a screenshot"
  │
  ├─ Smart Context Window
  │    160+ commands → keyword scoring + synonyms → top 30 relevant sent
  │
  ├─ Native Tool Calling (Ollama /api/chat)
  │    Model receives structured function definitions
  │    Returns: tool_call(processController.kill, {name: "spotify"})
  │           + tool_call(screenIntel.screenshot, {})
  │
  └─ Parallel Executor
       Different plugins → fire concurrently (ThreadPoolExecutor)
       Both finish in ~200ms instead of ~400ms sequentially
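Ollama's /api/chat endpoint accepts an OpenAI-style `tools` array. A minimal sketch of the kind of payload this step builds; the tool schema shown is illustrative, not Autocrat's real plugin definitions:

```python
def build_chat_payload(user_text, tools, model="qwen2.5-coder:3b"):
    # Only the ~30 most relevant commands survive the context filter
    # and are passed in as tools.
    return {
        "model": model,
        "messages": [{"role": "user", "content": user_text}],
        "tools": tools,
        "stream": False,
    }

# Illustrative function definition in the JSON-schema shape Ollama expects.
kill_tool = {
    "type": "function",
    "function": {
        "name": "processController.kill",
        "description": "Terminate a process by name",
        "parameters": {
            "type": "object",
            "properties": {"name": {"type": "string"}},
            "required": ["name"],
        },
    },
}

payload = build_chat_payload("close spotify and take a screenshot", [kill_tool])
# POSTing this to http://127.0.0.1:11434/api/chat returns tool calls in
# message.tool_calls, each carrying a function name plus parsed arguments,
# so no JSON-repair step is needed.
```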
v1.0 → v2.0 comparison:

| | v1.0 (Old) | v2.0 (Current) |
|---|---|---|
| LLM ↔ Actions | All 160+ commands in one prompt. LLM returns JSON. Malformed JSON → repair → retry. | Native tool calling via /api/chat. Model picks tools directly. No JSON hacking. |
| Context | Entire catalog every time (~4K tokens wasted). | Smart filter → only ~30 relevant commands. 80% fewer tokens. |
| Multi-step | Sequential. 3 actions = 3x the time. | Parallel. Independent actions run concurrently. |
| Web responses | Spinner → wait → full text blob appears. | Token-by-token streaming via SSE. Blinking cursor. |
| Model compat | One prompting style. Switch model = rewrite. | Auto-detects capabilities. Falls back gracefully. |
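The parallel-execution idea reduces to a thread-pool fan-out over independent actions. A minimal sketch with stand-in action functions (the real executor dispatches plugin methods):

```python
from concurrent.futures import ThreadPoolExecutor

# Stand-in actions; in Autocrat these would be plugin calls.
def kill_process(name):
    return f"✓ Killed process: {name}"

def take_screenshot():
    return "✓ Screenshot saved"

def run_parallel(calls):
    # calls: list of (function, args) pairs with no dependencies between them.
    # Results come back in submission order, so output stays deterministic.
    with ThreadPoolExecutor() as pool:
        futures = [pool.submit(fn, *args) for fn, args in calls]
        return [f.result() for f in futures]

results = run_parallel([(kill_process, ("Spotify.exe",)), (take_screenshot, ())])
```

Because the actions hit different plugins, total wall time is roughly the slowest single action rather than the sum of all of them.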

🛠 What It Can Do

17 Built-In Plugins (160+ commands)

| Category | Plugin | Highlights |
|---|---|---|
| 🪟 Desktop | windowManager | Focus, minimize, maximize, resize, snap, tile windows |
| | appLauncher | Open any app by name — "open chrome", "launch vscode" |
| | keyboardMouse | Type text, hotkeys, mouse clicks, scroll, drag |
| ⚙️ System | processController | List, kill, monitor processes — "kill chrome", "top processes" |
| | systemInfo | CPU, RAM, disk, battery, network stats, uptime |
| | powerTools | Shutdown, restart, sleep, hibernate, lock |
| | volumeDisplay | Volume up/down/mute, screen brightness |
| 📁 Files | fileOps | Create, read, write, delete, search, move files & folders |
| | clipboard | Copy, paste, clipboard history |
| | shellExecutor | Run any shell command with captured output |
| 🌐 Web | cometWebAgent | Headless Playwright browser — navigate, click, extract, screenshot. Uses a ReAct loop for multi-step browsing. |
| 🤖 AI | coreBuilder | The meta-plugin. Generates, validates, hot-loads, and auto-heals other plugins at runtime. |
| | intelligence | Proactive nudges, context probes, system health monitoring |
| | smartActions | Context-aware compound actions |
| 📋 Automation | workflowEngine | Chain multi-step workflows. LLM can generate workflow YAML from plain English. |
| | taskScheduler | Schedule recurring commands (cron-style) |
| | screenIntel | Screenshots, OCR text extraction, screen region capture |

Self-Writing Plugins (the JARVIS part)

This is the killer feature. If a capability doesn't exist, Autocrat builds it on the spot.

You: "build plugin that fetches current weather for any city"

What happens behind the scenes:

 1. LLM generates a full NexusPlugin subclass (Python file)
 2. AST Validator scans for safety:
    ✓ No eval/exec/os.system
    ✓ No ctypes/winreg
    ✓ Proper NexusPlugin structure
 3. Network Scanner detects URL: wttr.in
    ⚠️ Not in allowlist → Security Prompt:
    ┌───────────────────────────────────────────────────┐
    │  Plugin 'weather_fetcher' wants to reach wttr.in  │
    │  [1] Allow Once  [2] Allow Always  [3] Block      │
    └───────────────────────────────────────────────────┘
 4. You pick "Allow Always" → domain saved to config
 5. Plugin is importlib-loaded into the live engine
 6. New commands registered immediately — no restart
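Step 2 can be approximated with Python's `ast` module. A hedged sketch of the banned-pattern scan only; the real validator also verifies the NexusPlugin structure:

```python
import ast

# Patterns the safety layer rejects (subset, for illustration).
BANNED_CALLS = {"eval", "exec"}
BANNED_MODULES = {"ctypes", "winreg"}
BANNED_ATTRS = {("os", "system"), ("subprocess", "Popen"), ("shutil", "rmtree")}

def find_violations(source):
    # Parse the generated code WITHOUT executing it, then walk every node.
    violations = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Call):
            fn = node.func
            if isinstance(fn, ast.Name) and fn.id in BANNED_CALLS:
                violations.append(fn.id)
            elif (isinstance(fn, ast.Attribute)
                  and isinstance(fn.value, ast.Name)
                  and (fn.value.id, fn.attr) in BANNED_ATTRS):
                violations.append(f"{fn.value.id}.{fn.attr}")
        elif isinstance(node, (ast.Import, ast.ImportFrom)):
            names = [a.name for a in node.names]
            module = getattr(node, "module", None)
            violations += [n for n in names + [module] if n in BANNED_MODULES]
    return violations

print(find_violations("import os\nos.system('del *')"))  # ['os.system']
print(find_violations("print('hello')"))                 # []
```

Because the check runs on the syntax tree before any import, unsafe code never gets a chance to execute.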

Now you can say:

You:       "weather in Mumbai"
Autocrat:   🌍 Mumbai, India:
            🌡️  30°C (86°F) — feels like 34°C
            ☁️  Smoke
            💧 Humidity: 43%
            💨 Wind: 21 km/h WNW

If the generated plugin crashes at runtime, the error traceback is sent back to the LLM, which patches the code and reloads it automatically.
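For a sense of what gets generated, here is a hypothetical plugin shape. The actual NexusPlugin interface in nexus/core/plugin.py may differ; every name here (`register_commands`, `get_weather`) is a guess for illustration:

```python
class NexusPlugin:
    """Stand-in for the real base class in nexus/core/plugin.py."""
    def register_commands(self):
        return {}

class WeatherFetcher(NexusPlugin):
    name = "weather_fetcher"

    def register_commands(self):
        # Each key is a phrase the parser/ML stages can route to a handler.
        return {"weather in {city}": self.get_weather}

    def get_weather(self, city):
        # The generated code would call wttr.in here, gated by the
        # network-permission prompt; stubbed out for this sketch.
        return f"Weather for {city}: (fetched from wttr.in)"

plugin = WeatherFetcher()
commands = plugin.register_commands()
```

Whatever the real interface looks like, the key property is that a single importlib load of one such class is enough to register new commands into the live engine.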

Web Dashboard (your control center)

Start with python main.py --web and open http://127.0.0.1:9000:

  • Live terminal — type commands, get streamed responses, click autocomplete suggestions
  • System gauges — real-time CPU, RAM, disk, battery with animated arcs
  • Plugin explorer — browse all plugins, see every command, click to auto-fill
  • Command history — searchable log of everything you've run
  • Workflow builder — create and trigger multi-step automations
  • Confirmation alerts — destructive actions trigger a WebSocket popup for approval

Tunnel it with ngrok and you have a remote AI assistant for your PC accessible from any device.
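The streamed responses ride on plain Server-Sent Events framing. A sketch of how tokens map to wire frames; the real server wraps a generator like this in a FastAPI StreamingResponse, and the `[DONE]` sentinel is an illustrative convention:

```python
def sse_events(tokens):
    # Each token becomes one "data:" frame; the blank line ends the frame,
    # which is what lets the browser render token-by-token.
    for token in tokens:
        yield f"data: {token}\n\n"
    yield "data: [DONE]\n\n"

frames = list(sse_events(["A", " process", " is"]))
# frames[0] == "data: A\n\n"
```

On the client side, an `EventSource` (or a fetch-with-reader loop) appends each frame's payload to the terminal as it arrives, which is what produces the blinking-cursor effect.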


🔒 Safety

Nothing runs unless validated. Three layers of defense:

| Layer | Scope | What it does |
|---|---|---|
| AST Sandbox | Generated plugins | Parses code before execution. Blocks eval, exec, ctypes, winreg, os.system, subprocess.Popen, shutil.rmtree. Verifies proper NexusPlugin structure. |
| Network Permissions | Generated plugins | Scans every URL/domain in code. Unapproved domains trigger an Allow Once / Allow Always / Block prompt. No silent network access. |
| Destructive Action Gate | All plugins | shutdown, restart, kill, delete require explicit approval. Alert pushed to all clients (web, Telegram, VS Code). |

Think of it like Android permissions, but for your desktop AI.
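The network-permission scan boils down to extracting domains from the generated source and diffing them against the allowlist. A simplified sketch; the regex and the hardcoded allowlist are illustrative:

```python
import re

# Mirrors safety.web.allowlist_domains from nexus_config.yaml (subset).
ALLOWLIST = {"github.com", "wttr.in", "localhost"}

def unapproved_domains(source):
    # Naive URL scan; the real scanner may inspect the AST as well.
    domains = set(re.findall(r"https?://([\w.-]+)", source))
    return sorted(domains - ALLOWLIST)

code = "resp = httpx.get('https://api.example.com/v1/data')"
print(unapproved_domains(code))  # ['api.example.com']
```

Anything the diff flags is what triggers the Allow Once / Allow Always / Block prompt; choosing "Allow Always" simply moves the domain into the allowlist set.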


πŸ— Under The Hood

Folder Structure

Autocrat/
├── main.py                     # Entry point (CLI + web server)
├── nexus_config.yaml           # Master configuration
├── requirements.txt            # Dependencies
│
├── nexus/
│   ├── core/
│   │   ├── engine.py           # Command router + parallel executor
│   │   ├── ai_engine.py        # LLM integration (native tools + streaming)
│   │   ├── brain.py            # ML intent classifier (sentence-transformers)
│   │   ├── parser.py           # Regex command parser (200+ patterns)
│   │   ├── config.py           # YAML config manager
│   │   ├── events.py           # Event bus for cross-plugin communication
│   │   ├── learner.py          # Behavioral learning (time, chain, frequency)
│   │   ├── logger.py           # Structured logging
│   │   └── plugin.py           # Base plugin class
│   │
│   ├── plugins/
│   │   ├── core_builder.py     # Meta-plugin: generates other plugins
│   │   ├── comet_web_agent.py  # Headless browser (Playwright + ReAct)
│   │   ├── workflow_engine.py  # Multi-step workflow orchestration
│   │   ├── generated/          # Auto-generated plugins land here
│   │   └── ...                 # 14 more built-in plugins
│   │
│   ├── integrations/
│   │   └── telegram_bot.py     # Telegram remote control
│   │
│   └── web/
│       ├── server.py           # FastAPI + SSE streaming + WebSocket
│       └── static/             # Web dashboard (HTML / CSS / JS)
│
├── workflows/                  # Saved workflow YAML files
├── logs/                       # Runtime logs
└── screenshots/                # Captured screens

Configuration

ai:
  llm_backend: local_ollama
  local_model: qwen2.5-coder:3b
  local_base_url: http://127.0.0.1:11434
  use_native_tools: true # native Ollama tool calling
  strict_json_mode: true # JSON fallback for older models

system:
  safe_mode: false

safety:
  confirm_destructive: true
  web:
    allowlist_domains:
      - github.com
      - codeforces.com
      - wttr.in
      - localhost

Domains get added automatically when you approve "Allow Always" through the security prompt.

Requirements

| What | Why |
|---|---|
| Python 3.10+ | Modern syntax, type hints |
| Ollama | Local LLM (qwen2.5-coder:3b) |
| Windows 10/11 | Win32 system automation APIs |
| ~2GB RAM | Sentence-transformer + Ollama overhead |

Key packages: fastapi · uvicorn · httpx · sentence-transformers · playwright · psutil · pyautogui · pycaw · google-generativeai (optional)

Why not LangChain?

  • +50MB deps for stuff Autocrat already does natively
  • Opaque wrappers — a tool call fails 4 layers deep, good luck debugging
  • Ollama already has a tool-calling API — wrapping it again adds latency, not features

We built the four things that actually matter (native tools, smart filtering, parallel exec, streaming) in ~500 lines. Zero new dependencies.


🗺 Roadmap

  • Native LLM tool calling (Ollama /api/chat)
  • Smart context window (keyword-relevance filtering)
  • Parallel multi-step execution
  • Streaming responses (SSE)
  • Auto-healing generated plugins
  • Dynamic network permissions
  • 🎤 Voice control (faster-whisper — "Hey Autocrat, lock my PC")
  • 🐧 Linux / macOS support (replace Win32 APIs)
  • 🏪 Plugin marketplace (share generated plugins with others)
  • 🤖 Multi-agent mode (agents that spawn sub-agents)
  • 💻 VS Code extension (run commands inline)
  • 🧠 Persistent memory (remember preferences across sessions)

🤝 Contributing

Contributions welcome! See CONTRIBUTING.md for guidelines.

📄 License

MIT — See LICENSE.


Built by @Autocrat2005

If this project is useful, consider giving it a ⭐

"Sir, I've prepared a flight plan..." β€” Well, not yet. But we're getting there.

About

Autocrat CLI is a modular command-line interface designed for automation and task management. It features plugin-based architecture, workflow generation, and system integration, making it a versatile tool for developers and power users.
