One prompt. Full system control. It builds the tools it doesn't have.
See It In Action · Quick Start · How It Thinks · What It Can Do · Architecture
Autocrat understands you like an assistant, not a terminal.
```
You: "close spotify, mute the volume, and take a screenshot"

Autocrat: ✓ Killed process: Spotify.exe
          ✓ Volume muted
          ✓ Screenshot saved → screenshots/capture_20260301_143022.png
          ⏱ 340ms (3 actions, parallel)
```

```
You: "build a plugin that monitors my CPU temperature and alerts me when it crosses 80°C"

Autocrat: 🔨 Generating plugin: cpu_temp_monitor
          🔍 AST validation passed (no unsafe patterns)
          ✓ Plugin loaded → 2 new commands registered
          → Try: "check cpu temp" or "set temp alert 80"
```

```
You: "go to github.com/trending and tell me the top 3 repos today"

Autocrat: 🌐 Launching headless browser...
          🌐 Navigating to github.com/trending
          🔍 Extracting trending repositories...
          1. google/gemma-3 – Lightweight open model by Google DeepMind
          2. ollama/ollama – Get up and running with large language models
          3. microsoft/TypeScript – TypeScript language and compiler
          ⏱ 2.4s
```

```
You (from phone, via ngrok): "lock my pc"

Autocrat: ⚠️ Destructive action: lock workstation
          Approve? [yes/no]

You: "yes"

Autocrat: ✓ Workstation locked.
```

```
You: "create a folder called 'DailyReport', take a screenshot, save it there,
      then write a text file with today's CPU and RAM usage"

Autocrat: ✓ Created folder: DailyReport
          ✓ Screenshot → DailyReport/screen_20260301.png
          ✓ System stats written → DailyReport/stats_20260301.txt
          ⏱ 580ms (workflow: 4 steps)
```

```
You: "explain the difference between threads and processes in python"

Autocrat: A process is an independent program with its own memory space...
          [streams token-by-token with a blinking cursor]
```
```bash
# 1. Clone & install
git clone https://github.com/Autocrat2005/Autocrat.git
cd Autocrat
python -m venv .venv && .venv\Scripts\activate
pip install -r requirements.txt

# 2. Pull the brain
ollama pull qwen2.5-coder:3b
ollama serve          # keep running in the background

# 3. Launch your assistant
python main.py        # CLI – talk in the terminal
python main.py --web  # Web dashboard – http://127.0.0.1:9000
ngrok http 9000       # tunnel it
# → Open the ngrok URL on your phone, tablet, or another PC
```

You now have a JARVIS-style AI controlling your desktop from any device on earth.
Every input goes through four layers. The fastest one that understands you wins – the rest don't even fire.

```
                    "open chrome"
                          │
                          ▼
┌────────────────────────────────────────────────────────────┐
│ Stage 1: Regex Parser                               < 1ms  │
│ 200+ hand-tuned patterns. Instant recognition.             │
│ "open chrome" → appLauncher.launch(name="chrome")          │
└────────────────────────────────────────────────────────────┘
          │ (only if Stage 1 didn't match)
┌────────────────────────────────────────────────────────────┐
│ Stage 2: ML Brain                                   ~ 5ms  │
│ Sentence-transformer (all-MiniLM-L6-v2).                   │
│ Semantic similarity against learned intents.               │
└────────────────────────────────────────────────────────────┘
          │ (only if Stage 2 confidence < threshold)
┌────────────────────────────────────────────────────────────┐
│ Stage 3: LLM (Ollama)                              ~ 1-3s  │
│ Smart context filter → 30 most relevant commands.          │
│ Native tool calling → model picks function + params.       │
│ "play music in my downloads folder" → complex mapping.     │
└────────────────────────────────────────────────────────────┘
          │ (only if no action matched)
┌────────────────────────────────────────────────────────────┐
│ Stage 4: Conversational                            ~ 1-3s  │
│ General knowledge. "What's a mutex?" → answer streams      │
│ token-by-token to the web UI via SSE.                      │
└────────────────────────────────────────────────────────────┘
```
90% of commands resolve in Stage 1 or 2 – under 10ms. The LLM is a smart fallback, not the bottleneck.
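The fall-through routing above can be sketched in a few lines. This is an illustrative sketch, not Autocrat's real API: the stage functions and the pattern table are hypothetical stand-ins, and each later stage fires only if every earlier one declines.

```python
# Minimal sketch of a staged fallback router: try cheap stages first,
# fall through to more expensive ones only when nothing matched.
import re

def regex_stage(text):
    # Stage 1 stand-in: a couple of hand-tuned patterns mapped to actions.
    patterns = {
        r"^open (\w+)$": lambda m: ("appLauncher.launch", {"name": m.group(1)}),
        r"^kill (\w+)$": lambda m: ("processController.kill", {"name": m.group(1)}),
    }
    for pattern, build in patterns.items():
        m = re.match(pattern, text)
        if m:
            return build(m)
    return None  # decline -> next stage gets a chance

def route(text, ml_stage=None, llm_stage=None):
    # A stage returning None means "I don't understand this"; the first
    # non-None answer wins and later stages never run.
    for stage in (regex_stage, ml_stage, llm_stage):
        if stage is not None:
            result = stage(text)
            if result is not None:
                return result
    return ("chat", {"text": text})  # Stage 4: conversational fallback
```

The key property is that the ML and LLM stages are never even called for a command like `"open chrome"`, which is what keeps the common path under a millisecond.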
When the LLM does fire, it's not dumb prompt engineering. It's proper agent-style tool use:
```
Input: "close spotify and take a screenshot"
   │
   ├─ Smart Context Window
   │    160+ commands → keyword scoring + synonyms → top 30 relevant sent
   │
   ├─ Native Tool Calling (Ollama /api/chat)
   │    Model receives structured function definitions
   │    Returns: tool_call(processController.kill, {name: "spotify"})
   │           + tool_call(screenIntel.screenshot, {})
   │
   └─ Parallel Executor
        Different plugins → fire concurrently (ThreadPoolExecutor)
        Both finish in ~200ms instead of ~400ms sequentially
```
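The concurrent dispatch step can be sketched with `ThreadPoolExecutor`. The tool functions below are fake stand-ins that just sleep, so the timing difference is visible:

```python
# Sketch of the Parallel Executor: independent tool calls from one request
# are dispatched concurrently instead of one after another.
import time
from concurrent.futures import ThreadPoolExecutor

def run_parallel(calls):
    """calls: list of (callable, kwargs) pairs; returns results in order."""
    with ThreadPoolExecutor(max_workers=len(calls)) as pool:
        futures = [pool.submit(fn, **kwargs) for fn, kwargs in calls]
        return [f.result() for f in futures]

def kill_process(name):
    time.sleep(0.2)  # stand-in for the real plugin call
    return f"Killed process: {name}"

def take_screenshot():
    time.sleep(0.2)
    return "Screenshot saved"

start = time.perf_counter()
results = run_parallel([(kill_process, {"name": "spotify"}),
                        (take_screenshot, {})])
elapsed = time.perf_counter() - start  # ~0.2s total, not ~0.4s sequential
```

Because the two calls touch different plugins they have no shared state to race on, which is what makes firing them concurrently safe.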
v1.0 → v2.0 comparison (click to expand)
| Aspect | v1.0 (Old) | v2.0 (Current) |
|---|---|---|
| LLM → Actions | All 160+ commands in one prompt. LLM returns JSON. Malformed JSON → repair → retry. | Native tool calling via /api/chat. Model picks tools directly. No JSON hacking. |
| Context | Entire catalog every time (~4K tokens wasted). | Smart filter → only ~30 relevant commands. 80% fewer tokens. |
| Multi-step | Sequential. 3 actions = 3x the time. | Parallel. Independent actions run concurrently. |
| Web responses | Spinner → wait → full text blob appears. | Token-by-token streaming via SSE. Blinking cursor. |
| Model compat | One prompting style. Switch model = rewrite. | Auto-detects capabilities. Falls back gracefully. |
| Category | Plugin | Highlights |
|---|---|---|
| 🖥️ Desktop | windowManager | Focus, minimize, maximize, resize, snap, tile windows |
| | appLauncher | Open any app by name – "open chrome", "launch vscode" |
| | keyboardMouse | Type text, hotkeys, mouse clicks, scroll, drag |
| ⚙️ System | processController | List, kill, monitor processes – "kill chrome", "top processes" |
| | systemInfo | CPU, RAM, disk, battery, network stats, uptime |
| | powerTools | Shutdown, restart, sleep, hibernate, lock |
| | volumeDisplay | Volume up/down/mute, screen brightness |
| 📁 Files | fileOps | Create, read, write, delete, search, move files & folders |
| | clipboard | Copy, paste, clipboard history |
| | shellExecutor | Run any shell command with captured output |
| 🌐 Web | cometWebAgent | Headless Playwright browser – navigate, click, extract, screenshot. Uses a ReAct loop for multi-step browsing. |
| 🤖 AI | coreBuilder | The meta-plugin. Generates, validates, hot-loads, and auto-heals other plugins at runtime. |
| | intelligence | Proactive nudges, context probes, system health monitoring |
| | smartActions | Context-aware compound actions |
| 🔁 Automation | workflowEngine | Chain multi-step workflows. LLM can generate workflow YAML from plain English. |
| | taskScheduler | Schedule recurring commands (cron-style) |
| | screenIntel | Screenshots, OCR text extraction, screen region capture |
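Every plugin follows the same contract: expose a map from command phrases to callables. The real `NexusPlugin` base class lives in `nexus/core/plugin.py` and its exact interface is an Autocrat internal; the sketch below only mirrors the idea, with a hypothetical `commands()` method and a toy `volumeDisplay`-style plugin.

```python
# Hypothetical shape of a built-in plugin: phrases map to handler methods.
# NexusPlugin here is a minimal stand-in for Autocrat's real base class.
class NexusPlugin:
    name = "base"

    def commands(self):
        """Return {command phrase: handler} for the engine to register."""
        return {}

class VolumeDisplay(NexusPlugin):
    name = "volumeDisplay"

    def __init__(self):
        self.volume = 50
        self.muted = False

    def commands(self):
        return {"mute": self.mute, "volume up": self.volume_up}

    def mute(self):
        self.muted = True
        return "Volume muted"

    def volume_up(self, step=10):
        self.volume = min(100, self.volume + step)
        return f"Volume: {self.volume}%"
```

Because plugins are just objects exposing a command map, the engine can register a generated plugin's commands the same way it registers built-in ones.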
This is the killer feature. If a capability doesn't exist, Autocrat builds it on the spot.
You: "build plugin that fetches current weather for any city"
What happens behind the scenes:
1. LLM generates a full NexusPlugin subclass (Python file)
2. AST Validator scans for safety:
   - ✓ No eval/exec/os.system
   - ✓ No ctypes/winreg
   - ✓ Proper NexusPlugin structure
3. Network Scanner detects URL: wttr.in
   ⚠️ Not in allowlist → Security Prompt:
   ```
   ┌───────────────────────────────────────────────────┐
   │ Plugin 'weather_fetcher' wants to reach wttr.in   │
   │ [1] Allow Once   [2] Allow Always   [3] Block     │
   └───────────────────────────────────────────────────┘
   ```
4. You pick "Allow Always" → domain saved to config
5. Plugin is importlib-loaded into the live engine
6. New commands registered immediately – no restart
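Step 5, loading a freshly generated file into the running process, boils down to a standard `importlib` pattern. A minimal sketch (paths and module names illustrative, not Autocrat's actual loader):

```python
# Load a plugin file by path into the live process -- no restart needed.
import importlib.util
import pathlib

def hot_load(path):
    """Import a Python file by path and return the live module object."""
    path = pathlib.Path(path)
    spec = importlib.util.spec_from_file_location(path.stem, path)
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)  # runs the plugin's top-level code now
    return module
```

Since the module object is returned directly, the engine can immediately instantiate the plugin class it contains and register its commands.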
Now you can say:
```
You: "weather in Mumbai"

Autocrat: 📍 Mumbai, India:
          🌡️ 30°C (86°F) – feels like 34°C
          ☁️ Smoke
          💧 Humidity: 43%
          💨 Wind: 21 km/h WNW
```
If the generated plugin crashes at runtime, the error traceback is sent back to the LLM, which patches the code and reloads it automatically.
Start with `python main.py --web` and open http://127.0.0.1:9000:
- Live terminal – type commands, get streamed responses, click autocomplete suggestions
- System gauges – real-time CPU, RAM, disk, battery with animated arcs
- Plugin explorer – browse all plugins, see every command, click to auto-fill
- Command history – searchable log of everything you've run
- Workflow builder – create and trigger multi-step automations
- Confirmation alerts – destructive actions trigger a WebSocket popup for approval
Tunnel it with ngrok and you have a remote AI assistant for your PC accessible from any device.
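The streamed responses arrive in the standard Server-Sent Events wire format: `data: <payload>` lines, with a blank line between events. A minimal consumer sketch (the endpoint path in the comment is an assumption, not Autocrat's documented route):

```python
# Parse token payloads out of an SSE line stream.
def iter_sse_tokens(lines):
    """Yield the data payload of each SSE event from an iterable of lines."""
    for line in lines:
        if line.startswith("data:"):
            yield line[len("data:"):].strip()

# With httpx this would wrap a streaming request, roughly:
#   with httpx.stream("GET", "http://127.0.0.1:9000/stream") as r:  # path assumed
#       for token in iter_sse_tokens(r.iter_lines()):
#           print(token, end="", flush=True)
```

Rendering each `data:` payload as it arrives is what produces the blinking-cursor, token-by-token effect in the dashboard.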
Nothing runs unless validated. Three layers of defense:
| Layer | Scope | What it does |
|---|---|---|
| AST Sandbox | Generated plugins | Parses code before execution. Blocks eval, exec, ctypes, winreg, os.system, subprocess.Popen, shutil.rmtree. Verifies proper NexusPlugin structure. |
| Network Permissions | Generated plugins | Scans every URL/domain in code. Unapproved domains trigger Allow Once / Allow Always / Block prompt. No silent network access. |
| Destructive Action Gate | All plugins | shutdown, restart, kill, delete require explicit approval. Alert pushed to all clients (web, Telegram, VS Code). |
Think of it like Android permissions, but for your desktop AI.
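The AST Sandbox layer amounts to walking the parse tree of generated code before it ever executes. The sketch below implements the blocklist from the table with the stdlib `ast` module; the function name and return shape are illustrative, not Autocrat's internals:

```python
# Reject dangerous calls and imports in generated code by inspecting its AST.
import ast

BLOCKED_CALLS = {"eval", "exec"}
BLOCKED_MODULES = {"ctypes", "winreg"}
BLOCKED_ATTRS = {("os", "system"), ("subprocess", "Popen"), ("shutil", "rmtree")}

def validate(source):
    """Return (ok, reason). Parsing only -- the code is never executed."""
    tree = ast.parse(source)
    for node in ast.walk(tree):
        if isinstance(node, ast.Call):
            f = node.func
            if isinstance(f, ast.Name) and f.id in BLOCKED_CALLS:
                return False, f"blocked call: {f.id}"
            if (isinstance(f, ast.Attribute) and isinstance(f.value, ast.Name)
                    and (f.value.id, f.attr) in BLOCKED_ATTRS):
                return False, f"blocked call: {f.value.id}.{f.attr}"
        if isinstance(node, (ast.Import, ast.ImportFrom)):
            names = [alias.name.split(".")[0] for alias in node.names]
            if isinstance(node, ast.ImportFrom) and node.module:
                names.append(node.module.split(".")[0])
            if BLOCKED_MODULES & set(names):
                return False, "blocked module import"
    return True, "ok"
```

Because `ast.parse` never runs the code, even a plugin containing `os.system("rm -rf /")` is rejected without side effects.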
```
Autocrat/
├── main.py                     # Entry point (CLI + web server)
├── nexus_config.yaml           # Master configuration
├── requirements.txt            # Dependencies
│
├── nexus/
│   ├── core/
│   │   ├── engine.py           # Command router + parallel executor
│   │   ├── ai_engine.py        # LLM integration (native tools + streaming)
│   │   ├── brain.py            # ML intent classifier (sentence-transformers)
│   │   ├── parser.py           # Regex command parser (200+ patterns)
│   │   ├── config.py           # YAML config manager
│   │   ├── events.py           # Event bus for cross-plugin communication
│   │   ├── learner.py          # Behavioral learning (time, chain, frequency)
│   │   ├── logger.py           # Structured logging
│   │   └── plugin.py           # Base plugin class
│   │
│   ├── plugins/
│   │   ├── core_builder.py     # Meta-plugin: generates other plugins
│   │   ├── comet_web_agent.py  # Headless browser (Playwright + ReAct)
│   │   ├── workflow_engine.py  # Multi-step workflow orchestration
│   │   ├── generated/          # Auto-generated plugins land here
│   │   └── ...                 # 14 more built-in plugins
│   │
│   ├── integrations/
│   │   └── telegram_bot.py     # Telegram remote control
│   │
│   └── web/
│       ├── server.py           # FastAPI + SSE streaming + WebSocket
│       └── static/             # Web dashboard (HTML / CSS / JS)
│
├── workflows/                  # Saved workflow YAML files
├── logs/                       # Runtime logs
└── screenshots/                # Captured screens
```
```yaml
ai:
  llm_backend: local_ollama
  local_model: qwen2.5-coder:3b
  local_base_url: http://127.0.0.1:11434
  use_native_tools: true    # native Ollama tool calling
  strict_json_mode: true    # JSON fallback for older models

system:
  safe_mode: false

safety:
  confirm_destructive: true

web:
  allowlist_domains:
    - github.com
    - codeforces.com
    - wttr.in
    - localhost
```

Domains get added automatically when you approve "Allow Always" through the security prompt.
| What | Why |
|---|---|
| Python 3.10+ | Modern syntax, type hints |
| Ollama | Local LLM (qwen2.5-coder:3b) |
| Windows 10/11 | Win32 system automation APIs |
| ~2GB RAM | Sentence-transformer + Ollama overhead |
Key packages: fastapi Β· uvicorn Β· httpx Β· sentence-transformers Β· playwright Β· psutil Β· pyautogui Β· pycaw Β· google-generativeai (optional)
- +50MB deps for stuff Autocrat already does natively
- Opaque wrappers – a tool call fails 4 layers deep, good luck debugging
- Ollama already has a tool-calling API – wrapping it again adds latency, not features
We built the four things that actually matter (native tools, smart filtering, parallel exec, streaming) in ~500 lines. Zero new dependencies.
- Native LLM tool calling (Ollama /api/chat)
- Smart context window (keyword-relevance filtering)
- Parallel multi-step execution
- Streaming responses (SSE)
- Auto-healing generated plugins
- Dynamic network permissions
- 🎤 Voice control (faster-whisper → "Hey Autocrat, lock my PC")
- 🐧 Linux / macOS support (replace Win32 APIs)
- 🛒 Plugin marketplace (share generated plugins with others)
- 🤖 Multi-agent mode (agents that spawn sub-agents)
- 💻 VS Code extension (run commands inline)
- 🧠 Persistent memory (remember preferences across sessions)
Contributions welcome! See CONTRIBUTING.md for guidelines.
MIT β See LICENSE.
Built by @Autocrat2005
If this project is useful, consider giving it a ⭐
"Sir, I've prepared a flight plan..." – Well, not yet. But we're getting there.