Future plans for the DeepShell #3
Replies: 4 comments
-
I've been using Claude Code, Codebuff, and a few open source tools trying to solve these code problems. Claude Code has its CLAUDE.md and Codebuff uses knowledge.md. Both are attempts to remember things about the current project, and both only work well if the user actually reminds the tool to update those files. So in Claude I'd sometimes tell it to update CLAUDE.md with current progress, and sometimes tell it to give itself instructions like "this is how you run tests for this code", and it will add a section about testing. knowledge.md files in Codebuff work similarly, although Codebuff does a slightly better job of remembering to update the file while you work on the project. If you look through most complaints related to these tools, at least the sane ones, they are all about memory: the ability to pull up that memory and follow the instructions in it. Solving that in any easy way would be huge.

The CLAUDE.md in one of my projects is currently 1292 lines long, and it's mostly instructions trying to get Claude Code to actually work right. My knowledge.md files from Codebuff are far smaller and more concise yet get the same results. Below is just the very top of one of my CLAUDE.md files, for a project I am working on that is a complete testing framework with coverage reporting for the Lua language. You can see just how much yelling I have to do at it to stop it from very bad habits. The most fun one is that it forgets this large project has a nicely built central configuration system, and when working on a new module it really likes to try to create a completely new config system for it. I had to add a whole section about that just to convince it to stop; it seems to be working. This is about 5% of the file:

> **Project: firmo**
>
> **Overview**
>
> firmo is an enhanced Lua testing framework that provides comprehensive testing capabilities for Lua projects. It features BDD-style nested test blocks, assertions with detailed error messages, setup/teardown hooks, advanced mocking, tagging, asynchronous testing, code coverage analysis with multiline comment support, and test quality validation.
>
> **CRITICAL: ALWAYS USE CENTRAL_CONFIG SYSTEM**
>
> **MANDATORY CONFIGURATION USAGE**
>
> The firmo codebase uses a centralized configuration system to handle all settings and ensure consistency across the framework. You MUST follow these critical requirements:
>
> Any violation of these rules is a critical failure that MUST be fixed immediately. Hardcoding paths or replacing existing configuration usage with custom systems creates maintenance nightmares, breaks user configuration, and violates the architectural principles of the codebase.
>
> **CRITICAL: ABSOLUTELY NO SPECIAL CASE CODE**
>
> **ZERO TOLERANCE POLICY FOR SPECIAL CASES**
>
> The most important rule in this codebase: NEVER ADD SPECIAL CASE CODE FOR SPECIFIC FILES OR SPECIFIC SITUATIONS. This is a hard, non-negotiable rule.
>
> Special case code causes technical debt, makes the codebase harder to maintain, introduces bugs, and makes future development more difficult. Instead, all solutions must be:
>
> IMMEDIATE REMEDY REQUIRED: If you identify any existing special case code, your IMMEDIATE priority is to remove it and replace it with a proper general solution. THIS RULE OVERRIDES ALL OTHER CONSIDERATIONS. Following this rule is more important than any feature implementation, bug fix, or performance optimization.
>
> **CRITICAL: NEVER ADD COVERAGE MODULE TO TESTS**
>
> This is an ABSOLUTE rule that must NEVER be violated:
>
> Any violation of these rules constitutes a harmful hack that:
>
> The ONLY correct approach is to fix issues in the coverage module itself, never to work around them in tests.
-
Thanks for sharing. It seems like instructing one LLM to form a good set of habits is not the most practical approach, since it would be highly dependent on that particular instance of the LLM and might break with any update, but at this point it might be a decent workaround.

DeepShell is not yet at the stage of analyzing complex codebases, but I am already experiencing difficulties with different outputs from different models, so I have to adjust prompts to achieve the same result. That's why I don't want to hardcode a bunch of prompts for a specific model, unless... I hardcode them for several different ones, but that would require input from users and should be a last resort.

Data augmentation for LLMs seems to be one of the current problems of AI that needs to be solved, as this is the closest we can get to "awareness". My history management already embeds the content and the summary of each file and does a similarity search. So what I am thinking is to do something like FAISS for GPU-accelerated search. The back-end will run in the background, prompt the LLM with queries to identify functions, "trace" the code across the files, index the entire code flow, and then label it with an embedded summary. So it will turn your nice-looking functions into scripts for each use case. The agent will monitor the project (perhaps local commits will be the trigger) and reindex the database for the project. I feel like GPU-accelerated search will be needed at some point, with potential use of an LLM when algorithms would not deliver the result, but ideally we should minimize calls to LLMs. This is the type of problem that top universities are researching alongside corporations, so it will be challenging.
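For the FAISS part, here is a minimal sketch of what indexing per-file summaries could look like. The `embed()` function, file names, and summaries are placeholders (in practice the vectors would come from whatever embedding model the back-end runs); this is an illustration of the pattern, not DeepShell's actual implementation:

```python
# Minimal sketch: index per-file summaries and run a similarity search with FAISS.
# embed() is a placeholder for whatever embedding model the back-end exposes.
import numpy as np
import faiss

DIM = 768  # must match the embedding model's output size

def embed(text: str) -> np.ndarray:
    """Placeholder: return a normalized embedding vector for `text`."""
    vec = np.random.rand(DIM).astype("float32")  # stand-in for a real model call
    return vec / np.linalg.norm(vec)

# Hypothetical per-file summaries produced by the indexing pass.
summaries = {
    "history.py": "Stores chat history and file summaries, does similarity search.",
    "ollama_api.py": "Wraps Ollama API calls: generate, chat, embeddings.",
}

paths = list(summaries)
matrix = np.stack([embed(s) for s in summaries.values()])

index = faiss.IndexFlatIP(DIM)  # inner product == cosine similarity on normalized vectors
# index = faiss.index_cpu_to_gpu(faiss.StandardGpuResources(), 0, index)  # GPU variant
index.add(matrix)

query = embed("where are Ollama requests sent?")
scores, ids = index.search(query.reshape(1, -1), k=2)
for score, i in zip(scores[0], ids[0]):
    print(f"{paths[i]}  (score={score:.3f})")
```

The commented-out `index_cpu_to_gpu` line is where the GPU-accelerated search would come in once the corpus is large enough to need it.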
-
It definitely will be challenging. I know a few teams are working on integrating tree-sitter or LSPs into their workflow to get information about a codebase that they can then condense and use as context. That might be an approach, since those tools already support nearly every language and project type someone might work on. This is an MCP to do exactly that:
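Separately from that MCP, here is a rough Python-only illustration of the condensing idea, using the stdlib `ast` module as a stand-in for tree-sitter or an LSP (which would generalize the same extraction across languages):

```python
# Rough sketch: condense a Python source file into a compact outline
# (function/class signatures plus the first docstring line) that can be
# fed to a model as context instead of the full file.
import ast
import sys

def outline(path: str) -> str:
    tree = ast.parse(open(path, encoding="utf-8").read())
    lines = []
    for node in ast.walk(tree):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            doc = ast.get_docstring(node) or ""
            first = doc.splitlines()[0] if doc else ""
            if isinstance(node, ast.ClassDef):
                sig = f"class {node.name}"
            else:
                args = ", ".join(a.arg for a in node.args.args)
                sig = f"def {node.name}({args})"
            lines.append(f"{sig}  # {first}".rstrip(" #"))
    return "\n".join(lines)

if __name__ == "__main__":
    print(outline(sys.argv[1]))  # e.g. python outline.py history.py
```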
-
Good point. Reinventing the wheel is not always the best idea.
-
In the future we are planning to split DeepShell into a front-end and a back-end.
The back-end will run as a service and handle Ollama API calls and history/content management, while the front-end will be a TUI with access to the shell on the client machine (eventually SSH support will be added too). This will reduce the minimum hardware requirements on the client side and will be a step toward agent functionality.
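To make the split concrete, here is one possible shape of the back-end piece, assuming FastAPI and the default Ollama endpoint. The route and field names are placeholders, not DeepShell's actual API:

```python
# Sketch of a back-end service: the TUI front-end talks to this over HTTP,
# and only this process needs to reach Ollama and the history store.
# Route and field names are placeholders, not DeepShell's real API.
import requests
from fastapi import FastAPI
from pydantic import BaseModel

OLLAMA_URL = "http://localhost:11434/api/generate"  # default Ollama endpoint

app = FastAPI()

class Query(BaseModel):
    model: str
    prompt: str

@app.post("/query")
def query(q: Query) -> dict:
    # History/content management would inject retrieved context here.
    resp = requests.post(
        OLLAMA_URL,
        json={"model": q.model, "prompt": q.prompt, "stream": False},
        timeout=120,
    )
    resp.raise_for_status()
    return {"response": resp.json()["response"]}

# Run with: uvicorn backend:app  -- the TUI then connects from the client machine.
```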
One of the main challenges right now is history/content management, as the current implementation is too primitive to be used for complex coding problems. Ideally the background service will run LLMs in a self-loop to analyze complex structures, allowing deeper "awareness" of projects, or of user activity such as system administration. There is room for ML/DL, but that's not a priority right now.
Function-calling support will be added soon. If successful, it will replace the current hardcoded actions such as open and find with the ability to, for example, save the last message or code into a file. It will also be tested with shell commands, so that in the default mode the user will be able to perform actions with natural language.
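A minimal sketch of how that dispatch could work, assuming the model is asked to reply with a small JSON action object (using Ollama's `format: "json"` option). The action names, model name, and prompt are hypothetical placeholders, not the planned DeepShell interface:

```python
# Sketch: let the model pick one of a few whitelisted actions via a JSON reply,
# then dispatch it locally. Action names here are hypothetical placeholders.
import json
import subprocess
import requests

ACTIONS = {
    "save_last_message": lambda args, last: open(args["path"], "w").write(last),
    "run_shell": lambda args, last: subprocess.run(args["command"], shell=True),
}

def dispatch(user_request: str, last_message: str) -> None:
    prompt = (
        "Choose one action for the request below and reply with JSON only, "
        'e.g. {"action": "save_last_message", "args": {"path": "out.md"}}.\n'
        f"Allowed actions: {list(ACTIONS)}\nRequest: {user_request}"
    )
    resp = requests.post(
        "http://localhost:11434/api/generate",
        json={"model": "llama3", "prompt": prompt, "format": "json", "stream": False},
        timeout=120,
    )
    call = json.loads(resp.json()["response"])
    ACTIONS[call["action"]](call.get("args", {}), last_message)

# dispatch("save that code to snippet.lua", last_message=assistant_reply)
```

In practice anything like `run_shell` would of course need a confirmation step before executing whatever the model proposes.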
After that, the "shell-agents" functionality will be added.
P.S. Currently "we" means "me", so code contributions are more than welcome.
Documentation and unit tests will be a good start.