Future plans for the DeepShell #3
Replies: 4 comments
-
I've been using Claude Code, Codebuff, and a few open source tools trying to solve these code problems. Claude Code has its CLAUDE.md and Codebuff uses knowledge.md. Both are attempts to remember things about the current project, and both only work well if the user actually reminds the tool to update those files. So in Claude I'd sometimes tell it to update CLAUDE.md with current progress, and sometimes tell it to give itself instructions like "this is how you run tests for this code", and it will add a section about testing. knowledge.md files in Codebuff work similarly, although Codebuff does a slightly better job of remembering to update the file while you work on the project. If you look through most complaints related to these tools, at least the sane ones, they are all about memory: the ability to pull up that memory and follow the instructions in it. Solving that in any easy way would be huge.

The CLAUDE.md in one of my projects is currently 1292 lines long, and it's mostly instructions trying to get Claude Code to actually work right. My knowledge.md files from Codebuff are far smaller and more concise yet get the same results. Below is just the very top of one of my CLAUDE.md files, for a project I am working on that is a complete testing framework with coverage reporting for the Lua language. You can see just how much yelling I have to do at it to stop it from very bad habits. The most fun one is that it forgets this large project has a nicely built central configuration system, and when working on a new module it really likes to try to create a completely new config system for it. I had to add a whole section about that just to convince it to stop; it seems to be working. This is about 5% of the file:

> **Project: firmo**
>
> **Overview**
>
> firmo is an enhanced Lua testing framework that provides comprehensive testing capabilities for Lua projects. It features BDD-style nested test blocks, assertions with detailed error messages, setup/teardown hooks, advanced mocking, tagging, asynchronous testing, code coverage analysis with multiline comment support, and test quality validation.
>
> **CRITICAL: ALWAYS USE CENTRAL_CONFIG SYSTEM**
>
> **MANDATORY CONFIGURATION USAGE**
>
> The firmo codebase uses a centralized configuration system to handle all settings and ensure consistency across the framework. You MUST follow these critical requirements:
>
> Any violation of these rules is a critical failure that MUST be fixed immediately. Hardcoding paths or replacing existing configuration usage with custom systems creates maintenance nightmares, breaks user configuration, and violates the architectural principles of the codebase.
>
> **CRITICAL: ABSOLUTELY NO SPECIAL CASE CODE**
>
> **ZERO TOLERANCE POLICY FOR SPECIAL CASES**
>
> The most important rule in this codebase: NEVER ADD SPECIAL CASE CODE FOR SPECIFIC FILES OR SPECIFIC SITUATIONS. This is a hard, non-negotiable rule.
>
> Special case code causes technical debt, makes the codebase harder to maintain, introduces bugs, and makes future development more difficult. Instead, all solutions must be:
>
> IMMEDIATE REMEDY REQUIRED: If you identify any existing special case code, your IMMEDIATE priority is to remove it and replace it with a proper general solution. THIS RULE OVERRIDES ALL OTHER CONSIDERATIONS. Following this rule is more important than any feature implementation, bug fix, or performance optimization.
>
> **CRITICAL: NEVER ADD COVERAGE MODULE TO TESTS**
>
> This is an ABSOLUTE rule that must NEVER be violated:
>
> Any violation of these rules constitutes a harmful hack that:
>
> The ONLY correct approach is to fix issues in the coverage module itself, never to work around them in tests.
-
Thanks for sharing. It seems like instructing one LLM to form a good set of habits is not the most practical approach, since it would be highly dependent on that particular instance of the LLM and might break with any update, but at this point it might be a decent workaround.

DeepShell is not yet at the stage of analyzing complex codebases, but I am already experiencing difficulties with different outputs from different models, so I have to adjust prompts to achieve the same result. That's why I don't want to hardcode a bunch of prompts for a specific model, unless... I hardcode them for several different ones, but that would require input from users and should be a last resort.

Data augmentation for LLMs seems to be one of the current problems of AI that needs to be solved, as this is the closest we can get to "awareness". My history management already embeds the content and the summary of each file and does a similarity search. So what I am thinking is to do something like FAISS for GPU-accelerated search. The back-end will run in the background, prompt the LLM with queries to identify functions, "trace" the code across the files, index the entire code flow, and then label it with an embedded summary. So it will turn your nice-looking functions into scripts for each use case. The agent will monitor the project (perhaps local commits will be the trigger) and reindex the database for the project. I feel like GPU-accelerated search will be needed at some point, with potential use of an LLM when algorithms would not deliver the result, but ideally we should minimize calls to LLMs. This is the type of problem that top universities are researching alongside corporations, so it will be challenging.
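For the FAISS part, here is a minimal sketch of what indexing per-file summaries could look like. The `embed()` function, file names, and summaries are placeholders (in practice the vectors would come from whatever embedding model the back-end runs); this is an illustration of the pattern, not DeepShell's actual implementation:

```python
# Minimal sketch: index per-file summaries and run a similarity search with FAISS.
# embed() is a placeholder for whatever embedding model the back-end exposes.
import numpy as np
import faiss

DIM = 768  # must match the embedding model's output size

def embed(text: str) -> np.ndarray:
    """Placeholder: return a normalized embedding vector for `text`."""
    vec = np.random.rand(DIM).astype("float32")  # stand-in for a real model call
    return vec / np.linalg.norm(vec)

# Hypothetical per-file summaries produced by the indexing pass.
summaries = {
    "history.py": "Stores chat history and file summaries, does similarity search.",
    "ollama_api.py": "Wraps Ollama API calls: generate, chat, embeddings.",
}

paths = list(summaries)
matrix = np.stack([embed(s) for s in summaries.values()])

index = faiss.IndexFlatIP(DIM)  # inner product == cosine similarity on normalized vectors
# index = faiss.index_cpu_to_gpu(faiss.StandardGpuResources(), 0, index)  # GPU variant
index.add(matrix)

query = embed("where are Ollama requests sent?")
scores, ids = index.search(query.reshape(1, -1), k=2)
for score, i in zip(scores[0], ids[0]):
    print(f"{paths[i]}  (score={score:.3f})")
```

The commented-out `index_cpu_to_gpu` line is where the GPU-accelerated search would come in once the corpus is large enough to need it.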
-
It definitely will be challenging. I know a few teams are working on integrating tree-sitter or LSPs into their workflow to get information about a codebase that they can then condense and use as context. That might be an approach, since those tools already support nearly every language and project type someone might work on. This is an MCP to do exactly that:
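Separately from that MCP, here is a rough Python-only illustration of the condensing idea, using the stdlib `ast` module as a stand-in for tree-sitter or an LSP (which would generalize the same extraction across languages):

```python
# Rough sketch: condense a Python source file into a compact outline
# (function/class signatures plus the first docstring line) that can be
# fed to a model as context instead of the full file.
import ast
import sys

def outline(path: str) -> str:
    tree = ast.parse(open(path, encoding="utf-8").read())
    lines = []
    for node in ast.walk(tree):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            doc = ast.get_docstring(node) or ""
            first = doc.splitlines()[0] if doc else ""
            if isinstance(node, ast.ClassDef):
                sig = f"class {node.name}"
            else:
                args = ", ".join(a.arg for a in node.args.args)
                sig = f"def {node.name}({args})"
            lines.append(f"{sig}  # {first}".rstrip(" #"))
    return "\n".join(lines)

if __name__ == "__main__":
    print(outline(sys.argv[1]))  # e.g. python outline.py history.py
```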
-
Good point. Reinventing the wheel is not always the best idea.
-
In the future we are planning to split DeepShell into a front-end and a back-end.
The back-end will run as a service and handle Ollama API calls and history/content management, while the front-end will be a TUI with access to the shell on the client machine (eventually SSH support will be added too). This will reduce the minimum hardware requirements on the client side and will be a step toward agent functionality.
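To make the split concrete, here is one possible shape of the back-end piece, assuming FastAPI and the default Ollama endpoint. The route and field names are placeholders, not DeepShell's actual API:

```python
# Sketch of a back-end service: the TUI front-end talks to this over HTTP,
# and only this process needs to reach Ollama and the history store.
# Route and field names are placeholders, not DeepShell's real API.
import requests
from fastapi import FastAPI
from pydantic import BaseModel

OLLAMA_URL = "http://localhost:11434/api/generate"  # default Ollama endpoint

app = FastAPI()

class Query(BaseModel):
    model: str
    prompt: str

@app.post("/query")
def query(q: Query) -> dict:
    # History/content management would inject retrieved context here.
    resp = requests.post(
        OLLAMA_URL,
        json={"model": q.model, "prompt": q.prompt, "stream": False},
        timeout=120,
    )
    resp.raise_for_status()
    return {"response": resp.json()["response"]}

# Run with: uvicorn backend:app  -- the TUI then connects from the client machine.
```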
One of the main challenges right now is history/content management, as the current implementation is too primitive to be used for complex coding problems. Ideally the background service will run LLMs in a self-loop to analyze complex structures, allowing deeper "awareness" of projects, or of user activity such as system administration. There is room for ML/DL, but that's not a priority right now.
Function-calling support will be added soon. If successful, it will replace the current hardcoded actions such as open and find with the ability to, for example, save the last message or code into a file. It will also be tested with shell commands, so that in the default mode the user will be able to perform actions with natural language.
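A minimal sketch of how that dispatch could work, assuming the model is asked to reply with a small JSON action object (using Ollama's `format: "json"` option). The action names, model name, and prompt are hypothetical placeholders, not the planned DeepShell interface:

```python
# Sketch: let the model pick one of a few whitelisted actions via a JSON reply,
# then dispatch it locally. Action names here are hypothetical placeholders.
import json
import subprocess
import requests

ACTIONS = {
    "save_last_message": lambda args, last: open(args["path"], "w").write(last),
    "run_shell": lambda args, last: subprocess.run(args["command"], shell=True),
}

def dispatch(user_request: str, last_message: str) -> None:
    prompt = (
        "Choose one action for the request below and reply with JSON only, "
        'e.g. {"action": "save_last_message", "args": {"path": "out.md"}}.\n'
        f"Allowed actions: {list(ACTIONS)}\nRequest: {user_request}"
    )
    resp = requests.post(
        "http://localhost:11434/api/generate",
        json={"model": "llama3", "prompt": prompt, "format": "json", "stream": False},
        timeout=120,
    )
    call = json.loads(resp.json()["response"])
    ACTIONS[call["action"]](call.get("args", {}), last_message)

# dispatch("save that code to snippet.lua", last_message=assistant_reply)
```

In practice anything like `run_shell` would of course need a confirmation step before executing whatever the model proposes.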
After that, the "shell-agents" functionality will be added.
P.S. Currently "we" means "me", so code contributions are more than welcome.
Documentation and unit tests will be a good start.