Description
Hi, I am really impressed by the Agentless work. I think it is novel, intuitive, and clean. However, when I dug into the code and pipeline, I found a few limitations.
Compared to the other agentic methods, Agentless does not use specific tools or an interactive pipeline. Instead, Agentless uses a hierarchical method to gradually locate files, then functions/classes/vars, and then lines, in order to collect the fault context. Once it has all the context, the model generates the fix patch.
However, I believe that at each step of the fault localization, the information given to the model is too incomplete to support the localization it is asked to do.
- For file localization, the input is the problem description and the repo structure rendered as a directory tree. The model is expected to output the files that may need modification. In my opinion, this information is very limited, since file names can be uninformative.
- For function/class/var localization, the only option currently supported is to provide the problem description and the compressed skeleton code. Without seeing the actual implementation of each class/function, how can the model know which function/class to modify if only names are provided?
- Once the classes/functions/vars are collected, context windows are constructed and fed into the model for patch generation. However, what if the patch requires adding imports, helper functions, or global variables? That context will not be retrieved in step 2, and the model is constrained to generate the patch based only on the retrieved context.
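To make the second and third points concrete, here is a rough sketch of what I mean by a compressed skeleton. This is not the actual Agentless compression code. The `skeleton` function and the `SOURCE` sample are hypothetical, and the sketch assumes the skeleton keeps only class names and function signatures. It relies on the standard `ast` module (`ast.unparse` needs Python 3.9+).

```python
import ast

# Hypothetical sample file, invented for illustration only.
SOURCE = '''
import os
from pathlib import Path

CACHE_DIR = ".cache"

class Loader:
    def load(self, name):
        path = Path(CACHE_DIR) / name
        if not path.exists():
            raise FileNotFoundError(name)
        return path.read_text()

def helper(x):
    return os.path.basename(x)
'''


def skeleton(source: str) -> str:
    """Keep class names and function signatures; drop bodies, imports, globals.

    Assumption: a simplified stand-in for a skeleton format, not the
    compression that Agentless itself performs.
    """
    lines = []
    for node in ast.parse(source).body:
        if isinstance(node, ast.ClassDef):
            lines.append(f"class {node.name}:")
            for item in node.body:
                if isinstance(item, (ast.FunctionDef, ast.AsyncFunctionDef)):
                    args = ast.unparse(item.args)
                    lines.append(f"    def {item.name}({args}): ...")
        elif isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            args = ast.unparse(node.args)
            lines.append(f"def {node.name}({args}): ...")
    return "\n".join(lines)


if __name__ == "__main__":
    print(skeleton(SOURCE))
```

With a skeleton like this, the model sees `load(self, name)` but not the `FileNotFoundError` branch, the `CACHE_DIR` global, or the imports, which is exactly the context a fix might need to touch.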
I think maybe tools are something ugly that we nevertheless have to use? The top-down static method that Agentless currently uses seems limited. I have been thinking hard about this, and collecting enough repo context with a static pipeline seems infeasible, at least to me.
I wonder what you think of this?