Gemini 1.5 PRO latest + CEDARScript-G edit format #1897
Closed
elifarley wants to merge 6 commits into Aider-AI:main
Conversation
Contributor
What is the point of this PR? The coder does not exist in aider currently. These numbers are at best for private preview interest, not for public disclosure on the aider website (IMHO).
Author
Ok, I'll make it a draft PR. Once a PR in Aider is created and merged, I can then make this PR ready for review once more.
Contributor
I'll close this PR until this happened.
The new CEDARScript edit format looks promising, as it allowed Gemini-1.5-Flash to surpass Sonnet 3.5.
Here we're not using architect mode, but you can kinda say that Gemini is acting as an architect, and the edit format itself (CEDARScript) is acting as the editor.
Quick comparisons
Configurations compared: Sonnet 3.5 + diff; Gemini 1.5 PRO + diff-fenced (leaderboard site); Gemini 1.5 PRO + diff-fenced (my own tests); Gemini 1.5 PRO + CEDARScript; Gemini 1.5 Flash + CEDARScript.
Test case functional_Functional__conform_to_reference_input (formats compared: diff-fenced, cedarscript-g).
See line count comparisons for some refactoring benchmark tasks.

Analysis: CEDARScript vs. Common Edit Formats in AI-Assisted Code Refactoring
The introduction of CEDARScript as an edit format for AI-assisted code refactoring has demonstrated an important leap in performance, particularly when used with Gemini 1.5 PRO and Gemini 1.5 Flash. This analysis compares CEDARScript against traditional diff-based edit formats, revealing striking improvements across multiple metrics.
Overall Performance:
CEDARScript has dramatically enhanced the performance of Gemini models in code refactoring tasks. When paired with Gemini 1.5 PRO, it achieved an impressive 77.5% pass rate and 86.5% well-formed cases, significantly outperforming both its own diff-fenced format results (49.4% pass rate, 7.9% well-formed cases) and the highly regarded Claude 3.5 Sonnet (64.0% pass rate, 76.4% well-formed cases).
Most remarkably, the cost-effective Gemini 1.5 Flash model, when using CEDARScript, not only matched but surpassed the performance of Claude 3.5 Sonnet. With a 76.4% pass rate and an outstanding 94.4% well-formed cases, Gemini 1.5 Flash demonstrates that even a more affordable model can outperform top-tier competitors when equipped with the right tools. This breakthrough suggests that CEDARScript can level the playing field, enabling more accessible AI models to compete with and even exceed the capabilities of more expensive options in complex coding tasks.
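The figures quoted above can be laid side by side to make the gaps explicit. The following is a minimal sketch that only restates the percentages from the text. The dictionary keys and the `point_gap` helper are illustrative names, not part of Aider or CEDARScript.

```python
# (pass rate %, well-formed %) as quoted in the text above.
# The labels are illustrative, chosen for this sketch only.
results = {
    "Claude 3.5 Sonnet + diff": (64.0, 76.4),
    "Gemini 1.5 PRO + diff-fenced": (49.4, 7.9),
    "Gemini 1.5 PRO + CEDARScript": (77.5, 86.5),
    "Gemini 1.5 Flash + CEDARScript": (76.4, 94.4),
}


def point_gap(a: str, b: str) -> tuple[float, float]:
    """Return (pass-rate gap, well-formed gap) of `a` over `b`, in percentage points."""
    pass_a, well_a = results[a]
    pass_b, well_b = results[b]
    return round(pass_a - pass_b, 1), round(well_a - well_b, 1)


if __name__ == "__main__":
    baseline = "Claude 3.5 Sonnet + diff"
    for name in results:
        if name != baseline:
            print(name, "vs", baseline, "->", point_gap(name, baseline))
```

Expressed this way, Gemini 1.5 Flash with CEDARScript leads Claude 3.5 Sonnet by 12.4 points on pass rate and 18.0 points on well-formed cases, while the gap between Gemini 1.5 PRO with CEDARScript and with diff-fenced is 28.1 and 78.6 points respectively.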
Code Quality and Accuracy:
These improvements suggest that CEDARScript enables AI models to produce more accurate, syntactically correct, and well-structured code modifications.
Efficiency and Resource Utilization:
Examining the "functional_Functional__conform_to_reference_input" test case:
On a larger scale, CEDARScript with Gemini 1.5 PRO reduced the average time per case from 110.1 seconds to 29.0 seconds, a 73.7% improvement. Gemini 1.5 Flash further reduced this to 14.7 seconds, an 86.6% improvement over the original diff-fenced format.
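The percentage improvements follow directly from the per-case timings. As a quick check, this small sketch recomputes them from the quoted averages. The function name `percent_reduction` is just an illustrative helper.

```python
def percent_reduction(before: float, after: float) -> float:
    """Relative reduction from `before` to `after`, as a percentage rounded to one decimal."""
    return round((before - after) / before * 100, 1)


# Average seconds per case, as quoted in the text above.
DIFF_FENCED_PRO = 110.1
CEDARSCRIPT_PRO = 29.0
CEDARSCRIPT_FLASH = 14.7

if __name__ == "__main__":
    print(percent_reduction(DIFF_FENCED_PRO, CEDARSCRIPT_PRO))    # 73.7
    print(percent_reduction(DIFF_FENCED_PRO, CEDARSCRIPT_FLASH))  # 86.6
```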
Robustness and Reliability:
While the number of error outputs increased with CEDARScript, the number of malformed responses decreased significantly:
This suggests that while CEDARScript may generate more error outputs, it produces fewer malformed responses, potentially indicating more precise error handling and feedback.
Scalability and Cost-Effectiveness:
CEDARScript demonstrated impressive cost savings:
This cost reduction, combined with faster processing times, indicates excellent scalability for larger, more complex refactoring tasks.
Model Comparison:
Gemini 1.5 Flash with CEDARScript showed slightly lower pass rates (76.4% vs 77.5%) but higher well-formed case percentages (94.4% vs 86.5%) compared to Gemini 1.5 PRO. The Flash model also demonstrated superior cost-effectiveness and speed, making it an attractive option for many use cases.
Conclusion:
CEDARScript has shown significant improvements for AI-assisted code refactoring.
By improving cost, accuracy, efficiency, and reliability across different models, it addresses many of the challenges associated with traditional diff-based formats.
The consistent performance boost across various metrics indicates that CEDARScript could be an important enabler for AI models to handle complex code transformations more effectively.
These results could have positive implications for developer productivity, code quality, and the future of AI-assisted software development.