Replies: 2 comments
The generate/refute split makes sense. The quality of the criticism step depends on how clearly the original intention was stated. If the AI's task came from a structured prompt with an explicit objective, constraints, and success criteria, the critic has a clear target: you're checking whether the output satisfied the stated objective rather than guessing at intent. Free-text prompts produce outputs with implicit intent, and it's hard to refute something precisely when you don't know exactly what was asked for. I've been building flompt (github.com/Nyrok/flompt) around this idea: 12 semantic block types that make the original intention explicit before any output is generated.
For many applications (especially exploratory work and prototypes) we may not have a very clear "specification" as such. Specifications range from "soft" (a few iterations of chat with an AI) to "hard" (full TLA+-style invariant and model specifications), and many fall somewhere in between depending on the problem domain, engineer experience and capability, and so on. At the moment, I am thinking of the "chat session for the change" as the source of the specification, and want to store it as part of the commit in some useful way.
In The future belongs to those who can refute AI, not just generate with AI, I argued that the value in software engineering has shifted from “generation” or “conjecture” toward “review”, “criticism”, or “refutation”. The argument was well received on both dev.to (in the weekly Top 7) and Hacker News (46 points and pointers from veterans).
In Should Code Review Live Inside Git?, I argued that we must bring the criticism part, somehow, into one of the most important pieces of tech most software devs rely on: git.
Today, I spent more time giving strength to the idea, fleshing out various details of the evolving situation. Here is how I see the whole thing now, in a simple diagram:
The new system has three distinct modes of activity an engineer may engage in. For stickiness, let’s call it the Triple C System for Development:
Conjecture: As argued in the previous articles, GenAI is a conjecture engine. It can spit out huge blocks of plausible code that is supposed to fulfill the requirements specified in the preceding iterations of prompts. The prompts are an interactive manifestation of what we usually call The Specification. The result of specification → prompts → GenAI is a diff: a change to the system, proposed by the AI.
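As a sketch, the conjecture step can be modeled as a function from the accumulated prompt history to a proposed diff. Everything here (the Specification and Conjecture types, the run_conjecture helper, and the generate callable standing in for an LLM call) is hypothetical illustration, not part of any existing tool:

```python
from dataclasses import dataclass, field

@dataclass
class Specification:
    """Hypothetical model: the spec is the accumulated prompt iterations."""
    prompts: list[str] = field(default_factory=list)

    def add_prompt(self, text: str) -> None:
        self.prompts.append(text)

@dataclass
class Conjecture:
    spec: Specification
    diff: str  # unified diff proposed by the model

def run_conjecture(spec: Specification, generate) -> Conjecture:
    # `generate` stands in for an LLM call over the full prompt history
    return Conjecture(spec=spec, diff=generate("\n".join(spec.prompts)))
```

The point of the shape is that the diff never exists without the spec that produced it; both travel together into the criticism step.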
Criticism: Now the engineer switches into a totally different mode. We are aiming for review, refutation, and a sincere effort to find holes in the newly proposed changes. It also involves clearly summarizing the changes actually made in the diff and comparing them to the hopes expressed in The Specification. We can call this the Implementation Story. We are primarily looking for Issues that require further consideration. We take these issues, go back to step (1), and fix and refine things. We may iterate between Conjecture and Criticism until we are satisfied that the remaining issues are no longer a significant concern.
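The Conjecture ↔ Criticism cycle can be sketched as a loop that feeds the critic's issues back into the specification until none remain. The generate and criticize callables here are hypothetical stand-ins for the LLM call and the review step:

```python
def triple_c_cycle(spec: str, generate, criticize, max_rounds: int = 5):
    """Iterate Conjecture -> Criticism until the critic raises no issues.

    generate(spec) returns a diff; criticize(spec, diff) returns a list of
    issue strings (empty when satisfied). Both are assumed interfaces.
    """
    diff = generate(spec)
    issues = criticize(spec, diff)
    for _ in range(max_rounds):
        if not issues:
            break  # satisfied: remaining concerns are no longer significant
        # fold the issues back into the specification and re-conjecture
        spec = spec + "\nAddress these issues: " + "; ".join(issues)
        diff = generate(spec)
        issues = criticize(spec, diff)
    return diff, issues
```

The max_rounds cap matters in practice: without it, a generator and critic that disagree can ping-pong forever, so the loop exits with the open issues attached for a human to adjudicate.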
Commit: Now comes the time to put the entire effort on permanent record: we have an attestation tracking the number of reviews, the overall percentage of code inspected, the state of the review (ran, skipped, vouched), and so on. This is also where my new proposals come in. Each commit must include an LLM-generated (and optionally human-annotated) Implementation Story as an “.lrc.md” file. When someone looks at a particular commit in GitHub or GitLab, they must see two parts: the code diff itself, and the Implementation Story alongside it.
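A minimal sketch of what an attestation record and the .lrc.md rendering might look like. The field names and layout are my assumptions for illustration, not a defined format:

```python
from dataclasses import dataclass

@dataclass
class Attestation:
    reviews: int          # number of review passes
    pct_inspected: float  # overall % of code inspected
    state: str            # "ran", "skipped", or "vouched"

def render_lrc_md(story: str, att: Attestation) -> str:
    """Render the Implementation Story plus attestation as .lrc.md content.
    Assumed layout: story first, attestation as a trailing bullet section."""
    return (
        "# Implementation Story\n\n"
        f"{story}\n\n"
        "## Attestation\n"
        f"- reviews: {att.reviews}\n"
        f"- code inspected: {att.pct_inspected:.0f}%\n"
        f"- review state: {att.state}\n"
    )
```

Keeping the attestation inside the same file as the story means a single `git add change.lrc.md` records both the narrative and the review evidence in one commit.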
Essentially, we are aiming at a git history in which every significant commit carries semi-automated context about the specification, prompts, implementation summary, story, risks, and opportunities. The context grows alongside the code.
The advantage of committing both conjecture and criticism (the context as a whole) into git is that it can later be used to track the evolution of the system. Humans would rather read accurate information in English unless absolutely required to read code, and even AI agents can easily go through the commit history and dig up older design decisions.
It also helps with traceability: say there is a production issue that gets tracked down to a particular commit. We can inspect both the code and the human/AI context at that point in time to figure out what went wrong, so that the engineering team can perform at a higher level next time.
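One way such a lookup might work, assuming the story file uses an "## Attestation" heading followed by "- key: value" bullets. The file name, format, and parse_attestation helper are all hypothetical; in practice the file would be fetched from history with something like `git show <commit>:change.lrc.md`:

```python
def parse_attestation(lrc_md: str) -> dict:
    """Extract the attestation key/value bullets from .lrc.md content
    (assumed layout: an '## Attestation' section of '- key: value' lines)."""
    fields = {}
    in_section = False
    for line in lrc_md.splitlines():
        stripped = line.strip()
        if stripped == "## Attestation":
            in_section = True
        elif in_section and stripped.startswith("#"):
            break  # the next heading ends the attestation section
        elif in_section and stripped.startswith("- "):
            key, _, value = stripped[2:].partition(":")
            fields[key.strip()] = value.strip()
    return fields
```

During an incident review, an engineer (or an agent) could run this over the `.lrc.md` at the offending commit to see at a glance how thoroughly the change was reviewed before it shipped.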
The Triple C System provides: