
tools: add first draft of AGENTS.md (tested with Gemini, edited by Claude, and Codex) #26502

Open
spytheman wants to merge 22 commits into vlang:master from spytheman:add_agents_md

Conversation

@spytheman
Member

No description provided.

@spytheman changed the title from "tools: add first draft of AGENTS.md (tested with Gemini)" to "tools: add first draft of AGENTS.md (tested with Gemini, edited by Claude)" on Feb 2, 2026
spytheman and others added 8 commits February 2, 2026 15:04
…Variables sections

Adds three high-value sections to help AI agents work more effectively with the V compiler:
- Error Reporting: API for c.error(), c.warn(), c.note() in checker/parser
- Option/Result Types: Syntax, common bugs, test locations, and cgen pitfalls
- Environment Variables: VFLAGS, VAUTOFIX, VEXE, and V2-specific variables

These additions provide timeless guidance on compiler internals without time-specific references.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Split two lines exceeding the 100-character limit to pass markdown linting.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Make it clear that check-md must be run before committing .md files:
- Added "(required for .md files before commits)" to Tools section
- Updated Gotchas to mention both fmt and check-md requirements

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
…ler models to benefit from the AGENTS.md file.
@spytheman changed the title from "tools: add first draft of AGENTS.md (tested with Gemini, edited by Claude)" to "tools: add first draft of AGENTS.md (tested with Gemini, edited by Claude, and Codex)" on Feb 2, 2026
@spytheman
Member Author

spytheman commented Feb 2, 2026

I think it works well enough as a first draft. codex works well with it, following the instructions to use `v -o ./vnew cmd/v` and then only the new compiler for checking things, and it already managed to fix 2 bugs today with that workflow, without significant intervention or overly verbose prompts.

See #26505 and #26508.

@spytheman
Member Author

@medvednikov what do you think?

@fleximus
Member

fleximus commented Feb 2, 2026

Great job. I'd also add that it should generally avoid the `unsafe` keyword, with proper guidance on when it's okay to use.
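
For illustration, a minimal V sketch of the kind of distinction such guidance could draw (the function names here are hypothetical):

```v
// Preferred: plain V; array access is bounds-checked, no `unsafe` needed.
fn sum(nums []int) int {
	mut total := 0
	for n in nums {
		total += n
	}
	return total
}

// A legitimate `unsafe` use: dereferencing a raw pointer (e.g. from C
// interop), which V cannot verify. Keep such blocks small and localized.
fn first_byte(p &u8) u8 {
	return unsafe { *p }
}
```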

@JalonSolov
Contributor

Just hit this one... need a rule to tell the agent to put all V command line options between `v` and the rest of the command. Otherwise, it does what it just did to me:

v test vlib/<a module> -count 1

which fails... It will also help with `v run`, where the V options have to come between the `v` and the `run`, etc.

I wish V had always enforced this, but... we are where we are.

…nd before the subcommand/file. Edit the rest for clarity.
@kbkpbot
Contributor

kbkpbot commented Feb 4, 2026

### Compiler Development
Back up a working V binary first: `cp v v_ok`.
When fixing V compiler bugs, reuse the existing v_ok binary to avoid rebuilding:
```bash
# Copy v_ok back to v, then rebuild with debug symbols
cp v_ok v && v self -g
```

This avoids rebuilding the old compiler from scratch each time.

@JalonSolov
Contributor

The current rules say to build vnew and use it for everything after that. No need to copy the current v to v_ok then copy it back.

@kbkpbot
Contributor

kbkpbot commented Feb 4, 2026

The current rules say to build vnew and use it for everything after that. No need to copy the current v to v_ok then copy it back.

I think some apps may have hardcoded `os.execute('v ...')` in their code.

@spytheman
Member Author

spytheman commented Feb 4, 2026

@kbkpbot thanks. I was worried about ensuring self-compilation at all times (as a kind of additional test, and to make sure the AI does not get deceived by a stale binary), but you are right: saving it may be faster in more cases, especially for smaller fixes to examples/tools that do not need to change the compiler. I'll test changing the policy.

For the record, with the current version from this PR (from 2532e0f, which uses `./vnew`), codex gpt-5.2-codex (reasoning xhigh, summaries auto) solved the isolated test in #26508 independently in 9m 38s (~2m for the solution, the rest spent running tests):
[screenshot]

The review after that, which also ran tests, took 3m 42s (but this time without `./vnew test vlib/`):
[screenshot]

@ylluminate
Contributor

Hey @spytheman, went through this pretty heavily with gpt-5.2-pro-extended and then did some more iterations on it, going back and forth with Claude Opus 4.5 for several rounds. Also integrated points from the comments here that they believed were of value.

The AIs basically said it needed more information, guardrails, etc. to be properly useful to them, and here's what shook out (I shared part of this with @medvednikov already, but this is a bit more up to date than what I shared when he asked me): AGENTS.md

@spytheman
Member Author

@ylluminate , thank you 🙇🏻 .

I've compared the version from here and your V_AGENTS.md first by length:

0[18:52:24]@m1: (master) /opt/v $ wc PR_26502_AGENTS.md V_AGENTS.md |sort -n
     429    3230   21682 PR_26502_AGENTS.md
     713    4594   30956 V_AGENTS.md

and then with codex gpt-5.2-codex xhigh, and this prompt:

You are a top AI researcher. Do a robust and in-depth assessment of V_AGENTS.md and PR_26502_AGENTS.md for the V programming language, based on how usable they are for a future AGENTS.md file, for AI models. Rate them from 0 to 100 (maximum).

The result:

Details (worked for 4m 25s):

• Overall Scores

  • V_AGENTS.md: 86/100
  • PR_26502_AGENTS.md: 73/100

Assessment basis: I read both files fully, cross-checked against CONTRIBUTING.md, TESTS.md, and doc/docs.md, and spot-validated commands locally.

Scoring Breakdown

| Criterion | Weight | V_AGENTS.md | PR_26502_AGENTS.md |
| --- | --- | --- | --- |
| Operational clarity for agents | 25 | 23 | 19 |
| Safety and guardrails | 20 | 17 | 13 |
| Technical correctness | 20 | 16 | 17 |
| Coverage of real V workflows | 20 | 19 | 12 |
| Maintainability and context efficiency | 15 | 11 | 12 |
| Total | 100 | 86 | 73 |

Why V_AGENTS.md scores higher

  • Better onboarding and execution flow (V_AGENTS.md:37, V_AGENTS.md:199).
  • Much stronger scope/approval rules for AI behavior (V_AGENTS.md:91, V_AGENTS.md:109).
  • Better test selection granularity, including parser/checker/comptime triggers (V_AGENTS.md:501, V_AGENTS.md:503, V_AGENTS.md:464).
  • Stronger architecture/context sections useful for model planning (V_AGENTS.md:587, V_AGENTS.md:598, V_AGENTS.md:621).
  • Includes AI-specific failure prevention around comptime pitfalls (V_AGENTS.md:351).

Critical weaknesses in V_AGENTS.md (why not 90+)

  • One important factual inaccuracy: -printfn behavior is described incorrectly at V_AGENTS.md:571. In practice, -printfn prints the selected function's C to stdout; -o file.c still writes the full C output file.
  • Some policy duplication creates instruction bloat and drift risk (713 lines / ~4.6k words).
  • Mixed messaging around `self` (V_AGENTS.md:148 vs V_AGENTS.md:263) is understandable but can still confuse weaker agents.
  • Recovery advice involving `stash` (V_AGENTS.md:154) is risky in multi-change worktrees unless tightly constrained.

Why PR_26502_AGENTS.md is decent but weaker

  • Stronger compactness and lower cognitive load (429 lines / ~3.2k words).
  • Core workflow and testing guidance is still practical (PR_26502_AGENTS.md:123, PR_26502_AGENTS.md:250).
  • -printfn explanation is closer to actual behavior (PR_26502_AGENTS.md:329).

Main gaps in PR_26502_AGENTS.md

  • Missing important sections for future-proof AI usability: no dedicated Safety section, no Modules/Imports section, no Compile-Time deep guidance, no Commits/PR policy.
  • Less explicit hierarchy and escalation structure than V_AGENTS.md.
  • Build guidance uses the `self` path (PR_26502_AGENTS.md:92, PR_26502_AGENTS.md:177), which works, but is less aligned with make/bootstrap patterns centered on cmd/v.

Recommendation for a future AGENTS.md

  1. Use V_AGENTS.md as the base.
  2. Fix the -printfn statement (V_AGENTS.md:571) and tighten the self wording.
  3. Trim repeated rules by ~20-30% to improve model adherence and reduce drift.

@spytheman
Member Author

  * `-printfn <n> -o file.c` emits only the named C function to the
    output file. The name uses the `modulename__fnname` format (e.g.
    `main__main`). This flag can be repeated to print multiple
    functions. Methods/generics may use more complex C names; use
    `-keepc` to confirm exact symbols.

this should become:

  * `-printfn <name> -o file.c` emits only the named C function to
    standard output. The `name` uses the `modulename__fnname` format (e.g.
    `main__main`). This flag can be repeated to print multiple
    functions. Methods/generics may use more complex C names; use
    `-keepc` to confirm exact symbols.
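
As a quick sanity check of that behavior, here is a small hypothetical V probe (run from the V repository root), assuming the stdout semantics described in the corrected text above:

```v
import os

fn main() {
	// -printfn prints the named C function (modulename__fnname format)
	// to stdout, while -o still writes the full C output to hw.c.
	cmd := '${os.quoted_path(@VEXE)} -printfn main__main -o hw.c examples/hello_world.v'
	res := os.execute(cmd)
	println(res.output) // the C source of main__main
}
```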

@JalonSolov
Contributor

If we allow AI to create PRs, perhaps we should have a rule that the first comment must include "Created by <AI name>".

It would save time wondering if it was clever AI or a human that created it. Of course it's very obvious sometimes, but as the AI gets better...

@spytheman
Member Author

spytheman commented Feb 4, 2026

I've repeated the same test with the modified version of V_AGENTS.md that @ylluminate linked. The results are better: essentially the same bug fix, but discovered faster (worked for 4m 30s), with more specific tests chosen:
[screenshot]

[screenshot]

@spytheman
Member Author

I am a bit concerned by the size of the AGENTS.md file. According to the models I've tried, anything above ~250 lines and ~10KB may be a problem for smaller models 🤔. codex itself with 5.2 has no issues with it.

@spytheman
Member Author

spytheman commented Feb 4, 2026

I think some apps may have hardcoded `os.execute('v ...')` in their code.

I'll fix that separately. It is indeed a problem for some tools:

0[19:43:38]@m1: (add_agents_md) /opt/v $ rg os.execute..v
cmd/v2/test_ssa_backends.v:72:ref_res := os.execute('v -n -w -enable-globals run ${input_file}')
cmd/tools/git_pre_commit_hook.vsh:52:verify_result := os.execute('v fmt -verify ${vfiles.join(' ')}')
cmd/tools/vquest.v:287:res := os.execute('v missdoc --exclude vlib/v --exclude /linux_bare/ --exclude /wasm_bare/ @vlib')
cmd/tools/vreduce.v:358:os.execute('v fmt -w ${rpdc_file_path}')
cmd/tools/vreduce.v:436:os.execute('v fmt -w ${rpdc_file_path}')
cmd/tools/vreduce.v:448:os.execute('v fmt -w ${rpdc_file_path}')
vlib/v/checker/fn.v:1436:os.execute('v translate fndef ${name[2..]} tmp.c')
vlib/v/slow_tests/valgrind/valgrind_test.v:57:res_valgrind := os.execute('valgrind --version')
AGENTS.md:692:* Some V programs and tools hardcode `os.execute('v ...')` in their

edit: done in 11bf3dd.
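
For reference, one common pattern for this kind of fix (a sketch only, not necessarily what 11bf3dd does) is to replace the hardcoded `v` with V's `@VEXE` compile-time variable:

```v
import os

fn main() {
	// Fragile: resolves `v` through PATH, which may pick up a stale or
	// unrelated binary (file.v is just a hypothetical target here):
	fragile := os.execute('v fmt -verify file.v')

	// More robust: @VEXE expands at compile time to the path of the V
	// executable that compiled this program; os.quoted_path guards
	// against spaces in that path.
	robust := os.execute('${os.quoted_path(@VEXE)} fmt -verify file.v')
	println('${fragile.exit_code} vs ${robust.exit_code}')
}
```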

@spytheman
Member Author

If we allow AI to create PRs, perhaps we should have a rule that the first comment must include "Created by <AI name>".

It would save time wondering if it was clever AI or a human that created it. Of course it's very obvious sometimes, but as the AI gets better...

The ghostty project has https://github.com/ghostty-org/ghostty/blob/main/AI_POLICY.md as a separate file.

@ylluminate
Contributor

ylluminate commented Feb 5, 2026

@spytheman in my experience the gpt-5.2-pro-extended is still a bit more intelligent than gpt-5.2-codex xhigh for things like this, so I'm always happy to run things through it if you or anyone would like. Just let me know.

We are, very quickly, getting to the point where context windows will no longer be an issue for things like this. I do not use any tools anymore that would present such a problem, and any LLM whose context window can't handle this kind of size probably isn't something that can be counted on as reliable.

@ylluminate
Contributor

ylluminate commented Feb 5, 2026

If we allow AI to create PRs, perhaps we should have a rule that the first comment must include "Created by <AI name>".
It would save time wondering if it was clever AI or a human that created it. Of course it's very obvious sometimes, but as the AI gets better...

The ghostty project has https://github.com/ghostty-org/ghostty/blob/main/AI_POLICY.md as a separate file.

Just so you know: you will never be able to enforce this. Making such a document is 100% wishful thinking. We have passed the threshold where this is actually discernible, at least when the output comes from people who use such tools adeptly or invest the time to curate the results when anything is slightly off. I have repeatedly seen even the most advanced (and most expensive) AI detection tools on the market beaten handily, ranking AI-generated content as "100% human"... 😆

Creating such a policy simply creates more headaches for the humans and is not dissimilar to having a robust CoC that creates pain and trouble.

If you really MUST have a policy, it should be simply this:

Assume everyone is using AI.

No one gets a reward for not using it. There's no prize. Just maybe an imaginary high five for being a smart person. Fabricated ephemeral pats on the back.

A side note: crying and pointing "AI!!!!" is the "new thing" going forward. I'm actually friends with two people on opposite sides of this fence at present: one has been accused of genuine content being AI generated, and the other is making that claim about another party. Now they are gearing up for court litigation... There is simply no way to prove these things, and there never will be, regardless of the amount of effort that goes into it. Intelligence is intelligence, and whether or not we call it "artificial" doesn't matter, since it is still probably going to be more intelligent than most who grace the surface of this world... Anyway, pontifications over; it's just the state we're in now.

@ylluminate
Contributor

I did not do any significant work on these, but they should have the things discussed thus far integrated as examples.

@spytheman
Member Author

Just so you know: you will never be able to enforce this. Making such a document is 100% wishful thinking.

The point of such a policy file is not to declare a stance that AI is good or that AI is bad. It is to create an easy way for me, or anyone in my place (even a bot, eventually), to quickly dismiss and close really low-effort PRs (slop), whose authors did not bother to even format the files or run the basic tests before submitting. AIs just increase the chance of those happening, a lot.

@spytheman
Member Author

spytheman commented Feb 5, 2026

Intelligence is intelligence and whether or not we call it "artificial" doesn't matter since it is still probably going to be more intelligent than most who grace the surface of this world...

I agree completely. Intent and free will, however, matter a lot, and unless AIs develop those qualities (and ultimately ethics), they will remain tools used by an increasingly wide variety of people for all kinds of purposes. Most people like to help but are not good developers themselves. Some also just do random things for the lulz.

What motivated me to create this PR is, in part, the desire to help AIs produce somewhat higher quality results by default, regardless of the intent of the people driving said AI models. I am well aware that skillful users can already make very high quality PRs - I've seen them here, just as I have seen the pure garbage too.

@ylluminate
Contributor

@spytheman you make a really good point about slop and I think we're actually saying the same thing from different angles. Your concern isn't really about AI vs human - it's about quality standards. And you're right that AI lowers the barrier for low-effort submissions. But here's the thing: the AGENTS.md file we've been working on together IS the answer to that problem, and it's a far better one than any policy document could be.

A policy file says "label yourself and we'll judge you." An AGENTS.md says "here's exactly how to do it right - no excuses." It raises the floor. If someone (human or AI) submits a PR that didn't run fmt, didn't run tests, didn't follow the workflow - the AGENTS.md made the expectations explicit and discoverable. The PR gets rejected on merit, not on provenance. That's cleaner and actually enforceable.

Your second comment really resonates - "to help AIs produce a bit higher quality results by default, regardless of the intent of the people driving the said AI models." That's exactly right. The AGENTS.md is a quality multiplier that works regardless of who or what is reading it. A policy file is a speed bump that only honest people stop for.

I think what you've built here with this PR is genuinely more valuable than any AI policy could be. You're not trying to gatekeep - you're trying to raise the bar... So yeah, that's the right move.
